  1. Jan 2025
    1. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the Editor and the Reviewers for their time and effort in thoroughly reviewing our manuscript and providing valuable feedback. We hope we have addressed their comments effectively and improved the clarity of our manuscript as a result.

      The major revisions in the updated manuscript are as follows:

      (1) Immunization experiments using mRNA in Syrian hamsters were performed (Supplementary figures 2A, B and C).

      (2) An ELISPOT assay to evaluate cellular immunity in Syrian hamsters inoculated with BK2102 was conducted (Figure 2F).

      (3) IgA titers in BK2102-inoculated Syrian hamsters were successfully measured (Supplementary figure 2B).

      (4) New immunogenicity data for BK2102 in monkeys was additionally included (Supplementary figure 3B).

      (5) The discussion section has been thoroughly revised to integrate the new data.

      These results have been incorporated into the manuscript, and additional text has been added accordingly.

      Below, we provide point-by-point responses to the reviewers’ comments and concerns.

      Public Reviews:

      Reviewer #1:

      (1) A comparative safety assessment of the available m-RNA and live attenuated vaccines will be necessary. The comparison should include details of the doses, neutralizing antibody titers with duration of protection, tissue damage in the various organs, and other risks, including virulence reversal.

      We agree with the Reviewer’s comment regarding the lack of data to compare BK2102 with an mRNA vaccine. Unfortunately, we were unable to obtain commercially available mRNA vaccines for research purposes and could not produce mRNA vaccines of equivalent quality. As a result, a direct comparison of the safety profiles of BK2102 and mRNA vaccines was not possible. To address this, we conducted a GLP study with an additional twelve monkeys to evaluate the safety of BK2102. Following three intranasal inoculations of BK2102 at two-week intervals, no toxic effects were observed in any of the parameters assessed, including tissue damage, respiratory rate, functional observational battery (FOB), hematology, or fever. These results are detailed in lines 115-117.

      Furthermore, we compared the immunogenicity of BK2102 with that of an in-house prepared mRNA vaccine. The mRNA vaccine was designed to target the spike protein of SARS-CoV-2, and its immunogenicity was evaluated in hamsters. Under conditions where serum neutralizing antibody titers were comparable between the two, intranasal inoculation of BK2102 induced higher IgA levels in nasal wash samples than intramuscular injection of the in-house mRNA vaccine (Supplementary figures 2A and B, respectively). Additionally, while the mRNA vaccine induced Th1 and Th2 immune responses, as indicated by the detection of IgG1 and IgG2/3 (Supplementary figure 2C), BK2102 mainly induced a Th1 response in hamsters. These explanatory sentences have been added to the manuscript (lines 140-150).

      (2) The vaccine's effect on primates is doubtful. The study fails to explain why only two of four monkeys developed neutralizing antibodies. Information about the vaccine's testing in monkeys is also missing: What was the level of protection and duration of the persistence of neutralizing antibodies in monkeys? Were the tissue damages and other risks assessed?

      We believe that neutralizing antibody titers were observed in only 2 out of 4 monkeys in the immunogenicity study reported in the original manuscript because only a single dose was administered. We measured the neutralizing antibody titers in sera collected from monkeys used in the GLP study and confirmed the induction of neutralizing antibodies in all 6 monkeys that received three inoculations of BK2102. These data have been included in a new figure (Supplementary figure 3B). While we would have liked to evaluate the persistence of immunity and conduct a protection study in monkeys, limitations related to facility availability and cost prevented us from doing so. As noted in (1), tissue injury and other risks were assessed in the GLP study, which showed no evidence of tissue injury or other toxic effects. These results are described in lines 113-117.

      (3) The vaccine's safety in immunosuppressed individuals or individuals with chronic diseases should be assessed. Authors should make specific comments on this aspect.

      In general, live-attenuated vaccines are contraindicated for immunosuppressed individuals or those with chronic conditions, and therefore BK2102 is also not intended for use in these patients.

      This information has been added to the Discussion section (lines 309-311).

      (4) The candidate vaccine has been tested with a limited number of SARS-CoV-2 strains. Of note, the latest Omicron variants have lesser virulence than many early variants, such as the alpha, beta, and delta strains.

      We have added the results of a protection study against the SARS-CoV-2 gamma strain to Supplementary figures 5A and B. No weight loss was observed in BK2102-inoculated hamsters following infection with the gamma strain. These results are described in lines 109-111, 158-162.

      (5) Limitations of the study have not been discussed.

      We apologize for the ambiguity in the description of the Limitations of this paper. One major limitation of this study is that, despite observing high immunogenicity in hamsters, it remains uncertain whether the same positive results would be achieved in humans. Differences in susceptibility exist between species, which are not solely attributed to weight differences. For instance, while a single dose of 10<sup>3</sup> PFU of BK2102 was sufficient to induce neutralizing antibodies in hamsters, a higher dose of 10<sup>7</sup> PFU in monkeys was required to induce antibodies in only about 50% of the monkeys. Additionally, two more challenges in development of BK2102 were added to the discussion. The first was the limited availability of analytical reagents for hamster models, which restricted the detailed immunological characterization of the response. Second, it took time to gather preclinical data due to the space-related restrictions of BSL3 facilities, which delayed the clinical trials for BK2102 until many individuals had already acquired immunity against SARS-CoV-2. It remains to be seen whether our candidate will be optimal for human use, as the immunogenicity of live-attenuated vaccines is generally influenced by pre-existing immunity.

      We added these considerations to the discussion section (lines 300-309).

      Reviewer #2:

      No major weaknesses were identified; however, this reviewer notes the following:

      The authors missed the opportunity to include a mRNA vaccine to demonstrate that the immunity and protection efficacy of their live attenuated vaccine BK2102 is better than a mRNA vaccine.

      One of the potential advantages of live-attenuated vaccines is their ability to induce mucosal immunity. It would be great if the authors included experiments to assess the mucosal immunity of their live-attenuated vaccine BK2102.

      We agree with the Reviewer’s suggestion regarding the importance of comparing BK2102 with the mRNA vaccine modality and evaluating the mucosal immunity induced by BK2102. In hamsters, under conditions where serum neutralizing antibody titers were equivalent, intranasal inoculation of BK2102 induced higher levels of antigen-specific IgA in nasal wash compared to intramuscular injection of the conventional mRNA vaccine. This new data has been added in Supplementary figures 2A and B, and corresponding sentences have been included in the Results and Discussion sections (lines 140-145, 292-299).

      Reviewer #3:

      Lack of a more detailed discussion of this new vaccine approach in the context of reported live-attenuated SARS-CoV-2 vaccines in terms of its advantages and/or weaknesses.

      sCPD9 and CoviLiv<sup>TM</sup>, two previously reported live-attenuated vaccines, achieve attenuation through codon deoptimization or a combination of codon deoptimization and FCS deletion. These two strategies affect viral proliferation but do not directly impact virulence. In contrast, the temperature sensitivity-related substitutions in NSP14 included in BK2102 selectively restrict the infection site, reducing the likelihood of lung infection and providing a safety advantage over the other live-attenuated vaccines. As mentioned in the response to comment (5) of Reviewer #1, a limitation of BK2102 is that its development began later than that of the previously reported live-attenuated vaccines. Consequently, we must consider the impact of pre-existing immunity in future human trials. Based on these points, we have added sentences discussing the advantages and disadvantages to the Discussion section (lines 302-305, 312-319).

      Antibody endpoint titers could be presented.

      Thank you for your suggestion. We calculated the antibody endpoint titers for Figure 2A and included the results in lines 105-107 of the revised manuscript.
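
      As an illustration of how an endpoint titer can be derived from ELISA OD values, the following is a minimal sketch and not the authors' method. It assumes a common convention in which the cutoff is the mean OD of naive control wells plus three standard deviations, and the endpoint titer is the highest reciprocal dilution reached before the OD first falls to or below that cutoff. All names and numbers below are hypothetical.

```python
from statistics import mean, stdev


def cutoff_from_controls(control_ods, n_sd=3.0):
    """Cutoff = mean + n_sd * SD of naive control ODs (an assumed convention)."""
    return mean(control_ods) + n_sd * stdev(control_ods)


def endpoint_titer(reciprocal_dilutions, od_values, cutoff):
    """Return the highest reciprocal dilution whose OD is above the cutoff.

    Dilutions are walked from least to most dilute, and the walk stops at the
    first OD that is not above the cutoff. Returns None if even the least
    dilute sample is not above the cutoff.
    """
    titer = None
    for dilution, od in sorted(zip(reciprocal_dilutions, od_values)):
        if od <= cutoff:
            break
        titer = dilution
    return titer


# Hypothetical data: four naive control wells and a four-fold dilution series.
naive_ods = [0.05, 0.06, 0.04, 0.05]
dilutions = [100, 400, 1600, 6400, 25600]
sample_ods = [2.10, 1.45, 0.62, 0.18, 0.06]

cutoff = cutoff_from_controls(naive_ods)
titer = endpoint_titer(dilutions, sample_ods, cutoff)
print(f"cutoff OD = {cutoff:.3f}, endpoint titer = 1:{titer}")
```

      Other conventions exist, such as a fixed OD threshold or interpolation between dilutions, and the choice of cutoff changes the reported titer.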

      Lack of elaboration on immune mechanisms of protection at the upper respiratory tract (URT) against an immune evasive variant in the absence of detectable neutralizing antibodies.

      We appreciate the comment. The potential role of cellular and mucosal immunity in protection has been discussed in more detail in the revised manuscript, specifically in lines 283-295. According to the reference we initially cited, Hasanpourghadi et al. evaluated their adenovirus vector vaccine candidates and reported that the protection was enhanced by co-expression of the nucleocapsid protein rather than relying solely on the spike protein (Hasanpourghadi et al., Microbes Infect, 2023). Therefore, cellular immunity against the nucleocapsid and/or other viral proteins induced by BK2102 may also contribute to protection, as evidenced by more pronounced cellular immunity to the nucleocapsid detected through ELISPOT assay. Moreover, antigen-specific mucosal immunity was successfully detected in additional studies. The involvement of mucosal immunity in protection against mutant strains has been documented in the previously cited reference (Thwaites et al., Nat Commun, 2023). We have included these new data in Figure 2F and Supplementary figure 2B. Additionally, the results and discussion regarding the mechanisms of protection in the upper respiratory tract, in the absence of detectable neutralizing antibodies, have been incorporated into the revised lines 136-139, 143-145 and 283-295, respectively.

      Recommendations for the authors:

      Reviewer #2:

      Figure 1: Please include the LOD and statistical analysis in both panels. Please consider passaging the virus in Vero cells, approved for human vaccine production, to assess the stability of BK2102 after serial passage in vitro, which is important for its implementation as a live-attenuated vaccine. The authors should consider evaluating viral replication in different cell lines, and also assessing the plaque phenotype.

      Thank you for your valuable comments. First, we have added the statistical analysis and the limit of detection (LOD) to Figure 1. In response to the comments regarding the stability of BK2102 after serial passage in Vero cells, as well as its replication and plaque phenotype in different cell lines, we manufactured test substances for GLP studies and clinical trials by passaging BK2102 in Vero cells, which are approved for human vaccine production. We confirmed that BK2102 is stable (data not shown). Additionally, we verified that BK2102 replicates in BHK, Vero E6, and Vero E6/TMPRSS2 cells, in addition to Vero cells. Among these options, we selected Vero cells due to their high proliferative capacity and ability to produce clear plaques.

      Figure 2: Please, include statistical analysis in panels A, B, and D. Please, include the LOD in panels A and D. Please, include viral titers from these experiments in hamsters and NHPs.

      First, we would like to note that Figure 2D has been replaced by Figure 2C in the revised manuscript, and the data on neutralizing antibody titers in non-human primates (NHPs), originally presented as Figure 2C, have been moved to the Supplementary figure 3A.

      We have added the statistical analysis to Figure 2B and C, as well as the LOD to Figure 2C. Figure 2A (Spike-specific IgG ELISA) was intended for qualitative evaluation based on OD values, so the LOD was not defined. We have also added a detailed description of virus titer in the Methods section under the headings “Evaluation of Immunogenicity in Hamsters” and “Evaluation of Immunogenicity in Monkeys”, and updated the information in the Figure legends of the revised manuscript (lines 451, 459, 468-474, 566-567, 576-578, 582-584, 661-662).

      Figure 3: Please, include the viral titers of the challenge virus in the NT and lungs.

      We have added the virus titers for the challenge experiments to the Results section under the heading “BK2102 induced protective immunity against SARS-CoV-2 infection” (lines 168-174).

      Figure 4: Please, include statistical analysis in panels B and C and evaluate viral titers.

      We have added the statistical analysis to Figure 4B and C. Unfortunately, all samples in Figure 4 were fixed in formalin for histopathological examination, so virus titers could not be measured. However, in past experiments, we measured viral titers in the nasal wash samples and lungs of hamsters three days post-infection with D614G and BK2102. We confirmed that infectious virus was detected in both the nasal wash and lungs of the hamsters infected with the D614G strain (2.9 log10 PFU/mL and 5.3 log10 PFU/g, respectively), but not in the lungs of the hamsters infected with BK2102. The viral titers in the nasal wash of BK2102-infected hamsters (3.0 log10 PFU/mL) were equivalent to those of the hamsters infected with the D614G wild-type strain. However, we did not include these data in the revised manuscript.

      Figure 5: Please, include viral titers in different tissues with the different vaccines (panels A and B). Please, include the body weight changes.  Finally, please, consider the possibility of challenging the vaccinated mice with the same SARS-CoV-2 strains used in the manuscript to demonstrate similar protection efficacy in this new ACE2 transgenic mice.

      The different tissues of Tg mice were not sampled, as no gross abnormalities were observed in organs other than lungs and brains during necropsy. We have added new data on the body weight of Tg mice after infection to Supplementary figures 9B and 9C in the revised manuscript, along with additional lines in the Results section (lines 228-230 and 247-248). Although we do not know the reason, we have observed that immunization of this animal model does not lead to an increase in antibody titers. Therefore, we do not consider this animal model suitable for the protection study as you suggested. However, it could be useful in passive immunization experiments.

      Supplementary Figure 1: Since most of the manuscript focuses on BK2102, the authors should consider removing the other live-attenuated vaccines (Supplementary Figure 1A).

      We agree with the Reviewer’s suggestion and have simplified the description for Supplementary Figure 1A (lines 93-97).

      Supplementary Figure 3: Please, include statistical analysis.

      In the revised manuscript, Supplementary Figure 3 from the original manuscript has been moved to Supplementary Figure 2D. The IgG subclass ELISA was intended for a qualitative evaluation based on OD values, and therefore the results were included in the Supplementary figure. However, we realized the description was not clear, so we added further clarification in the Results section (lines 145-147).

      Supplementary Figure 4: Please, include the viral titers in both infected and contact hamsters from this experiment.

      In the revised manuscript, Supplementary Figure 4 in the original manuscript has been moved to Supplementary Figure 6. Unfortunately, due to limited breeding space for the hamsters, we were unable to prepare groups for the evaluation of viral titer, and instead prioritized evaluation by body weight.

      Reviewer #3:

      (1) It would be helpful to discuss this new vaccine in the context of other reported live-attenuated vaccines in terms of its advantages and/or disadvantages.

      Please refer to our response to the Reviewer’s “first comment” above, as well as to the response in Public comment (5) of Reviewer #1. The modifications made in the manuscript are described in lines 302-305 and 312-319.

      (2) Figure 2A: end-point titers could be presented, other than OD values.

      This comment is addressed in the reviewer’s second public comment. The endpoint titer has been included in lines 105-107 of the revised manuscript.

      (3) Figure 2C: it is unclear why only 2 out of 4 NHPs show neutralization titers. This could be moved to a supplementary figure.

      As suggested by the Reviewer, Figure 2C of the original manuscript has been moved to Supplementary Figure 3A in the revised manuscript. In response to Public comment (2) from Reviewer #1, we have also added new data on neutralizing antibodies in the monkeys as Supplementary figure 3B.

      (4) Figures 2E-F: bulk measurement of cytokine production in supernatants is not an optimal way to measure vaccine-induced Ag-specific T cells. ELISPOT or ICS are better. T-cell ELISPOT for hamsters is available. This should at least be discussed.

      Please refer to our response to this Reviewer’s third public comment. We have added the new results in Figure 2F of the revised manuscript.

      (5) It is quite interesting that no N-specific cellular response was observed, given that it is a live-attenuated vaccine. What about N-specific binding Abs?

      We conducted the ELISPOT assay as suggested by the Reviewer and detected cellular immunity against both spike and nucleocapsid proteins (Figure 2F). We did not examine nucleocapsid-specific antibodies, as they do not contribute to the neutralizing activity; however, nucleocapsid-specific cellular immunity was confirmed.

      (6) Figure 3: limit of detection for virological assays could be labeled.

      We have added the LOD in Figures 3C, D, F and G.

      (7) Figures 3E-F: it is interesting to see that the vaccine elicits almost complete protection at URT against BA.5, despite no BA.5 neutralizing titers being detected at all. What mechanism of URT protection by BK2102 would the authors speculate? T cells or other Ab effector functions?

      Please refer to the response to this Reviewer’s third public comment. We have added new results regarding cellular and mucosal immunity (Figure 2F and Supplementary figure 2B) and discussed the mechanisms of protection in the upper respiratory tract in the absence of detectable neutralizing antibodies (lines 136-139, 143-145 and 283-295, respectively).

      (8) Figure 3I: the durability of protection is a strength of the study. Other than body weight changes, what about viral loads in the animals after the challenge?

      We primarily assessed the effect of the vaccine by monitoring changes in body weight, as the differences compared to the naïve group were clear. Unfortunately, we did not collect samples at different time points throughout the study, which prevented us from evaluating the viral titers.

      In addition, we made corrections to several other sections identified during the revision process. The revised parts are as follows:

      - In the Methods section under the title “Evaluation of BK2102 pathogenicity in hamsters”, the infectious virus titer of D614G strain has been corrected (line 478).

      - In the Methods section under the title “In vivo passage of BK2102 in hamsters”, infectious virus titer of BK2102 and A50-18 strain has been corrected (line 487).

      - The collection time of splenocytes after inoculation has been corrected in the figure legend of Figure 2D (line 583).

      - There was an error in Figure 2D. The figure has been replaced with the appropriate version.

      - A new reference on NSP1 deletion (Ueno et al., Virology, 2024) has been added to the references.

      - Several methods have been described more clearly.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Comments on the revised version:

      Concerns flagged about using CRISPR-guide RNA-mediated knockdown of virR have yet to be addressed entirely. I understand that the authors could not obtain a knockout despite attempts and hence used a guide RNA-mediated knockdown strategy. However, I wondered whether the authors looked at the levels of the downstream genes in this knockdown.

      We thank the reviewer for raising this point, since it is known that certain artifacts of this approach may be related to changes in the expression of downstream genes. We ran qPCR for Rv0432 and Rv0433 and confirmed that no significant differences in the expression of virR downstream genes were detected in the virR mutant or the complemented strains relative to WT. This is now indicated in the Methods section on generation of the CRISPR mutants. The data are now presented as Supplementary Figure 13.

      Authors have used the virmut-Comp strain for some of the experiments. However, the materials and methods must describe how this strain was generated. Given that the mutant is a CRISPR-guide RNA-mediated knockdown, the CRISPR construct may have taken up the L5 locus. Did authors use an episomal construct for complementation? If so, what is the expression level of virR in the complementation construct? What are the expression levels of downstream genes in mutant and complementation strains? This is important because the transcriptome analysis was redone by considering the complementation strain. The complemented strain is written as virmut-C or virmut-Comp. This has to be consistent.

      We apologize for not having included the information about the generation of the complemented strain in our last version of the manuscript. We took the complementing vector from a previous paper on VirR (Rath et al., (2013) PNAS 110(49):E4790). This vector was constructed as follows: Complementation plasmids were cloned using Gateway® Cloning Technology (Invitrogen). E. coli strains expressing the following Gateway vectors were kindly provided by Dirk Schnappinger and Sabine Ehrt: pDO221A, pDO23A, pEN23A-linker1, pEN41A-TO2, pEN21A-Hsp60, pDE43-MEH. PCR was used to amplify the following target sequences from H37Rv genomic DNA: coding sequence of Rv0431, coding sequence of Rv0431 with a FLAG tag either in its C-terminus or its N-terminus, and the predicted cytosolic sequence of Rv0431 with a FLAG tag in its new C-terminus. The primers used for PCR were designed such that the amplicons would be flanked with Gateway® cloning-specific attachment (att) sites. These PCR products were recombined into Gateway® donor vectors using bacteriophage-derived integrase and integration host factor, resulting in entry vectors. The recombination events are specific to the attB sites on the PCR products and to the attP sequences on the donor vectors, such that the orientation of the target sequence is maintained during the recombination reaction, also known as the BP reaction, for attB-attP recombination. Using the MultiSite Gateway® system, three DNA fragments, derived from each of three distinct entry vectors, can be simultaneously inserted into a final complementation vector called the destination vector in a specific order and orientation. Multisite recombination events are mediated by integrase and integration host factor, in a process called the LR reaction (for the attL and attR sites in the entry and destination vectors).

      The Gateway® entry vectors thus generated were recombined with another entry vector containing either the Hsp60 promoter, an empty entry vector, and a complementation vector (episomal) to give rise to the final destination vector. The destination vector (episomal) was engineered to contain a hygromycin resistance cassette. These vectors were used to transform competent Rv0431-deficient Mtb. The transformation mixture was plated on 7H11 plates containing OADC and hygromycin (50 μg/ml). Colonies, typically observed 3-5 weeks later, were isolated and grown in 7H9 media and characterized.

      For simplicity, we have just referenced our previous paper to indicate that the complementing plasmid is the same used in that study.

      Regarding the virR expression levels in the WT, virR<sup>mut</sup> and complemented virR strains, please see previous Figure 6C.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have revised the manuscript in light of previous reviews. The authors have addressed some of my concerns appropriately. However, the specific dataset remains unchanged and unclear.

      Fig 8G and H: In response to a comment on the mechanism of how VirR mediates EV release, the authors have added new data showing an increase in the abundance of deacetylated muropeptides in the mutant. This observation is linked to altered lysozyme activity or PG fragility. In my opinion, this is another indirect observation. More concerning is the complemented strain, which also showed a comparable increase in deacetylated muropeptides, indicating that the altered muropeptides could be unrelated to VirR.

      We respectfully disagree with the reviewer's assessment that the abundance of deacetylated muropeptides is only an indirect indication of PG fragility. We consider this quantitative observation to be additional evidence of a more fragile PG. Each supporting observation may appear indirect when considered individually, but we ask the reviewer to consider all the evidence together: (i) sensitivity to lysozyme; (ii) enlargement and altered physicochemical and morphological characteristics, including porosity and thickness; (iii) altered penetrance of FDAAs; and (iv) increased release of muropeptides. Regarding the last point, the complemented strain may not display the WT features, but this may be due to artifacts derived from the complementation.

      Taken together, we believe, based on this series of evidence, that the PG of virR<sup>mut</sup> is more fragile than that of the WT and the complemented strains. We hope the reviewer will consider this perspective when analyzing a feature as complex as PG fragility. So far, there is no direct method to assess this condition.

      Lipid analyses are not comprehensive. The issue related to the need for more clarity of DIMA and DIMB still needs to be addressed. I understand that the authors do not have facilities to perform radioactive assays. However, they could have repeated the experiment to generate a better-quality image. Similarly, the newly generated SL-1, PAT, and DAT TLC could be of better quality. Bands still need to be resolved. The solvent front is irregular. The same is true for PIMs and DPG TLCs. With the evidence provided, the deregulation of cell wall lipids is incomplete.

      We agree with the reviewer that the quality of the TLC is not appropriate. We have now repeated the PDIM TLC (new Fig 7D). In addition, we have repeated the TLCs resolving sulfolipids in 2D mode. For simplicity, we ran only the glycerol condition, including the three strains. This is now part of a new Supplementary figure 8B. For PIMs, we have a 1D and a 2D analysis which, after checking previous papers using similar approaches without radioactivity, we consider to have the quality needed to identify the indicated lipids.

      We hope these new data and repeated experiments satisfy the reviewer's concerns.

      Thank you very much for your assessment and time to review this paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer # 1 (Public Review):

      Summary:

      In this preprint, the authors systematically and rigorously investigate how specific classes of residue mutations alter the critical temperature as a proxy for the driving forces for phase separation. The work is well executed, the manuscript well-written, and the results reasonable and insightful.

      Strengths:

      The introductory material does an excellent job of being precise in language and ideas while summarizing the state of the art. The simulation design, execution, and analysis are exceptional and set the standard for these types of large-scale simulation studies. The results, interpretations, and Discussion are largely nuanced, clear, and well-motivated.

      We thank the reviewer for their assessment of our work and for highlighting the key strengths of the paper.

      Weaknesses:

      This is not exactly a weakness, but I think it would future-proof the authors’ conclusions to clarify a few key caveats associated with this work. Most notably, given the underlying implementation of the Mpipi model, temperature dependencies for intermolecular interactions driven by solvent effects (e.g., hydrophobic effect and charge-mediated interactions facilitated by desolvation penalties) are not captured. This itself is not a “weakness” per se, but it means I would imagine CERTAIN types of features would not be well captured; notably, my expectation is that at higher temperatures, proline-rich sequences drive intermolecular interactions, but at lower temperatures, they do not. This is likely also true for the aliphatic residues, although these are found less frequently in IDRs. As such, it may be worth the authors explicitly discussing.

      We also thank the reviewer for pointing out that a more detailed discussion of the model limitations is needed. The original Mpipi model was designed to probe UCST-type transitions (that are associative in nature) of disordered sequences. The reviewer is correct that, in its current form, the model does not capture LCST-type transitions that depend on changes in solvation of hydrophobic residues with temperature. We have amended the discussion to highlight this fact.

      Similarly, prior work has established the importance of an alpha-helical region in TDP-43, as well as the role of aliphatic residues in driving TDP-43’s assembly (see Schmidt et al 2019). I recognize the authors have focussed here on a specific set of mutations, so it may be worth (in the Discussion) mentioning [1] what impact, if any, they expect transient or persistent secondary structure to have on their conclusions and [2] how they expect aliphatic residues to contribute. These can and probably should be speculative as opposed to definitive.

      Again - these are not raised as weaknesses in terms of this work, but the fact they are not discussed is a minor weakness, and the preprint’s use and impact would be improved on such a discussion.

      We agree with the reviewer that the effects of structural changes/propensities on these scaling behaviors would be an interesting and important angle to probe. We also comment on this in the discussion.

      Reviewer # 2 (Public Review):

      This is an interesting manuscript where a CA-only CG model (Mpipi) was used to examine the critical temperature (Tc) of phase separation of a set of 140 variants of prion-like low complexity domains (PLDs). The key result is that Tc of these PLDs seems to have a linear dependence on substitutions of various sticker and spacer residues. This is potentially useful for estimating the Tc shift when making novel mutations of a PLD. However, I have strong reservations about the significance of this observation as well as some aspects of the technical detail and writing of the manuscript.

      We thank the reviewer for their thoughtful and detailed feedback on the manuscript.

      (1) Writing of the manuscript: The manuscript can be significantly shortened with more concise discussions. The current text reads as very wordy in places. It even appears that the authors may be trying a bit too hard to make a big deal out of the observed linear dependence.

      The manuscript needs to be toned down to minimize self-promotion throughout the text. Some of the glaring examples include the wording “unprecedented”, “our research marks a significant milestone in the field of computational studies of protein phase behavior ..”, “Our work explores a new framework to describe, quantitatively, the phase behavior ...”, and others.

      We thank the reviewer for their suggestions on the writing of the manuscript. We understand the concern regarding the length and tone of the manuscript, and in response to their feedback, we have revised the language throughout the manuscript.

      There is really little need to emphasize the need to manage a large number of simulations for all 140 variants. Yes, some thought needs to go into designing and managing the jobs and organizing the data, but it is pretty standard in computational studies. For example, large-scale protein-ligand free energy calculations can require one to a few orders of magnitude more runs, and it is pretty routine.

      We fully agree with the reviewer that this aspect of the study is relatively standard in computational research and does not require special emphasis. In response, we have revised the manuscript to shorten the aforementioned section, focusing instead on the scientific insights gained from the simulations rather than the logistical challenges of managing them.

      When discussing the agreement with experimental results on Tm, it should be noted that the values of R > 0.93 and RMSD < 14 K are based on only 16 data points. I am not sure that one should refer to this as “extended validation”. It is more like a limited validation given the small data size.

      We thank the reviewer for their consideration of our validation set. Indeed, the agreement with experimental results is based on 16 data points, as this set represents the available published data at the time of writing of this manuscript. The term “extended validation” is used to signify that our current dataset builds upon previous validations (in Joseph, Reinhardt et al. Nat Comput. Sci. 2021), incorporating additional variants not previously examined. The metrics of R > 0.93 and a low RMSD indicate a strong agreement between the model and experiments, and an improvement with respect to other reported models. We are committed to continuing to validate our methods.
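
      As a rough illustration of the sample-size point raised above, the sketch below computes a Pearson correlation, an RMSD, and an approximate 95% confidence interval for a correlation coefficient using the Fisher z-transform. The function names are hypothetical and the interval assumes approximately bivariate-normal data; it is not part of the analysis reported in the manuscript.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmsd(x, y):
    """Root-mean-square deviation between predicted and reference values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for r from n points (Fisher z-transform)."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Values quoted in the review: r = 0.93 from 16 data points.
low, high = fisher_ci(0.93, 16)
print(f"approximate 95% CI for r: [{low:.2f}, {high:.2f}]")
```

      Under these assumptions the interval spans roughly 0.81 to 0.98, which is consistent with a strong correlation but also shows why a larger validation set would tighten the estimate.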

      Results of linear fitting shown in Eq 4-12 should be summarized in a single table instead of scattering across multiple pages.

      We considered the reviewer’s suggestion to compile all the laws into a single table. However, we believe it would be more effective for readers to reference each relationship directly where it is first discussed in the text. That said, we do include Table 1 in the original manuscript, which provides a summary of all the laws.

      The title may also be toned down a bit given the limited significance of the observed linear dependence.

      We respectfully disagree with the reviewer and believe that the current title accurately captures the scope of the manuscript.

      (2) Significance and reliability of Tc: Given the simplicity of Mpipi (a CA-only model that can only describe polymer chain dimension) and the low complexity nature of PLDs, the sequence composition itself is expected to be the key determinant of Tc. This is also reflected in various mean-field theories. It is well known that other factors will contribute, such as patterning (examined in this work as well), residual structures, and conformational preferences in dilute and dense phases. The observed roughly linear dependence is a nice confirmation but really unsurprising by itself. It appears that how many of the constructs deviate from the expected linear dependence (e.g., Figure 4A) may be more interesting to explore.

      While linear dependencies in critical solution temperatures may appear expected for certain systems, for example, symmetric hard spheres, the heterogeneity of intrinsically disordered regions (IDRs), like prion-like domains (PLDs), makes this finding notable. The simplicity of our linear scaling law belies the underlying complexity of multivalent interactions and sequence-dependent behaviors in a certain sequence regime, which has not been quantitatively characterized in this manner before. Likewise, although linear dependencies may be expected in simplified models, the real-world applicability and empirical validation of these laws in biologically relevant systems are not guaranteed. Our chemically based model provides the robustness needed to do that. The linear relationship observed is significant because it provides a predictive framework for understanding how specific mutations affect a diverse set of PLDs. The framework presented can be extended to other protein families upon the application of a validated model, which might or might not yield linear relationships depending on the cooperative effects of their collective behavior. This extends beyond confirming known theories: it offers a practical tool for predicting phase behavior based on sequence composition.

      We agree with the reviewer that, while the overarching linear trend is clear, deviations from linearity observed in constructs like those in Figure 4A point to additional, and interesting, layers of complexity. These deviations offer interesting avenues for future research and suggest that while linearity might dominate PLD critical behavior, other factors may modulate this behavior under specific conditions.

      This is an excellent suggestion from the reviewer that, while it falls outside the scope of the current study, we are interested in exploring in the future.

      Finally, although the relationships are all linear, they have been normalized in different ways, and the strength of the study also lies in that. Instead of focusing solely on linearity, our study explores the physical mechanisms that underlie these relationships. This approach provides a more complete understanding of how sequence composition and the underlying chemistry of the mutated residues influence T<sub>c</sub>.

      The assumption that all systems investigated here belong to the same universality class as a 3D Ising model and the use of Eqn 20 and 21 to derive Tc is poorly justified. Several papers have discussed this issue, e.g., see Pappu Chem Rev 2023 and others. Muthukumar and coworkers further showed that the scaling of the relevant order parameters, including the conserved order parameter, does not follow the 3D Ising model. More appropriate theoretical models including various mean field theories can be used to derive binodal from their data, such as using Rohit Pappu’s FIREBALL toolset. Imposing the physics of the 3D Ising model as done in the current work creates challenges for equivalence relationships that are likely unjustified.

      We thank the reviewer for raising this point and for highlighting the FIREBALL toolset. Based on our understanding, FIREBALL is designed to fit phase diagrams using mean-field theories, such as Flory–Huggins and Gaussian Cluster Theory. Our experience with this toolset suggests that it places a higher weight on the dilute arm of the binodal. However, in our slab simulations, we observe greater uncertainty in the density of the dilute arm. This leads to only a moderate fit of the data to the mean-field theories employed in the toolset. While we agree that there is no reason to assume the phase behavior of these systems is fully captured by the 3D Ising model, we expect that such a model will describe the behavior near the critical point better than mean-field theories. Testing our results further with different critical exponents would be valuable in assessing how these predictions compare to a broader set of experimental data. Additionally, we have made the raw data points for the phase diagrams available on our GitHub, enabling practitioners to apply alternative fitting methods.

      While it has been a common practice to extract Tc when fitting the coexistence densities, it is not a parameter that is directly relevant physiologically. Instead, Csat would be much more relevant to think about if phase separation could occur in cells.

      While it is true that C<sub>sat</sub> is directly relevant to whether phase separation can occur in cells under physiological conditions, T<sub>c</sub> should not be dismissed as irrelevant. T<sub>c</sub> provides fundamental insights into the thermodynamics of phase separation, reflecting the overall stability and strength of interactions driving condensate formation. This stability is crucial for understanding how environmental factors, such as temperature or mutations, might affect phase behavior. In Figure 2C and D we compare experimental C<sub>sat</sub> values with our predicted T<sub>c</sub> from simulations. These quantities are roughly inversely proportional to each other and so we expect that, to a first approximation, the relationships recovered for T<sub>c</sub> should hold when considering C<sub>sat</sub> at a fixed temperature.

      Reviewer # 3 (Public Review):

      Summary:

      “Decoding Phase Separation of Prion-Like Domains through Data-Driven Scaling Laws” by Maristany et al. offers a significant contribution to the understanding of phase separation in prion-like domains (PLDs). The study investigates the phase separation behavior of PLDs, which are intrinsically disordered regions within proteins that have a propensity to undergo liquid-liquid phase separation (LLPS). This phenomenon is crucial in forming biomolecular condensates, which play essential roles in cellular organization and function. The authors employ a data-driven approach to establish predictive scaling laws that describe the phase behavior of these domains.

      Strengths:

      The study benefits from a robust dataset encompassing a wide range of PLDs, which enhances the generalizability of the findings. The authors’ meticulous curation and analysis of this data add to the study’s robustness. The scaling laws derived from the data provide predictive insights into the phase behavior of PLDs, which can be useful in the future for the design of synthetic biomolecular condensates.

      We thank the reviewer for highlighting the importance of our work and for their critical feedback.

      Weaknesses:

      While the data-driven approach is powerful, the study could benefit from more experimental validation. Experimental studies confirming the predictions of the scaling laws would strengthen the conclusions. For example, in Figure 1, the Tc of TDP-43 is below 300 K even though it can undergo LLPS under standard conditions. Figure 2 clearly highlights the quantitative accuracy of the model for hnRNPA1 PLD mutants, but its applicability to other systems such as TDP-43, FUS, TIA1, EWSR1, etc., may be questionable.

      In the manuscript, we have leveraged existing experimental data for the A1-LCD variants, extracting critical temperatures and saturation concentrations to compare with our model and scaling law predictions. We acknowledge that a larger set of experiments would be beneficial. By selecting sequences that are related, we hypothesize that the scaling laws described herein should remain robust. In the case of TDP-43, to our knowledge this protein does not phase separate on its own under standard conditions. In vitro experiments that report phase separation at/above 300 K involve either the use of crowding agents (such as dextran or PEG) or multicomponent mixtures that include RNA or other proteins. Therefore, our predictions for TDP-43 are consistent with experiments. In general, we hope that the scaling laws presented in our work will inspire other researchers to further test their validity.

      The authors may wish to consider checking if the scaling behavior is only observed for Tc or if other experimentally relevant quantities such as Csat also show similar behavior. Additionally, providing more intuitive explanations could make the findings more broadly accessible.

      In Figure 2C and D we compare experimental C<sub>sat</sub> values with our predicted T<sub>c</sub> from simulations. These quantities are roughly inversely proportional to each other and so we expect that, to a first approximation, the relationships recovered for T<sub>c</sub> should hold when considering C<sub>sat</sub> at a fixed temperature.

      The study focuses on a particular subset of intrinsically disordered regions. While this is necessary for depth, it may limit the applicability of the findings to other types of phase-separating biomolecules. The authors may wish to discuss why this is not a concern. Some statements in the paper may require careful evaluation for general applicability, and I encourage the authors to exercise caution while making general conclusions. For example, “Therefore, our results reveal that it is almost twice more destabilizing to mutate Arg to Lys than to replace Arg with any uncharged, non-aromatic amino acid...” This may not be true if the protein has a lot of negative charges.

      A significant number of proteins, in addition to those mentioned in the manuscript, that contain prion-like low complexity domains have been reported to exhibit phase separation behaviors and/or are constituents of condensates inside cells. We therefore expect these laws to be applicable to such systems and have further revised the text to emphasize this point. As the reviewer suggests, we have also clarified that the reported scaling of various mutations applies to these systems.

      I am surprised that a quarter of a million CPU hours are described as staggering in terms of computational requirements.

      We have removed the note on CPU hours from the manuscript. However, we would like to clarify that the amount of CPU hours was incorrectly reported. The correct estimate is 1.25 million hours, but this value was unfortunately misrepresented during the editing process. We thank the reviewer for catching this mistake on our part.

      Reviewer # 1 (Recommendations For The Authors):

      Some minor points here:

      “illustrating that IDPs indeed behave like a polymer in a good solvent [43]. ” Whether or not an IDP behaves as a polymer in a good solvent depends on the amino acid sequence - the referenced paper selected a set of sequences that do indeed appear on average to map to a good-solvent-like polymer, but lest we forget SAXS experiments require high protein concentrations and until the recent advent of SEC-SAXS, your protein essentially needed to be near infinitely soluble to be measured. As such, this paper’s conclusions are, apparently, ignorant of the limitations associated with the data they are describing, drawing sweeping generalizations that are clearly not supported by a multitude of studies in which sequence-dependencies have led to ensembles with a scaling exponent far below 0.59 (See Riback et al 2017, Peng et al 2019, Martin et al 2020, etc).

      We thank the reviewer for raising this point. To avoid making incorrect generalizations and potentially misleading readers, we have removed the quoted statement from our manuscript.

      As of right now, the sequences are provided in a convenient multiple-sequence alignment figure. However, it would be important also to provide all sequences in an Excel table to make it easy for folks to compare.

      In addition to the sequence alignment figure, we now provide all tested sequences in an Excel table format in the GitHub repository.

      Maybe I’m missing it, but it would be extremely valuable if the coexistence points plotted in all the figures were provided as so-called source data; this could just be on the GitHub repository, but I’m envisaging a scenario where for each sequence you have a 4 column file where Col1=concentration and Col2=temperature, col3=fit concentration and col4=fit temperature, such that someone could plot col1 vs. col2 and col3 vs. col4 and reproduce the binodals in the various figures. Given the tremendous amount of work done to achieve binodals:

      The coexistence points used to plot the figures are now provided in the GitHub, in a format similar to that suggested by the reviewer.
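
      A minimal sketch of how a four-column file of the kind the reviewer describes could be read is given below. The column order follows the reviewer's suggestion; the function name, the whitespace-delimited layout, the `#` comment header, and the numerical values are all assumptions for illustration and may not match the files in the repository.

```python
import io
import numpy as np

def load_binodal(source):
    """Read a 4-column table: concentration, temperature, fit concentration, fit temperature."""
    data = np.loadtxt(source, comments="#", ndmin=2)
    if data.shape[1] != 4:
        raise ValueError(f"expected 4 columns, found {data.shape[1]}")
    return {
        "conc": data[:, 0],      # simulated coexistence concentration
        "temp": data[:, 1],      # simulation temperature
        "fit_conc": data[:, 2],  # concentration along the fitted binodal
        "fit_temp": data[:, 3],  # temperature along the fitted binodal
    }

# Hypothetical file contents, used only to exercise the reader.
example = io.StringIO(
    "# conc temp fit_conc fit_temp\n"
    "0.05 260.0 0.06 260.0\n"
    "0.10 280.0 0.11 280.0\n"
    "0.20 295.0 0.19 295.0\n"
)
binodal = load_binodal(example)
print(binodal["conc"], binodal["temp"])
```

      Plotting `conc` against `temp` and `fit_conc` against `fit_temp` would then reproduce the simulated points and the fitted curve, respectively.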

      It would be nice to visually show how finite size effects are considered/tested for (which they are very nicely) because I think this is something the simulation field should be thinking about more than they are.

      Thank you for highlighting this point. In our previous work (supporting information of the original Mpipi paper), we demonstrated a thorough approach by varying both the cross-sectional area of the box and the long axis while keeping the overall density constant. In this work, we verified that the cross-sectional area was larger than the average R<sub>g</sub> of the protein. We then maintained a fixed cross-sectional area to long-axis ratio, varying the number of proteins while keeping the overall density constant. We have updated Appendix 1–Figure 2 to clarify our procedure and revised the caption to better explain how we ensured the number of proteins was adequate.
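
      The box-scaling step described above can be sketched as follows. This assumes a square cross-section and reads the fixed ratio as the long axis divided by the cross-section side; the function names, units, and numbers are hypothetical and are not taken from the simulation scripts.

```python
def slab_box(n_chains, chain_density, aspect_ratio):
    """Return (Lx, Ly, Lz) for a slab box at constant overall density.

    n_chains      : number of protein chains in the box
    chain_density : chains per unit volume, held constant across system sizes
    aspect_ratio  : Lz / Lx, held constant across system sizes (assumed reading)
    """
    volume = n_chains / chain_density
    side = (volume / aspect_ratio) ** (1.0 / 3.0)  # volume = aspect_ratio * side**3
    return side, side, aspect_ratio * side

def cross_section_exceeds_rg(side, radius_of_gyration):
    """Check that the short box edge is larger than the average chain Rg."""
    return side > radius_of_gyration

# Hypothetical system sizes at the same density and aspect ratio.
for n in (64, 128, 256):
    lx, ly, lz = slab_box(n, chain_density=1.0e-3, aspect_ratio=8.0)
    print(n, round(lx, 2), round(lz, 2), cross_section_exceeds_rg(lx, 3.0))
```

      Comparing the binodals obtained at each system size then indicates whether the number of proteins is large enough for the coexistence densities to be converged.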

      When explaining the law of reticular diameters, it would be good to explain where the 3.06 exponent comes from.

      Based on the reviewer’s suggestion, we have added to the text: “The constant 3.06 in the equation is a dimensionless empirical factor that was derived from simulations of the 3D Ising model.”
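
      For readers who wish to refit the published coexistence points, a minimal sketch of a critical-point fit is given below. It assumes the commonly used forms (ρ_high − ρ_low)^3.06 = d(1 − T/T_c) and (ρ_high + ρ_low)/2 = ρ_c + s(T_c − T); the function name and the synthetic data are hypothetical, and the exact expressions in the manuscript may differ in detail.

```python
import numpy as np

EXPONENT = 3.06  # constant quoted in the text, tied to the 3D Ising universality class

def fit_critical_point(temps, rho_low, rho_high):
    """Estimate (T_c, rho_c) from coexistence densities by two linear fits."""
    temps = np.asarray(temps, dtype=float)
    rho_low = np.asarray(rho_low, dtype=float)
    rho_high = np.asarray(rho_high, dtype=float)

    # (rho_high - rho_low)**3.06 is linear in T and vanishes at T = T_c.
    order_param = (rho_high - rho_low) ** EXPONENT
    slope, intercept = np.polyfit(temps, order_param, 1)
    t_c = -intercept / slope

    # Law of rectilinear diameters: the mean density is linear in T.
    midpoint = 0.5 * (rho_high + rho_low)
    mid_slope, mid_intercept = np.polyfit(temps, midpoint, 1)
    rho_c = mid_slope * t_c + mid_intercept
    return t_c, rho_c

# Synthetic binodal generated from known parameters (T_c = 300, rho_c = 0.4).
temps = np.array([240.0, 250.0, 260.0, 270.0, 280.0, 290.0])
gap = (2.0 * (1.0 - temps / 300.0)) ** (1.0 / EXPONENT)
mid = 0.4 + 1.0e-3 * (300.0 - temps)
t_c, rho_c = fit_critical_point(temps, mid - gap / 2.0, mid + gap / 2.0)
print(t_c, rho_c)
```

      Changing `EXPONENT` allows the same data to be refit with a different critical exponent, which is one way to probe the sensitivity of T<sub>c</sub> to the assumed universality class.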

      The NCPR scale in Figure 5 being viridis is not super intuitive and may benefit from being seismic or some other r-w-b colormap just to make it easier for a reader to map the color to meaning.

      We thank the reviewer for this suggestion and have replaced the scale with a r-w-b colormap.

      The “sticker and spacer” framework has received critiques recently given its perceived simplicity. However, this work seems to clearly illustrate that certain types of residues have a large effect on Tc when mutated, whereas others have a smaller effect. It may be worth re-phrasing the sticker-spacer introduction not as “everyone knows aromatic/arginine residues are stickers” but as “aromatic and arginine residues have been proposed to be stickers, yet other groups have argued all residues matter equally” and then go on to make the point that while a black-and-white delineation is probably not appropriate, based on the data, certain residues ARE demonstrably more impactful on Tc than others, which is the definition of stickers. With this in mind, it may be useful to separate out a sticker and a spacer distribution in Figure 1D, because the different distribution between the two residues types is not particularly obvious from the overlapping points.

      We have revised the introduction of the sticker–spacer model in the manuscript for clarity. As the reviewer suggests, we have also separated the sticker and spacer distribution, which is now summarized in new Appendix 0–figure 8.

      Reviewer # 3 (Recommendations For The Authors):

      Figure 2 clearly highlights the quantitative accuracy of the model for hnRNPA1 PLD mutants, but its applicability to other systems such as TDP-43, FUS, TIA1, EWSR1, etc., may be questionable. The following sentence may be revised to reflect this: “Our extended validation set confirms that the Mpipi potential can ...”

      Based on the reviewer’s suggestion, we have revised the text: “Our validation set, which expands the range of protein variants originally tested [32], highlights that the Mpipi potential can effectively capture the thermodynamic behavior of a wide range of hnRNPA1-PLD variants, and suggests that Mpipi is adequate for proteins with similar sequence compositions, as in the set of proteins analyzed in this study. In recent work by others [66], Mpipi was tested against experimental radius of gyration data for 137 disordered proteins and the model produced highly accurate results, which further suggests the applicability of the approach to a broad range of sequences.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      As a starting point, the authors discuss the so-called "additive partitioning" (AP) method proposed by Loreau & Hector in 2001. The AP is the result of a mathematical rearrangement of the definition of overyielding, written in terms of relative yields (RY) of species in mixtures relative to monocultures. One term, the so-called complementarity effect (CE), is proportional to the average RY deviations from the null expectations that plants of both species "do the same" in monocultures and mixtures. The other term, the selection effect (SE), captures how these RY deviations are related to monoculture productivity. Overall, CE measures whether relative biomass gains differ from zero when averaged across all community members, and SE, whether the "relative advantage" species have in the mixture, is related to their productivity. In extreme cases, when all species benefit, CE becomes positive.

      This is not true; positive CE does not require positive RY deviations of all species. CE is positive as long as the average RY deviation is greater than 0. In a 2-species mixture, for example, if the RY deviation of one species is -0.2 and that of the other species is +0.3, CE would still be positive. Positive CE can be associated with negative NE (net biodiversity effects) when more productive species have a smaller negative RY deviation compared to the positive RY deviation of less productive species. Therefore, the suggestion by the reviewer “This is intuitively compatible with the idea that niche complementarity mitigates competition (CE>0)” is not correct.
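
      The two-species case can be worked through numerically with the additive partitioning of Loreau & Hector (2001), in which NE = N·mean(ΔRY)·mean(M) + N·cov(ΔRY, M). The sketch below uses the RY deviations quoted above; the monoculture yields (10 and 2) are hypothetical values chosen so that the more productive species carries the negative deviation.

```python
def additive_partition(delta_ry, mono_yield):
    """Return (CE, SE, NE) for the additive partitioning of biodiversity effects."""
    n = len(delta_ry)
    mean_dry = sum(delta_ry) / n
    mean_m = sum(mono_yield) / n

    # Complementarity effect: N * mean RY deviation * mean monoculture yield.
    ce = n * mean_dry * mean_m

    # Selection effect: N * population covariance of RY deviation and monoculture yield.
    cov = sum((d - mean_dry) * (m - mean_m) for d, m in zip(delta_ry, mono_yield)) / n
    se = n * cov

    # Net effect computed directly as observed minus expected mixture yield.
    ne = sum(d * m for d, m in zip(delta_ry, mono_yield))
    return ce, se, ne

# RY deviations from the text; monoculture yields are assumed for illustration.
ce, se, ne = additive_partition([-0.2, 0.3], [10.0, 2.0])
print(f"CE = {ce:.2f}, SE = {se:.2f}, NE = {ne:.2f}")
```

      With these inputs CE is positive (0.6) while SE (-2.0) and NE (-1.4) are negative, so a positive CE does not require every species to benefit, nor does it guarantee overyielding.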

      When large species have large relative productivity increases, SE becomes positive. This is intuitively compatible with the idea that niche complementarity mitigates competition (CE>0), or that competitively superior species dominate mixtures and thereby drive overyielding (SE>0).

      The use of the word “mitigate” indicates that the effects of niche complementarity and competition are in opposite directions, which is not true in biodiversity experiments based on a replacement design. We have explained this in detail in our first responses to reviewers.

      However, it is very important to understand that CE and SE capture the "statistical structure" of RY that underlies overyielding. Specifically, CE and SE are not the ultimate biological mechanisms that drive overyielding, and never were meant to be. CE also does not describe niche complementarity. Interpreting CE and SE as directly quantifying niche complementarity or resource competition, is simply wrong, although it sometimes is done. The criticism of the AP method thus in large part seems unwarranted. The alternative methods the authors discuss (lines 108-123) are based on very similar principles.

      Agree. However, if CE and SE are not meant to be biological mechanisms, as suggested by the reviewer, the argument “This is intuitively compatible with the idea that niche complementarity mitigates competition (CE>0), or that competitively superior species dominate mixtures and thereby drive overyielding (SE>0)” would be invalid.

      Lines 108-123 are not on our method.   

      The authors now set out to develop a method that aims at linking response patterns to "more true" biological mechanisms.

      Assuming that "competitive dominance" is key to understanding mixture productivity, because "competitive interactions are the predominant type of interspecific relationships in plants", the authors introduce "partial density" monocultures, i.e. monocultures that have the same planting density for a species as in a mixture. The idea is that using these partial density monocultures as a reference would allow for isolating the effect of competition by the surrounding "species matrix".

      The authors argue that "To separate effects of competitive interactions from those of other species interactions, we would need the hypothesis that constituent species share an identical niche but differ in growth and competitive ability (i.e., absence of positive/negative interactions)." - I think the term interaction is not correctly used here, because clearly competition is an interaction, but the point made here is that this would be a zero-sum game.

      We did not say that competition is not an interaction.

      The authors use the ratio of productivity of partial density and full-density monocultures, divided by planting density, as a measure of "competitive growth response" (abbreviated as MG). This is the extra growth a plant individual produces when intraspecific competition is reduced.

      Here, I see two issues: first, this rests on the assumption that there is only "one mode" of competition if two species use the same resources, which may not be true, because intraspecific and interspecific competition may differ. Of course, one can argue that then somehow "niches" are different, but such a niche definition would be very broad and go beyond the "resource set" perspective the authors adopt. Second, this value will heavily depend on timing and the relationship between maximum initial growth rates and competitive abilities at high stand densities.

      True. Research findings indicate that biodiversity effect detected with AP is not constant.    

      The authors then progress to define relative competitive ability (RC), and this time simply use monoculture biomass as a measure of competitive ability. To express this biomass in a standardized way, they express it as the difference from the mean of the other species and then divide by the maximum monoculture biomass of all species.

      I have two concerns here: first, if competitive ability is the capability of a species to preempt resources from a pool also accessed by another species, as the authors argued before, then this seems wrong because one would expect that a species can simply be more productive because it has a broader niche space that it exploits. This contradicts the very narrow perspective on competitive ability the authors have adopted. This also is difficult to reconcile with the idea that specialist species with a narrow niche would outcompete generalist species with a broad niche.

      Competitive ability is not necessarily associated with species niche space. Both generalist and specialist species can be more productive at a particular study site, as long as they are more capable of obtaining resources from a local pool. Remember, biodiversity experiments are conducted at a site of particular conditions, not across a range of species niche space at landscape level.

      Second, I am concerned by the mathematical form. Standardizing by the maximum makes the scaling dependent on a single value.

      As explained in lines 370-376, the mathematical form is a linear approximation, as the relationship between competitive growth responses and species relative competitive ability is generally unknown but is likely nonlinear. Once the relationship is determined in future research, the scaling factor will not be needed.

      As a final step, the authors calculate a "competitive expectation" for a species' biomass in the mixture, by scaling deviations from the expected yield by the product MG ⨯ RC. This would mean a species does better in a mixture when (1) it benefits most from a conspecific density reduction, and (2) has a relatively high biomass.

      Put simply, the assumption would be that if a species is productive in monoculture (high RC), it effectively does not "see" the competitors and then grows like it would be the sole species in the community, i.e. like in the partial density monoculture.

      Overall, I am not very convinced by the proposed method.

      Comments on revised version:

      Only minimal changes were made to the manuscript, and they do not address the main points that were raised.

      Reviewer #2 (Public review):

      This manuscript by Tao et al. reports on an effort to better specify the underlying interactions driving the effects of biodiversity on productivity in biodiversity experiments. The authors are especially concerned with the potential for competitive interactions to drive positive biodiversity-ecosystem functioning relationships by driving down the biomass of subdominant species. The authors suggest a new partitioning schema that utilizes a suite of partial density treatments to capture so-called competitive ability. While I agree with the authors that understanding the underlying drivers of biodiversity-ecosystem functioning relationships is valuable - I am unsure of the added value of this specific approach for several reasons.

      No responses.

      Comments on revised version:

      The authors changed only one minor detail in response to the last round of reviews.

      Reviewer #3 (Public review):

      Summary:

      This manuscript claims to provide a new null hypothesis for testing the effects of biodiversity on ecosystem functioning. It reports that the strength of biodiversity effects changes when this different null hypothesis is used. This main result is rather inevitable. That is, one expects a different answer when using a different approach. The question then becomes whether the manuscript's null hypothesis is both new and an improvement on the null hypothesis that has been in use in recent decades.

      Our approach adopts two hypotheses: the null hypothesis, which is shared with the additive partitioning model, and the competitive hypothesis, which is new. The null hypothesis assumes that inter- and intra-species interactions are the same, while the competitive hypothesis assumes that species differ in competitive ability and growth rate. Therefore, our approach is an extension of the current approach. It separates the effects of competitive interactions from those of other species interactions, which the current approach does not.

      Strengths:

      In general, I appreciate studies like this that question whether we have been doing it all wrong and I encourage consideration of new approaches.

      Weaknesses:

      Despite many sweeping critiques of previous studies and bold claims of novelty made throughout the manuscript, I was unable to find new insights. The manuscript fails to place the study in the context of the long history of literature on competition and biodiversity and ecosystem functioning.

      We have explained in our first responses that competition and biodiversity effects are studied with different experimental approaches, i.e., additive and replacement designs. Results from one approach are not compatible with those from the other. For example, the competition effect is negative with an additive design but generally positive with a replacement design, which is used extensively in biodiversity experiments. We have considered species competitive ability, the density-growth relationship, and the different effects of competitive interactions between additive and replacement designs, while the current method does not reflect any of those.

      The Introduction claims the new approach will address deficiencies of previous approaches, but after reading further I see no evidence that it addresses the limitations of previous approaches noted in the Introduction. Furthermore, the manuscript does not reproducibly describe the methods used to produce the results (e.g., in Table 1) and relies on simulations, claiming experimental data are not available when many experiments have already tested these ideas and not found support for them.

      We used simulation data, as partial density monocultures are generally not available in previous biodiversity experiments.

      Finally, it is unclear to me whether rejecting the 'new' null hypothesis presented in the manuscript would be of interest to ecologists, agronomists, conservationists, or others.

      Our null hypothesis is the same as that of the additive partitioning, which assumes that inter- and intra-species interactions are the same, while our competitive hypothesis assumes that species differ in competitive ability and growth rate. Rejecting the null hypothesis means that inter- and intra-species interactions differ, whereas rejecting the competitive hypothesis indicates the existence of positive or negative species interactions. This would be of interest to everyone.
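
      For readers unfamiliar with the additive partitioning referred to here, the following minimal sketch shows its standard identity: the net biodiversity effect is split into a complementarity term and a selection term. It is an illustration only, not the method proposed in the manuscript; the function name, argument names, and example yields are all hypothetical.

```python
from statistics import fmean


def additive_partition(mono_yields, mix_yields, planted_props):
    """Return (net, complementarity, selection) for one mixture.

    mono_yields   : monoculture yield M_i of each species
    mix_yields    : observed yield of each species in the mixture
    planted_props : expected relative yield of each species (planted proportion)
    """
    n = len(mono_yields)
    # Deviation of observed from expected relative yield, per species.
    delta_ry = [y / m - p for y, m, p in zip(mix_yields, mono_yields, planted_props)]
    mean_dry = fmean(delta_ry)
    mean_m = fmean(mono_yields)
    complementarity = n * mean_dry * mean_m
    # Population covariance (divide by n), as the identity requires.
    cov = fmean([(d - mean_dry) * (m - mean_m) for d, m in zip(delta_ry, mono_yields)])
    selection = n * cov
    # Net effect: observed mixture yield minus the yield expected from monocultures.
    expected = sum(p * m for p, m in zip(planted_props, mono_yields))
    net = sum(mix_yields) - expected
    return net, complementarity, selection


# Hypothetical two-species mixture planted 50:50.
net, comp, sel = additive_partition([10.0, 20.0], [6.0, 12.0], [0.5, 0.5])
print(net, comp, sel)
```

      Under the shared null hypothesis (inter- and intra-species interactions are the same), each species yields exactly its planted proportion of its monoculture yield, so all three terms are zero.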

      Comments on revised version:

      Please see review comments on the previous version of this manuscript. The authors have not revised their manuscript to address most of the issues previously raised by reviewers.

      No responses.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Do take reviews seriously. Even if you think the reviewers all are wrong and did not understand your work, then this seems to indicate that it was not clearly presented.

      Reviewer #2 (Recommendations for the authors):

      I can understand that the authors are perhaps frustrated with what they perceive as a basic misunderstanding of their goals and approach. This misunderstanding, however, provides an opportunity to clarify. I believe that the authors have tried to clarify in rebutting our statements but would do better to clarify in the manuscript itself. If we reviewers, who are deeply invested in this field, don't understand the approach and its value, then it is likely that many readers will not as well.

      The additive partitioning has been publicly questioned several times since the method was conceived in 2001. Our work provides an alternative.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      This neuroimaging and electrophysiology study in a small cohort of congenital cataract patients with sight recovery aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in visual cortex. While contrasting sight-recovery with visually intact controls suggested the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, it provided only incomplete evidence supporting claims about the effects of early deprivation itself. The reported data were considered valuable, given the rare study population. However, the small sample sizes, lack of a specific control cohort and multiple methodological limitations will likely restrict usefulness to scientists working in this particular subfield.

      We thank the reviewing editors for their consideration and updated assessment of our manuscript after its first revision.

      In order to assess the effects of early deprivation, we included an age-matched, normally sighted control group recruited from the same community, measured in the same scanner and laboratory. This study design is analogous to numerous studies in permanently congenitally blind humans, which typically recruited sighted controls, but hardly ever individuals with a different, e.g. late blindness history. In order to improve the specificity of our conclusions, we used a frontal cortex voxel in addition to a visual cortex voxel (MRS). Analogously, we separately analyzed occipital and frontal electrodes (EEG).

      Moreover, we relate our findings in congenital cataract reversal individuals to findings in the literature on permanent congenital blindness. Note, there are, to the best of our knowledge, neither MRS nor resting-state EEG studies in individuals with permanent late blindness.

      Our participants necessarily have nystagmus and low visual acuity due to their congenital deprivation phase, and the existence of nystagmus is a recruitment criterion to diagnose congenital cataracts.

      It might be interesting for future studies to investigate individuals with transient late blindness. However, such a study would be ill-motivated had we not found differences between the most “extreme” of congenital visual deprivation conditions and normally sighted individuals (analogous to why earlier research on permanent blindness investigated permanently congenitally blind humans first, rather than permanently late blind humans, or both in the same study). Any result of such future work would need our study as a reference, and results in these additional groups would not invalidate our findings.

      Since all our congenital cataract reversal individuals by definition had visual impairments, we included an eyes closed condition, both in the MRS and EEG assessment. Any group effect during the eyes closed condition cannot be due to visual acuity deficits changing the bottom-up driven visual activation.

      As we detail in response to review 3, our EEG analyses followed the standards in the field.

      Public Reviews:

      Reviewer 1 (Public review):

      Summary

      In this human neuroimaging and electrophysiology study, the authors aimed to characterise effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry conditions ('eyes open', 'eyes closed') and resting state, and visually evoked EEG activity between ten congenital cataract patients with recovered sight (CC), and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects, because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then perform multiple exploratory correlations between MRS measures and visual acuity, and report a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected two electrodes placed in the visual cortex for analysis and report a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. Control electrodes in the frontal region did not present with the same pattern. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the strength of evidence in support of this view is not strong. Amongst the limitations are the low sample size, a critical control cohort that could provide evidence for higher E/I ratio in CC patients without recovered sight for example, and lower data quality in the control voxel. Nevertheless, the study provides a rare and valuable insight into experience-dependent plasticity in the human brain.

      Strengths of study

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well written.

      Limitations

      Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      In the updated manuscript, the authors have provided justification for their sample size by pointing to prior studies and the inherent difficulties in recruiting individuals with bilateral congenital cataracts. Importantly, this highlights the value the study brings to the field while also acknowledging the need to replicate the effects in a larger cohort.

      Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      In the updated version, the authors have indicated that future studies can pursue comparisons between congenital cataract participants and cohorts with later sight loss.

      MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      In the updated version, the authors have added more information that informs the reader of the MRS quality differences between voxel locations. This increases the transparency of their reporting and enhances the assessment of the results.

      Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      The updated manuscript contains key references from non-human work to justify their interpretation.

      Heterogeneity in patient group. Congenital cataract (CC) patients experienced a variety of duration of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The updated document has addressed this caveat.

      Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      This has now been done throughout the document and increases the transparency of the reporting.

      P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to negatively correlate with age.

      This caveat has been addressed in the revised manuscript.
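
      One common way to control for age in such a correlation is a first-order partial correlation. The sketch below is a generic illustration under that assumption, not the analysis reported in the manuscript; the function names and toy values are hypothetical.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z (e.g. age)."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy values: x and y are related only through their shared dependence on z.
z = [1.0, 2.0, 3.0, 4.0]
x = [2.0, 1.0, 2.0, 5.0]
y = [2.0, -1.0, 6.0, 3.0]
print(pearson_r(x, y), partial_r(x, y, z))
```

      In the toy data the raw correlation between x and y is positive, but it vanishes once z is partialled out, which is the pattern the reviewer's comment guards against.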

      Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Fig. 4, yet the introduction said that there was a prior hypothesis: "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      This has been done throughout the document and increases the transparency of the reporting.

      The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandon et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      This caveat has been addressed. The authors have added frontal electrodes to their analysis, providing an essential regional control for the visual cortex location.

      Comments on the latest version:

      The authors have made reasonable adjustments to their manuscript that addressed most of my comments by adding further justification for their methodology, essential literature support, pointing out exploratory analyses, limitations and adding key control analyses. Their revised manuscript has overall improved, providing valuable information, though the evidence that supports their claims is still incomplete.

      We thank the reviewer for suggesting ways to improve our manuscript and carefully reassessing our revised manuscript.

      Reviewer 2 (Public review):

      Summary:

      The study examined 10 congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts, measuring neural activity and neurochemical profiles from the visual cortex. The declared aim is to test whether restoring visual function after years of complete blindness impacts excitation/inhibition balance in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways in which this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      The main methodological limitation is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). A few previous studies suggested that Excitation/Inhibition ratio in the visual cortex is increased in congenitally blind patients; the present study reports that E/I ratio decreases instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as common in the field).

      We thank the reviewer for suggesting ways to improve our manuscript and carefully reassessing our revised manuscript.

      Since we have not been able to acquire longitudinal data with the experimental design of the present study in congenital cataract reversal individuals, we compared the MRS and EEG results of congenital cataract reversal individuals to published work in permanently congenitally blind individuals. We consider this a resource-saving approach. We think that the results of our cross-sectional study now justify the costs and enormous efforts (and time for the patients who often have to travel long distances) associated with longitudinal studies in this rare population.

      There are also more technical limitations related to the correlation analyses, which are partly acknowledged in the manuscript. A bland correlation between GLX/GABA and the visual impairment is reported, but this is specific to the patients group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls - opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patients group.

      Given the exploratory nature of the correlations, we do not base the majority of our conclusions on this analysis. There are no doubts that the reported correlations need replication; however, replication is only possible after a first report. Thus, we hope to motivate corresponding analyses in further studies.

      It has to be noted that in the present study significance testing for correlations was corrected for multiple comparisons, and that some findings replicate earlier reports (e.g. effects on EEG aperiodic slope, alpha power, and correlations with chronological age).

      Conclusions:

      The main claim of the study is that sight recovery impacts the excitation/inhibition balance in the visual cortex, estimated with MRS or through indirect EEG indices. However, due to the weaknesses outlined above, the study cannot distinguish the effects of sight recovery from those of visual deprivation. Moreover, many aspects of the results are interesting but their validation and interpretation require additional experimental work.

      We interpret the group differences between individuals tested years after congenital visual deprivation and normally sighted individuals as supportive of the E/I ratio being impacted by congenital visual deprivation. In the absence of a sensitive period for the development of an E/I ratio, individuals with a transient phase of congenital blindness might have developed a visual system indistinguishable from normally sighted individuals. As we demonstrate, this is not so. Comparing the results of congenitally blind humans with those of congenitally permanently blind humans (from previous studies) allowed us to identify changes of E/I ratio, which add to those found for congenital blindness.

      We thank the reviewer for the helpful comments and suggestions related to the first submission and first revision of our manuscript. We are keen to translate some of them into future studies.

      Reviewer 3 (Public review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance and its interrelationship and to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      First of all, I would like to disclose that I am not an expert in congenital visual deprivation, nor in MRS. My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods.

      Although the authors addressed some of the concerns of the previous version, major concerns and flaws remain in terms of methodological and statistical approaches along with the (over)interpretation of the results. Specific concerns include:

      (1) 3.1 Response to Variability in Visual Deprivation

      Rather than listing the advantages and disadvantages of visual deprivation, I recommend providing at least a descriptive analysis of how the duration of visual deprivation influenced the measures of interest. This would enhance the depth and relevance of the discussion.

      Although Review 2 and Review 3 (see below) pointed out problems in interpreting multiple correlational analyses in small samples, we addressed this request by reporting such correlations between visual deprivation history and measured EEG/MRS outcomes.

      Calculating the correlation between duration of visual deprivation and behavioral or brain measures is, in fact, a common suggestion. The existence of sensitive periods, which are typically assumed to not follow a linear gradual decline of neuroplasticity, does not necessarily allow predicting a correlation with duration of blindness. Daphne Maurer has additionally worked on the concept of “sleeper effects” (Maurer et al., 2007), that is, effects on the brain and behavior by early deprivation which are observed only later in life when the function/neural circuits mature.

      In accordance with this reasoning, we did not observe a significant correlation between duration of visual deprivation and any of our dependent variables.

      (2) 3.2 Small Sample Size

      The issue of small sample size remains problematic. The justification that previous studies employed similar sample sizes does not adequately address the limitation in the current study. I strongly suggest that the correlation analyses should not feature prominently in the main manuscript or the abstract, especially if the discussion does not substantially rely on these correlations. Please also revisit the recommendations made in the section on statistical concerns.

      In the revised manuscript, we explicitly mention that our sample size is not atypical for the special group investigated, but that a replication of our results in larger samples would foster their impact. We only explicitly mention correlations that survived stringent testing for multiple comparisons in the main manuscript.

      Given the exploratory nature of the correlations, we have not based the majority of our claims on this analysis.

      (3) 3.3 Statistical Concerns

      While I appreciate the effort of conducting an independent statistical check, it merely validates whether the reported statistical parameters, degrees of freedom (df), and p-values are consistent. However, this does not address the appropriateness of the chosen statistical methods.

      We did not intend for the statcheck report to justify the methods used for statistics, which we have done in a separate section with normality and homogeneity testing (Supplementary Material S9), and references to it in the descriptions of the statistical analyses (Methods, Page 13, Lines 326-329 and Page 15, Lines 400-402).

      Several points require clarification or improvement:

      (4) Correlation Methods: The manuscript does not specify whether the reported correlation analyses are based on Pearson or Spearman correlation.

      The depicted correlations are Pearson correlations. We will add this information to the Methods.

      (5) Confidence Intervals: Include confidence intervals for correlations to represent the uncertainty associated with these estimates.

      We will add the confidence intervals to the second revision of our manuscript.
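
      To illustrate how wide such intervals are at this sample size, here is a minimal sketch of the standard Fisher z-transform interval for a Pearson correlation. It is not the authors' analysis code; the function name and the example value r = 0.6 are hypothetical, and only the group size n = 10 is taken from the study.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r via the Fisher z-transform."""
    z = atanh(r)                 # Fisher transform of the observed correlation
    se = 1.0 / sqrt(n - 3)       # standard error on the z scale (requires n > 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return tanh(z - crit * se), tanh(z + crit * se)


# Hypothetical correlation of 0.6 observed in a group of 10 participants.
low, high = pearson_ci(0.6, 10)
print(round(low, 2), round(high, 2))
```

      With these illustrative inputs the interval spans roughly -0.05 to 0.89, which shows why the reviewer asks for intervals alongside point estimates in samples of this size.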

      (6) Permutation Statistics: Given the small sample size, I recommend using permutation statistics, as these are exact tests and more appropriate for small datasets.

      Our study focuses on a rare population, with a sample size limited by the availability of participants. Our findings provide exploratory insights rather than make strong inferential claims. To this end, we have ensured that our analysis adheres to key statistical assumptions (Shapiro-Wilk as well as Levene’s tests, Supplementary Material S9), and reported our findings with effect sizes, appropriate caution and context.
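
      For reference, the kind of permutation test the reviewer recommends can be sketched as follows. This is a generic Monte Carlo approximation (a fully exact test would enumerate every permutation), not an analysis performed in the manuscript; function names, the number of permutations, and the seed are assumptions.

```python
import random
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p(x, y, n_perm=10000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a Pearson correlation."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the pairing between x and y
        # Small tolerance so that ties with the observed value are counted.
        if abs(pearson_r(x, y_perm)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

      The pairing between the two variables is shuffled repeatedly, and the p-value is the share of shuffles whose absolute correlation is at least as large as the observed one.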

      (7) Adjusted P-Values: Ensure that reported Bonferroni corrected p-values (e.g., p > 0.999) are clearly labeled as adjusted p-values where applicable.

      In the revised manuscript, we will change Figure 4 to say ‘adjusted p,’ which we indeed reported.
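
      As a reminder of why Bonferroni-adjusted values can appear as p > 0.999, the adjustment multiplies each raw p-value by the number of tests and caps the result at 1. A minimal sketch, with a hypothetical function name and made-up raw p-values:

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up raw p-values for three tests.
print(bonferroni([0.01, 0.04, 0.5]))
```

      Any raw p-value of at least 1/m is mapped to 1, so reporting it without the label "adjusted" can look like an error to readers.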

      (8) Figure 2C

      Figure 2C still lacks crucial information that the correlation between Glx/GABA ratio and visual acuity was computed solely in the control group (as described in the rebuttal letter). Why was this analysis restricted to the control group? Please provide a rationale.

      Figure 2C depicts the correlation between Glx/GABA+ ratio and visual acuity in the congenital cataract reversal group, not the control group. This is mentioned in the Figure 2 legend, as well as in the main text where the figure is referred to (Page 18, Line 475).

      The correlation analyses between visual acuity and MRS/EEG measures were only performed in the congenital cataract reversal group since the sighted control group comprised individuals with vision in the normal range; thus, these analyses would not make sense. Table 1 with the individual visual acuities for all participants, including the normally sighted controls, shows the low variance in the latter group.

      For variables in which no a priori group differences in variance were predicted, we performed the correlation analyses across groups (see Supplementary Material S12, S15).

      We will highlight these motivations more clearly in the Methods of the revised manuscript.

      (9) 3.4 Interpretation of Aperiodic Signal

      Relying on previous studies to interpret the aperiodic slope as a proxy for excitation/inhibition (E/I) does not make the interpretation more robust.

      How to interpret aperiodic EEG activity has been the subject of extensive investigation. We cite studies which provide evidence from multiple species (monkeys, humans) and measurements (EEG, MEG, ECoG), including studies which pharmacologically manipulated E/I balance.

      Whether our findings are robust, in fact, requires a replication study. Importantly, we analyzed the intercept of the aperiodic activity fit as well, and discuss results related to the intercept.
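
      To make the two aperiodic parameters concrete: the aperiodic component is commonly described as a straight line in log-log space, log10(power) = intercept + slope * log10(frequency). The sketch below fits that line by ordinary least squares. It is a deliberate simplification and not the authors' pipeline: dedicated spectral parameterization tools additionally model periodic peaks such as alpha before estimating the aperiodic part. The function name and the synthetic spectrum are hypothetical.

```python
from math import log10


def fit_aperiodic(freqs, power):
    """Least-squares line through (log10 f, log10 power); returns (intercept, slope)."""
    xs = [log10(f) for f in freqs]
    ys = [log10(p) for p in power]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Synthetic 1/f-like spectrum over 1-20 Hz: power = 100 * f ** -1.5.
freqs = [float(f) for f in range(1, 21)]
power = [100.0 * f ** -1.5 for f in freqs]
intercept, slope = fit_aperiodic(freqs, power)
print(intercept, slope)
```

      In this convention a steeper spectrum has a more negative slope, and the intercept is the fitted log power at 1 Hz.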

      Quote:

      “3.4 Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Response: Apart from the modeling work from Gao et al., multiple papers, which have also been cited, used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflective of changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      - The authors further wrote: We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018). This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies only allow an indirect interpretation by what the 1/f slope in EEG measurements is actually influenced.

      Response: Note that Muthukumaraswamy et al. (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity in humans, in addition to monkey ECoG (Muthukumaraswamy & Liley, 2018). Further, Medel et al. (now published as Medel et al., 2023) compared EEG activity in addition to ECoG data after propofol administration. The interpretation of our results is in line with a number of recent studies in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations using EEG. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG from humans.

      In the introduction of the revised manuscript, we have made more explicit that this metric is indirect (Page 3, Line 91), (additionally see Discussion, Page 24, Lines 644-645, Page 25, Lines 650-657).

      While a full understanding of aperiodic activity still needs to be provided, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first which assessed MRS-measured neurotransmitter levels and EEG aperiodic activity.”

      (10) Additionally, the authors state:

      "We cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference e.g. in skull thickness."

      (11) This could be addressed directly by including skull thickness as a covariate or visualizing it in scatterplots, for instance, by representing skull thickness as the size of the dots.

      We are not aware of any study that would justify such an analysis.

      Our analyses were based on previous findings in the literature.

      Since to the best of our knowledge, no evidence exists that congenital cataracts go together with changes in skull thickness, and that skull thickness might selectively modulate visual cortex Glx/GABA+ but not NAA measures, we decided against following this suggestion.

      Notably, the neurotransmitter concentration reported here is after tissue segmentation of the voxel region. The tissue fraction was shown to not differ between groups in the MRS voxels (Supplementary Material S4). The EEG electrode impedance was lowered to <10 kOhm in every participant (Methods, Page 13, Line 344), and preparation was identical across groups.

      (12) 3.5 Problems with EEG Preprocessing and Analysis

      Downsampling: The decision to downsample the data to 60 Hz "to match the stimulation rate" is problematic. This choice conflates subsequent spectral analyses due to aliasing issues, as explained by the Nyquist theorem. While the authors cite prior studies (Schwenk et al., 2020; VanRullen & MacDonald, 2012) to justify this decision, these studies focused on alpha (8-12 Hz), where aliasing is less of a concern compared to analyzing the aperiodic signal. Furthermore, in contrast, the current study analyzes the frequency range from 1-20 Hz, which is too narrow for interpreting the aperiodic signal as E/I. Typically, this analysis should include higher frequencies, spanning at least 1-30 Hz or even 1-45 Hz (not 20-40 Hz).

      As mentioned in the Methods (Page 15 Line 376) and the previous response, the pop_resample function used by EEGLAB applies an anti-aliasing filter at half the resampling frequency (as per the Nyquist theorem; https://eeglab.org/tutorials/05_Preprocess/resampling.html). The upper cutoff of the low-pass filter set by EEGLAB prior to downsampling (30 Hz) is still far above the frequency range of interest in the current study (1-20 Hz), thus allowing us to derive valid results.
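
      The role of the anti-aliasing filter can be illustrated with a minimal sketch. It is not the EEGLAB pipeline: it assumes a hypothetical 600 Hz recording rate (the original rate is not stated here) and uses SciPy's resample_poly as a stand-in for pop_resample. A 50 Hz component folds back to 10 Hz when samples are simply dropped, whereas low-pass filtering before decimation suppresses it and leaves a 5 Hz component intact.

```python
import numpy as np
from scipy.signal import resample_poly

FS_IN = 600                  # hypothetical recording rate in Hz (assumption)
FS_OUT = 60                  # target rate in Hz, matching the stimulation rate
FACTOR = FS_IN // FS_OUT     # integer decimation factor (10)


def amplitude_at(x, fs, freq):
    """Amplitude of the sinusoidal component of x at `freq` (Hz), via the FFT."""
    spectrum = np.fft.rfft(x)
    k = int(round(freq * len(x) / fs))
    return 2.0 * np.abs(spectrum[k]) / len(x)


def trim_edges(x, fs, seconds=1.0):
    """Drop the first and last `seconds` to avoid filter edge transients."""
    n = int(seconds * fs)
    return x[n:-n]


t = np.arange(10 * FS_IN) / FS_IN  # 10 s of data
# A 5 Hz component inside the band of interest plus a 50 Hz "line noise" component.
signal = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 50 * t)

# Naive decimation: keep every 10th sample without any low-pass filter.
naive = trim_edges(signal[::FACTOR], FS_OUT)

# Polyphase resampling: SciPy applies a low-pass FIR filter before decimating.
filtered = trim_edges(resample_poly(signal, up=1, down=FACTOR), FS_OUT)

# 50 Hz sampled at 60 Hz aliases to |60 - 50| = 10 Hz.
alias_naive = amplitude_at(naive, FS_OUT, 10)
alias_filtered = amplitude_at(filtered, FS_OUT, 10)
kept_5hz = amplitude_at(filtered, FS_OUT, 5)

print(f"10 Hz alias without filter: {alias_naive:.3f}")
print(f"10 Hz alias with filter:    {alias_filtered:.3f}")
print(f"5 Hz amplitude with filter: {kept_5hz:.3f}")
```

      The filter in this sketch is SciPy's default Kaiser-windowed FIR; the filter that EEGLAB applies may differ in length and transition band, so the exact attenuation values are not comparable to Author response image 1.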

      Quote:

      “- The authors downsampled the data to 60Hz "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      Response: These data were collected as part of a study designed to evoke alpha activity with visual white-noise, which ranged in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012), wherein the analysis requires the same sampling rate between the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).”

      Moreover, the resting-state data were not resampled to 60 Hz. We will make this clearer in the Methods of the revised manuscript.

      Our consistent results of group differences across all three EEG conditions, thus, exclude any possibility that they were driven by aliasing artifacts.

      The expected effects of this anti-aliasing filter can be seen in the attached Figure R1, showing an example participant’s spectrum in the 1-30 Hz range (as opposed to the 1-20 Hz plotted in the manuscript), clearly showing a 30-40 dB drop at 30 Hz. Any aliasing due to, for example, remaining line noise, would additionally be visible in this figure (as well as Figure 3) as a peak.

      Author response image 1.

      Power spectral density of one congenital cataract-reversal (CC) participant in the visual stimulation condition across all channels. The reduced power at 30 Hz shows the effects of the anti-aliasing filter applied by EEGLAB’s pop_resample function.

      As we stated in the manuscript, and in previous reviews, so far there has been no consensus on the exact range of measuring aperiodic activity. We made a principled decision based on the literature (showing a knee in aperiodic fits of this dataset at 20 Hz) (Medel et al., 2023; Ossandón et al., 2023), data quality (possible contamination by line noise at higher frequencies) and the purpose of the visual stimulation experiment (to look at the lower frequency range by stimulating up to 60 Hz, thereby limiting us to quantifying below 30 Hz), that 1-20 Hz would be the fit range in this dataset.

      Quote:

      “(3) What's the underlying idea of analyzing two separate aperiodic slopes (20-40Hz and 1-19Hz). This is very unusual to compute the slope between 20-40 Hz, where the SNR is rather low.

      "Ossandón et al. (2023), however, observed that in addition to the flatter slope of the aperiodic power spectrum in the high frequency range (20-40 Hz), the slope of the low frequency range (1-19 Hz) was steeper in both, congenital cataract-reversal individuals, as well as in permanently congenitally blind humans."

      Response: The present manuscript computed the slope between 1-20 Hz. Ossandón et al. as well as Medel et al. (2023) found a “knee” of the 1/f distribution at 20 Hz and describe further the motivations for computing both slope ranges. For example, Ossandón et al. used a data driven approach and compared single vs. dual fits and found that the latter fitted the data better. Additionally, they found the best fit if a knee at 20 Hz was used. We would like to point out that no standard range exists for the fitting of the 1/f component across the literature and, in fact, very different ranges have been used (Gao et al., 2017; Medel et al., 2023; Muthukumaraswamy & Liley, 2018).“

      (13) Baseline Removal: Subtracting the mean activity across an epoch as a baseline removal step is inappropriate for resting-state EEG data. This preprocessing step undermines the validity of the analysis. The EEG dataset has fundamental flaws, many of which were pointed out in the previous review round but remain unaddressed. In its current form, the manuscript falls short of standards for robust EEG analysis. If I were reviewing for another journal, I would recommend rejection based on these flaws.

      The baseline removal step from each epoch serves to remove the DC component of the recording and detrend the data. This is a standard preprocessing step (included as an option in preprocessing pipelines recommended by the EEGLAB toolbox, FieldTrip toolbox and MNE toolbox), additionally necessary to improve the efficacy of ICA decomposition (Groppe et al., 2009).

      In the previous review round, a clarification of the baseline timing was requested, which we added. Beyond this request, there was no mention of the appropriateness of the baseline removal and/or a request to provide reasons for why it might not undermine the validity of the analysis.

      Quote:

      “- "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      Response: The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has been explicitly stated in the revised manuscript (Page 13, Line 354).”

      Prior work in the time (not frequency) domain on event-related potential (ERP) analysis has suggested that the baselining step might cause spurious effects (Delorme, 2023) (although see (Tanner et al., 2016)). We did not perform ERP analysis at any stage. One recent study suggests spurious group differences in the 1/f signal might be driven by an inappropriate dB division baselining method (Gyurkovics et al., 2021), which we did not perform.

      Any effect of our baselining procedure on the FFT spectrum would be below the 1 Hz range, which we did not analyze.  
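
      As a minimal illustration of this point (a sketch under stated assumptions, not our analysis code), subtracting the epoch mean changes only the 0 Hz bin when an untapered FFT is computed over the same epoch. The sampling rate and DC offset below are hypothetical; the 1-second epoch length follows the resting-state conditions.

```python
import numpy as np

FS = 250          # hypothetical sampling rate in Hz (assumption)
EPOCH_SEC = 1.0   # resting-state epoch length

rng = np.random.default_rng(0)
n = int(FS * EPOCH_SEC)
# Random noise riding on an arbitrary DC offset (both are illustrative).
epoch = rng.standard_normal(n) + 35.0

# Baseline removal: subtract the mean across the whole epoch.
demeaned = epoch - epoch.mean()

freqs = np.fft.rfftfreq(n, d=1.0 / FS)
power_raw = np.abs(np.fft.rfft(epoch)) ** 2
power_demeaned = np.abs(np.fft.rfft(demeaned)) ** 2

# Largest change in power at any non-zero frequency bin.
max_change_above_dc = np.max(np.abs(power_raw[1:] - power_demeaned[1:]))

print(f"DC power before: {power_raw[0]:.3e}, after: {power_demeaned[0]:.3e}")
print(f"Largest change at {freqs[1]:.0f} Hz and above: {max_change_above_dc:.3e}")
```

      This identity holds for an untapered FFT of the epoch itself; if a taper is applied before the FFT, a constant offset also leaks into the bins adjacent to 0 Hz.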

      Each of the preprocessing steps in the manuscript matches pipelines described and published in extensive prior work. We document how multiple aspects of our EEG results replicate prior findings (Supplementary Material S15, S18, S19), including reports of other experimenters, groups and locations, validating that our results are robust.

      We therefore reject the claim of methodological flaws in our EEG analyses in the strongest possible terms.

      Quote:

      “3.5 Problems with EEG preprocessing and analysis:

      - It seems that the authors did not identify bad channels nor address the line noise issue (even a problem if a low pass filter of below-the-line noise was applied).

      Response: As pointed out in the methods and Figure 1, we only analyzed data from two occipital channels, O1 and O2, neither of which was rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023). As control sites, we added the frontal channels Fp1 and Fp2 (see Supplementary Material S14).

      Neither Ossandón et al. (2023) nor Pant et al. (2023) considered frequency ranges above 40 Hz to avoid any possible contamination with line noise. Here, we focused on activity between 0 and 20 Hz, definitely excluding line noise contaminations (Methods, Page 14, Lines 365-367). The low pass filter (FIR, 1-45 Hz) guaranteed that any spill-over effects of line noise would be restricted to frequencies just below the upper cutoff frequency.

      Additionally, a prior version of the analysis used spectrum interpolation to remove line noise; the group differences remained stable (Ossandón et al., 2023). We have reported this analysis in the revised manuscript (Page 14, Lines 364-357).

      Further, both groups were measured in the same lab, making line noise (~ 50 Hz) as an account for the observed group effects in the 1-20 Hz frequency range highly unlikely. Finally, any of the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters would be contaminated with line noise.

      - What was the percentage of segments that needed to be rejected due to the 120μV criteria? This should be reported specifically for EO & EC and controls and patients.

      Response: The mean percentage of 1-second segments rejected for each resting state condition and the percentage of 6.25-second segments rejected in each group for the visual stimulation condition have been added to the revised manuscript (Supplementary Material S10) and are referred to in the Methods (Page 14, Lines 372-373).

      - The authors downsampled the data to 60Hz "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      Response: These data were collected as part of a study designed to evoke alpha activity with visual white-noise, which changed in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012), wherein the analysis requires the same sampling rate between the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has now been explicitly stated in the revised manuscript (Page 14, Lines 379-380).

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      Response: We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the Methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values. Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox both in the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range as well as lower alpha power in CC vs SC individuals remained stable. In Ossandón et al., the comparison between the piecewise fits and FOOOF fits led the authors to use the former, as it outperformed the FOOOF algorithm for their data.

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      Response: In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group.

      In the revised manuscript, we added the fit quality metrics (average R² values > 0.91 for each group and condition) (Methods Page 15, Lines 395-396; Supplementary Material S11) and additionally show individual subjects’ fits (Supplementary Material S11).“
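
      The fitting procedure described in the quoted responses (a linear fit to the 1-20 Hz spectrum in log-log space with the 8-14 Hz alpha range excluded, subtraction of that fit, and R² as the fit quality metric) can be sketched as follows. This is an illustrative reconstruction on a synthetic spectrum, not the analysis code used for the manuscript; the function names and the synthetic parameters are hypothetical.

```python
import numpy as np


def fit_aperiodic(freqs, power, fit_range=(1.0, 20.0), exclude=(8.0, 14.0)):
    """Fit log10(power) = intercept + slope * log10(freq) by least squares.

    Frequencies inside `exclude` (the alpha band) are left out of the fit.
    Returns slope, intercept, and R^2 computed over the fitted points.
    """
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    in_range = (freqs >= fit_range[0]) & (freqs <= fit_range[1])
    in_alpha = (freqs >= exclude[0]) & (freqs <= exclude[1])
    use = in_range & ~in_alpha

    log_f = np.log10(freqs[use])
    log_p = np.log10(power[use])
    slope, intercept = np.polyfit(log_f, log_p, deg=1)

    predicted = intercept + slope * log_f
    ss_res = np.sum((log_p - predicted) ** 2)
    ss_tot = np.sum((log_p - log_p.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


def remove_aperiodic(freqs, power, slope, intercept):
    """Subtract the fitted 1/f line from the spectrum, in log10 units."""
    return np.log10(power) - (intercept + slope * np.log10(freqs))


# Synthetic spectrum (assumption): 1/f^1.5 background plus an alpha bump at 10 Hz.
freqs = np.arange(1.0, 20.5, 0.5)
background = 100.0 * freqs ** -1.5
alpha_bump = 5.0 * np.exp(-0.5 * ((freqs - 10.0) / 0.8) ** 2)
power = background + alpha_bump

slope, intercept, r2 = fit_aperiodic(freqs, power)
corrected = remove_aperiodic(freqs, power, slope, intercept)
alpha_peak = corrected[np.argmin(np.abs(freqs - 10.0))]

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r2:.3f}")
print(f"corrected log-power at 10 Hz = {alpha_peak:.2f}")
```

      Excluding the alpha range keeps the oscillatory peak from pulling the fitted line upward in the middle of the fit range, so the peak remains visible in the corrected spectrum rather than being partly absorbed into the slope and intercept.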

      (14) The authors mention:

      "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) " is a circular argument and should be avoided."

      The authors addressed this comment and adjusted the statement. However, I do not understand why the full sample published earlier (Ossandón et al., 2023) was not used in the current study.

      The recording of EEG resting state data started in 2013, while MRS testing could only be set up by the end of 2019. Moreover, not all subjects who qualify for EEG recording qualify for being scanned (e.g. due to MRI safety, claustrophobia).

      References

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189(September 2018), 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Delorme, A. (2023). EEG is better left alone. Scientific Reports, 13(1), 2372. https://doi.org/10.1038/s41598-023-27528-0

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, R., Peterson, E. J., & Voytek, B. (2017). Inferring synaptic excitation/inhibition balance from field potentials. NeuroImage, 158(March), 70–78. https://doi.org/10.1016/j.neuroimage.2017.06.078

      Groppe, D. M., Makeig, S., & Kutas, M. (2009). Identifying reliable independent components via split-half comparisons. NeuroImage, 45(4), 1199–1211. https://doi.org/10.1016/j.neuroimage.2008.12.038

      Gyurkovics, M., Clements, G. M., Low, K. A., Fabiani, M., & Gratton, G. (2021). The impact of 1/f activity and baseline correction on the results and interpretation of time-frequency analyses of EEG/MEG data: A cautionary tale. NeuroImage, 237. https://doi.org/10.1016/j.neuroimage.2021.118192

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Maurer, D., Mondloch, C. J., & Lewis, T. L. (2007). Sleeper effects. In Developmental Science. https://doi.org/10.1111/j.1467-7687.2007.00562.x

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/F electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179(November 2017), 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Tanner, D., Norton, J. J. S., Morgan-Short, K., & Luck, S. J. (2016). On high-pass filter artifacts (they’re real) and baseline correction (it’s a good idea) in ERP/ERMF analysis. Journal of Neuroscience Methods, 266, 166–170. https://doi.org/10.1016/j.jneumeth.2016.01.002

      VanRullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports 2017 7:1, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4


      The following is the authors’ response to the original reviews.

      eLife Assessment

      This potentially useful study involves neuro-imaging and electrophysiology in a small cohort of congenital cataract patients after sight recovery and age-matched control participants with normal sight. It aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in the visual cortex. While the findings are taken to suggest the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, the evidence supporting these claims is incomplete. Specifically, small sample sizes, lack of a specific control cohort, and other methodological limitations will likely restrict the usefulness of the work, with relevance limited to scientists working in this particular subfield.

      As pointed out in the public reviews, there are very few human models which allow for assessing the role of early experience on neural circuit development. While the prevalent research in permanent congenital blindness reveals the response and adaptation of the developing brain to an atypical situation (blindness), research in sight restoration addresses the question of whether and how atypical development can be remediated if typical experience (vision) is restored. The literature on the role of visual experience in the development of E/I balance in humans, assessed via Magnetic Resonance Spectroscopy (MRS), has been limited to a few studies on congenital permanent blindness. Thus, we assessed sight recovery individuals with a history of congenital blindness, as limited evidence from other researchers indicated that the visual cortex E/I ratio might differ compared to normally sighted controls.

      Individuals with total bilateral congenital cataracts who remained untreated until later in life are extremely rare, particularly if only carefully diagnosed patients are included in a study sample. A sample size of 10 patients is, at the very least, typical of past studies in this population, even for exclusively behavioral assessments. In the present study, in addition to behavioral assessment as an indirect measure of sensitive periods, we investigated participants with two neuroimaging methods (Magnetic Resonance Spectroscopy and electroencephalography) to directly assess the neural correlates of sensitive periods in humans. The electroencephalography data allowed us to link the results of our small sample to findings documented in large cohorts of both sight recovery individuals and permanently congenitally blind individuals. As pointed out in a recent editorial recommending an “exploration-then-estimation procedure” (“Consideration of Sample Size in Neuroscience Studies,” 2020), exploratory studies like ours provide crucial direction and specific hypotheses for future work.

      We included an age-matched sighted control group recruited from the same community, measured in the same scanner and laboratory, to assess whether early experience is necessary for a typical excitatory/inhibitory (E/I) ratio to emerge in adulthood. The present findings indicate that this is indeed the case. Based on these results, a possible question to answer in future work, with individuals who had developmental cataracts, is whether later visual deprivation causes similar effects. Note that even if visual deprivation at a later stage in life caused similar effects, the current results would not be invalidated; by contrast, they are essential to understand future work on late (permanent or transient) blindness.

      Thus, we think that the present manuscript has far reaching implications for our understanding of the conditions under which E/I balance, a crucial characteristic of brain functioning, emerges in humans.

      Finally, our manuscript is one of the first few studies that relate MRS neurotransmitter concentrations to parameters of EEG aperiodic activity. Since present research has been using aperiodic activity as a correlate of the E/I ratio, and partially of higher cognitive functions, we think that our manuscript additionally contributes to a better understanding of what might be measured with aperiodic neurophysiological activity.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this human neuroimaging and electrophysiology study, the authors aimed to characterize the effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry conditions ('eyes open', 'eyes closed') and resting state, and visually evoked EEG activity between ten congenital cataract patients with recovered sight (CC), and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of the group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then performed multiple exploratory correlations between MRS measures and visual acuity, and reported a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected only two electrodes placed in the visual cortex for analysis and reported a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the strength of evidence in support of this view is not strong. Amongst the limitations are the low sample size, a critical control cohort that could provide evidence for a higher E/I ratio in CC patients without recovered sight for example, and lower data quality in the control voxel.

      Strengths of study:

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well-written.

      Limitations:

      (1.1) Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to a lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      Applying strict criteria, we only included individuals who were born with no patterned vision in the CC group. The population of individuals who have remained untreated past infancy is small in India, despite a higher prevalence of childhood cataract than in Germany. Indeed, from the original 11 CC and 11 SC participants tested, one participant each from the CC and SC group had to be rejected, as their data had been corrupted, resulting in 10 participants in each group.

      It was a challenge to recruit participants from this rare group with no history of neurological diagnosis/intake of neuromodulatory medications, who were able and willing to undergo both MRS and EEG. For this study, data collection took more than 2.5 years.

      We took care of the validity of our results with two measures; first, we assessed not just MRS, but additionally, EEG measures of E/I ratio. The latter allowed us to link results to a larger population of CC individuals, that is, we replicated the results of a larger group of 28 additional individuals (Ossandón et al., 2023) in our sub-group.

      Second, we included a control voxel. As predicted, all group effects were restricted to the occipital voxel.

      (1.2) Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      The existing work on visual deprivation and neurochemical changes, as assessed with MRS, has been limited to permanent congenital blindness. In fact, most of the studies on permanent blindness included only congenitally blind or early blind humans (Coullon et al., 2015; Weaver et al., 2013), or, in separate studies, only late-blind individuals (Bernabeu et al., 2009). Accordingly, we started with the most “extreme” visual deprivation model, sight recovery after congenital blindness. If we had not observed any group difference compared to normally sighted controls, investigating other groups might have been trivial. Based on our results, subsequent studies in late blind individuals, and then individuals with developmental cataracts, can be planned with clear hypotheses.

      (1.3) MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      Worse data quality in the frontal than the visual cortex has been repeatedly observed in the MRS literature, attributable to magnetic field distortions (Juchem & Graaf, 2017) resulting from the proximity of the region to the sinuses (recent example: (Rideaux et al., 2022)). Nevertheless, we chose the frontal control region rather than a parietal voxel, given the potential neurochemical changes in multisensory regions of the parietal cortex due to blindness. Such reorganization would be less likely in frontal areas associated with higher cognitive functions. Further, prior MRS studies of the visual cortex have used the frontal cortex as a control region as well (Pitchaimuthu et al., 2017; Rideaux et al., 2022). In the revised manuscript, we more explicitly inform the reader about this data quality difference between regions in the Methods (Pages 11-12, MRS Data Quality/Table 2) and Discussion (Page 25, Lines 644-647).

      Importantly, while in the present study data quality differed between the frontal and visual cortex voxel, it did not differ between groups (Supplementary Material S6).  

      Further, we checked that the frontal cortex datasets for Glx and GABA+ concentrations were of sufficient quality: the fit error was below 8.31% in both groups (Supplementary Material S3). For reference, Mikkelsen et al. reported a mean GABA+ fit error of 6.24 +/- 1.95% from a posterior cingulate cortex voxel across 8 GE scanners, using the Gannet pipeline. No absolute cutoffs have been proposed for fit errors. However, MRS studies in special populations (I/E ratio assessed in narcolepsy (Gao et al., 2024), GABA concentration assessed in Autism Spectrum Disorder (Maier et al., 2022)) have used frontal cortex data with a fit error of <10% to identify differences between cohorts (Gao et al., 2024; Pitchaimuthu et al., 2017). Based on the literature, MRS data from the frontal voxel of the present study would have been of sufficient quality to uncover group differences.

      In the revised manuscript, we added the recently published MRS quality assessment form to the supplementary materials (Supplementary Excel File S1). Additionally, we would like to point to our a priori prediction of group differences for the visual cortex, but not for the frontal cortex voxel. Finally, EEG data quality did not differ between frontal and occipital electrodes; therefore, lower sensitivity of frontal measures cannot easily explain the lack of group differences for frontal measures.

      (1.4) Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      Indeed, higher inhibition was not predicted, which we attempt to reconcile in our discussion section. We base our discussion mainly on the non-human animal literature, which has shown evidence of homeostatic changes after prolonged visual deprivation in the adult brain (Barnes et al., 2015). It is also interesting to note that after monocular deprivation in adult humans, resting GABA+ levels decreased in the visual cortex (Lunghi et al., 2015). Assuming that after delayed sight restoration, adult neuroplasticity mechanisms must be employed, these studies would predict a “balancing” of the increased excitatory drive following sight restoration by a commensurate increase in inhibition (Keck et al., 2017). Additionally, the EEG results of the present study allowed for speculation regarding the underlying neural mechanisms of an altered E/I ratio. The aperiodic EEG activity suggested higher spontaneous spiking (increased intercept) and increased inhibition (steeper aperiodic slope between 1-20 Hz) in CC vs SC individuals (Ossandón et al., 2023).

      In the revised manuscript, we have more clearly indicated that these speculations are based primarily on non-human animal work, due to the lack of human studies on the subject (Page 23, Lines 609-613).

      (1.5) Heterogeneity in the patient group. Congenital cataract (CC) patients experienced a variety of duration of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The goal of the present study was to assess whether we would observe changes in E/I ratio after restoring vision at all. We would not have included patients without nystagmus in the CC group of the present study, since it would have been unlikely that they experienced congenital patterned visual deprivation. Amongst diagnosticians, nystagmus or strabismus might not be considered genuine “comorbidities” that emerge in people with congenital cataracts. Rather, these are consequences of congenital visual deprivation, which we employed as diagnostic criteria. Similarly, absorbed lenses are clear signs that cataracts were congenital. As in other models of experience-dependent brain development (e.g. the extant literature on congenital permanent blindness, including anophthalmic individuals (Coullon et al., 2015; Weaver et al., 2013)), some uncertainty remains regarding whether the (remaining, in our case) abnormalities of the eye, or the blindness they caused, are the factors driving neural changes. In the case of people with reversed congenital cataracts, at least the retina is considered to be intact, as they would otherwise not receive cataract removal surgery.

      However, we consider it unlikely that strabismus caused the group differences, because the present study shows group differences in the Glx/GABA+ ratio at rest, regardless of eye opening or eye closure, for which strabismus would have caused distinct effects. By contrast, the link between GABA concentration and, for example, interocular suppression in strabismus has so far been documented during visual stimulation (Mukerji et al., 2022; Sengpiel et al., 2006), and differed in direction depending on the amblyopic vs. non-amblyopic eye. Further, one MRS study did not find group differences in GABA concentration between the visual cortices of 16 amblyopic individuals and sighted controls (Mukerji et al., 2022), supporting the view that the differences in Glx/GABA+ concentration which we observed were driven by congenital deprivation, and not by amblyopia-associated visual acuity or eye movement differences.

      In the revised manuscript, we discussed the inclusion criteria in more detail, and the aforementioned reasons why our data remains interpretable (Page 5, Lines 143 – 145, Lines 147-149). 

      (1.6) Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones were shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, and not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      In the revised manuscript, we have clearly indicated that the exploratory correlation analyses are reported to put forth hypotheses for future studies (Page 4, Lines 118-128; Page 5, Lines 132-134; Page 25, Lines 644-647).

      (1.7) P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to negatively correlate with age.

      The correlation between chronological age and aperiodic intercept was observed across groups, but the correlation between Glx and the intercept of the aperiodic EEG activity was seen only in the CC group, even though the SC group was matched for age. Thus, such a correlation was very unlikely to be predominantly driven by an effect of chronological age.

      In the revised manuscript, we added the linear regressions with age as a covariate (Supplementary Material S16, referred to in the main Results, Page 21, Lines 534-537), demonstrating the significant relationship between aperiodic intercept and Glx concentration in the CC group. 

      (1.8) Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones were shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Figure 4. Yet the introduction said that there was a prior hypothesis "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      In the revised manuscript, we improved the phrasing (Page 5, Lines 130-132) and consistently reported the correlations as exploratory in the Methods and Discussion. We consider the correlation analyses as exploratory due to our sample size and the absence of prior work. However, we did hypothesize that both MRS and EEG markers would concurrently be altered in CC vs SC individuals.

      (1.9) The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandon et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      The aperiodic intercept and slope did not differ between CC and SC individuals for Fp1 and Fp2, suggesting the spatial specificity of the results. In the revised manuscript, we added this analysis to the Supplementary Material (Supplementary Material S14) and referred to it in our Results (Page 20, Lines 513-514).

      Further, Glx concentration in the visual cortex did not correlate with the aperiodic intercept in the SC group (Figure 4), suggesting that this relationship was indeed specific to the CC group.

      The data from all electrodes has been analyzed and published in other studies as well (Pant et al., 2023; Ossandón et al., 2023). 

      Reviewer #2 (Public Review):

      Summary:

      The manuscript reports non-invasive measures of activity and neurochemical profiles of the visual cortex in congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts. The declared aim of the study is to find out how restoring visual function after several months or years of complete blindness impacts the balance between excitation and inhibition in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      (2.1) The main issue is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). Few previous studies suggested an increased excitation/Inhibition ratio in the visual cortex of congenitally blind patients; the present study reports a decreased E/I ratio instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as common in the field).

      Longitudinal studies would indeed be the best way to test the hypothesis that the lower E/I ratio in the CC group observed by the present study is a consequence of sight restoration.

      We have now explicitly stated this in the Limitations section (Page 25, Lines 654-655).

      However, longitudinal studies involving neuroimaging are an effortful challenge, particularly in research conducted outside of major developed countries and dedicated neuroimaging research facilities. Crucially, however, had CC and SC individuals, as well as permanently congenitally blind vs SC individuals (Coullon et al., 2015; Weaver et al., 2013), not differed on any neurochemical markers, such a longitudinal study might have been trivial. Thus, in order to justify and better tailor longitudinal studies, cross-sectional studies are an initial step.

      (2.2) MR Spectroscopy shows a reduced GLX/GABA ratio in patients vs. sighted controls; however, this finding remains rather isolated, not corroborated by other observations. The difference between patients and controls only emerges for the GLX/GABA ratio, but there is no accompanying difference in either the GLX or the GABA concentrations. There is an attempt to relate the MRS data with acuity measurements and electrophysiological indices, but the explorative correlational analyses do not help to build a coherent picture. A bland correlation between GLX/GABA and visual impairment is reported, but this is specific to the patients' group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls - the opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patient group.

      We interpret these findings differently, that is, in the context of experiments from non-human animals and the larger MRS literature (Page 23, Lines 609-611).

      Homeostatic control of E/I balance assumes that the ratio of excitation (reflected here by Glx) and inhibition (reflected here by GABA+) is regulated. Like prior work (Gao et al., 2024, 2024; Narayan et al., 2022; Perica et al., 2022; Steel et al., 2020; Takado et al., 2022; Takei et al., 2016), we assumed that the ratio of Glx/GABA+ is indicative of E/I balance rather than solely the individual neurotransmitter levels. One of the motivations for assessing the ratio vs the absolute concentration is that as per the underlying E/I balance hypothesis, a change in excitation would cause a concomitant change in inhibition, and vice versa, which has been shown in non-human animal work (Fang et al., 2021; Haider et al., 2006; Tao & Poo, 2005) and modeling research (Vreeswijk & Sompolinsky, 1996; Wu et al., 2022). Importantly, our interpretation of the lower E/I ratio is not just from the Glx/GABA+ ratio, but additionally, based on the steeper EEG aperiodic slope (1-20 Hz). 

      As stated in the Discussion section and Response 1.4, we did not expect to see a lower Glx/GABA+ ratio in CC individuals. We discuss the possible reasons for the direction of the correlation with visual acuity and aperiodic offset during passive visual stimulation, and offer interpretations and (testable) hypotheses.

      We interpret the direction of the Glx/GABA+ correlation with visual acuity to imply that patients with highest (compensatory) balancing of the consequences of congenital blindness (hyperexcitation), in light of visual stimulation, are those who recover best. Note, the sighted control group was selected based on their “normal” vision. Thus, clinical visual acuity measures are not expected to sufficiently vary, nor have the resolution to show strong correlations with neurophysiological measures. By contrast, the CC group comprised patients highly varying in visual outcomes, and thus were ideal to investigate such correlations.

      This holds for the correlation between Glx and the aperiodic intercept, as well. Previous work has suggested that the intercept of the aperiodic activity is associated with broadband spiking activity in neural circuits (Manning et al., 2009). Thus, an atypical increase of spiking activity during visual stimulation, as indirectly suggested by “old” non-human primate work on visual deprivation (Hyvärinen et al., 1981) might drive a correlation not observed in healthy populations.

      In the revised manuscript, we have more clearly indicated in the Discussion that these are possible post-hoc interpretations (Page 23, Lines 584-586; Page 24, Lines 609-620; Page 24, Lines 644-647; Page 25, Lines 650-657). We argue that given the lack of such studies in humans, it is all the more important that extant data be presented completely, even if the direction of the effects is not as expected.

      (2.3) For these reasons, the reported findings do not allow us to draw firm conclusions on the relation between EEG parameters and E/I ratio or on the impact of early (vs. late) visual experience on the excitation/inhibition ratio of the human visual cortex.

      Indeed, the correlations we have tested between the E/I ratio and EEG parameters were exploratory, and have been reported as such.

      We have now made this clear in all the relevant parts of the manuscript (Introduction, Page 5, Lines 132-135; Methods, Page 16, Line 415; Results, Page 21, Figure 4; Discussion, Page 22, Line 568, Page 25, Lines 644-645, Page 25, Lines 650-657).

      The goal of our study was not to compare the effects of early vs. late visual experience. The goal was to study whether early visual experience is necessary for a typical E/I ratio in visual neural circuits. We provided clear evidence in favor of this hypothesis. Thus, the present results suggest the necessity of investigating the effects of late visual deprivation. In fact, such research is missing in permanent blindness as well.

      Reviewer #3 (Public Review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance and its interrelationship to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods. I have several major concerns in terms of methodological and statistical approaches along with the (over)interpretation of the results. These major concerns are detailed below.

      (3.1) Variability in visual deprivation:

      - The document states a large variability in the duration of visual deprivation (probably also the age at restoration), with significant implications for the sensitivity period's impact on visual circuit development. The variability and its potential effects on the outcomes need thorough exploration and discussion.

      We work with a rare, unique patient population, which makes it difficult to systematically assess the effects of different visual histories while maintaining stringent inclusion criteria such as complete patterned visual deprivation at birth. Regardless, we considered the large variance in age at surgery and time since surgery as supportive of our interpretation: group differences were found despite the large variance in duration of visual deprivation. Moreover, the existing variance was used to explore possible associations between behavior and neural measures, as well as neurochemical and EEG measures.

      In the revised manuscript, we have detailed the advantages (Methods, Page 5, Lines 143 – 145, Lines 147-149; Discussion, Page 26, Lines 677-678) and disadvantages (Discussion, Page 25, Lines 650-657) of our CC sample, with respect to duration of congenital visual deprivation.

      (3.2) Sample size:

      - The small sample size is a major concern as it may not provide sufficient power to detect subtle effects and/or overestimate significant effects, which then tend not to generalize to new data. One of the biggest drivers of the replication crisis in neuroscience.

      We address the small sample size in our Discussion, and make clear that small sample sizes were due to the nature of investigations in special populations. In the revised manuscript, we added the sample sizes of previous studies using MRS in permanently blind individuals (Page 4, Lines 108 - 109). It is worth noting that our EEG results fully align with those of larger samples of congenital cataract reversal individuals (Page 25, Lines 666-676, Supplementary Material S18, S19) (Ossandón et al., 2023), providing us confidence about their validity and reproducibility. Moreover, our MRS results and correlations of those with EEG parameters were spatially specific to occipital cortex measures.

      The main problem with the correlation analyses between MRS and EEG measures is that the sample size is simply too small to conduct such an analysis. Moreover, it is unclear from the methods section that this analysis was only conducted in the patient group (which the reviewer assumed from the plots), and not explained why this was done only in the patient group. I would highly recommend removing these correlation analyses.

      In the revised manuscript, we have more clearly marked the correlation analyses as exploratory (Introduction, Page 4, Lines 118-128 and Page 5, Lines 132-134; Methods Page 16, Line 415; Discussion Page 22, Line 568, Page 24, Lines 644-645, Page 25, Lines 650-657); note that we do not base most of our discussion on the results of these analyses.

      As indicated by Reviewer 1, reporting them allows for deriving more precise hypotheses for future studies. It has to be noted that we investigate an extremely rare population, tested outside of major developed economies and dedicated neuroimaging research facilities. In addition to being a rare patient group, these individuals come from poor communities. Therefore, we consider it justified to report these correlations as exploratory, providing direction for future research.

      (3.3) Statistical concerns:

      - The statistical analyses, particularly the correlations drawn from a small sample, may not provide reliable estimates (see https://www.sciencedirect.com/science/article/pii/S0092656613000858, which clearly describes this problem).

      It would undoubtedly be better to have a larger sample size. We nonetheless think it is of value to the research community to publish this dataset, since 10 multimodal data sets from a carefully diagnosed, rare population, representing a human model for the effects of early experience on brain development, are quite a lot. Sample sizes in prior neuroimaging studies in transient blindness have most often ranged from n = 1 to n = 10. They nevertheless provided valuable direction for future research, and integration of results across multiple studies provides scientific insights. 

      Identifying possible group differences was the goal of our study, with the correlations being an exploratory analysis, which we have clearly indicated in the methods, results and discussion.

      - Statistical analyses for the MRS: The authors should consider some additional permutation statistics, which are more suitable for small sample sizes. The current statistical model (2x2) design ANOVA is not ideal for such small sample sizes. Moreover, it is unclear why the condition (EO & EC) was chosen as a predictor and not the brain region (visual & frontal) or neurochemicals. Finally, the authors did not provide any information on the alpha level nor any information on correction for multiple comparisons (in the methods section). Finally, even if the groups are matched w.r.t. age, the time between surgery and measurement, the duration of visual deprivation, (and sex?), these should be included as covariates as it has been shown that these are highly related to the measurements of interest (especially for the EEG measurements) and the age range of the current study is large.

      In our ANOVA models, the neurochemicals were the outcome variables, and the conditions were chosen as predictors based on prior work suggesting that Glx/GABA+ might vary with eye closure (Kurcyus et al., 2018). The study was designed based on a hypothesis of group differences localized to the occipital cortex, due to visual deprivation. The frontal cortex voxel was chosen to indicate whether these differences were spatially specific. Therefore, we conducted separate ANOVAs based on this study design.

      We have now clarified the motivation for these conditions in the Introduction (Page 4, Lines 122-125) and the Methods (Page 9, Lines 219-224).

      In the revised manuscript, we added the rationale for parametric analyses for our outcomes (Shapiro-Wilk as well as Levene’s tests, Supplementary Material S9). Note that in the Supplementary Materials (S12, S14), we have reported the correlations between visual history metrics and MRS/EEG outcomes, thereby investigating whether the variance in visual history might have driven these results. Specifically, we found a (negative) correlation between visual cortex Glx/GABA+ concentration during eye closure and the visual acuity in the CC group (Figure 2c). None of the other exploratory correlations between MRS/EEG outcomes vs time since surgery, duration of blindness or visual acuity were significant in the CC group (Supplementary Material S12, S15).  

      The alpha level used for the ANOVA models specified in the Methods section was 0.05. The alpha level for the exploratory analyses reported in the main manuscript was 0.008, after correcting for (6) multiple comparisons using the Bonferroni correction, also specified in the Methods. Note that the p-values following correction are expressed as multiplied by 6, due to most readers assuming an alpha level of 0.05 (see response regarding large p-values).

      We used a control group matched for age, recruited and tested in the same institutes, using the same setup. We feel that we followed the gold standards for recruiting a healthy control group for a patient group.

      - EEG statistical analyses: The same critique as for the MRS statistical analyses applies to the EEG analysis. In addition: was the 2x3 ANOVA conducted for EO and EC independently? This seems to be inconsistent with the approach in the MRS analyses, in which the authors chose EO & EC as predictors in their 2x2 ANOVA.

      The 2x3 ANOVA was not conducted independently for the eyes open/eyes closed condition. The ANOVA conducted on the EEG metrics was 2x3 because it had two groups (CC, SC) and three conditions (eyes open (EO), eyes closed (EC) and visual stimulation (LU)) as predictors.

      - Figure 4: The authors report a p-value of >0.999 with a correlation coefficient of -0.42 with a sample size of 10 subjects. This can't be correct (it should be around: p = 0.22). All statistical analyses should be checked.

      As specified in the Methods and Figure legend, the reported p values in Figure 4 have been corrected using the Bonferroni correction, and therefore multiplied by the number of comparisons, leading to the seemingly large values.

      Additionally, to check all statistical analyses, we put the manuscript through an independent Statistics Check (Nuijten & Polanin, 2020) (https://michelenuijten.shinyapps.io/statcheck-web/) and have uploaded the consistency report with the revised Supplementary Material (Supplementary Report 1).
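      The arithmetic behind the seemingly large p-value in Figure 4 can be sketched as follows. This is a minimal illustration and not the analysis code used for the manuscript: it assumes a two-sided test of a Pearson correlation via the t distribution with n - 2 degrees of freedom, uses the values quoted by the Reviewer (r = -0.42, n = 10) and the six comparisons stated above, and relies on hypothetical helper names (`t_pdf`, `pearson_p_two_sided`, `bonferroni_adjust`) introduced only for this example.

```python
import math


def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def pearson_p_two_sided(r, n, steps=2000):
    """Two-sided p-value for a Pearson correlation r observed in n subjects.

    Assumption for this sketch: standard t-test on r with n - 2 degrees of freedom.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the t density integrated from 0 to |t|.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3
    return 1 - 2 * central_mass


def bonferroni_adjust(p, m):
    """Bonferroni-adjusted p-value: multiply by m comparisons and cap at 1."""
    return min(1.0, p * m)


p_raw = pearson_p_two_sided(-0.42, 10)   # values quoted in the Reviewer's comment
p_adj = bonferroni_adjust(p_raw, 6)      # six comparisons, as stated in the Methods
alpha_per_test = 0.05 / 6                # equivalent uncorrected threshold

print(f"uncorrected p = {p_raw:.3f}")
print(f"Bonferroni-adjusted p = {p_adj:.3f}")
print(f"per-test alpha = {alpha_per_test:.4f}")
```

      Under these assumptions the uncorrected p-value lies between 0.20 and 0.25, in line with the Reviewer's estimate, and multiplying it by six exceeds 1, so the adjusted value is capped and reported as >0.999. Comparing the adjusted p-value with 0.05 is equivalent to comparing the uncorrected p-value with 0.05/6, i.e. approximately 0.008.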

      - Figure 2c. Eyes closed condition: The highest score of the *Glx/GABA ratio seems to be ~3.6. In subplot 2a, there seem to be 3 subjects that show a Glx/GABA ratio score > 3.6. How can this be explained? There is also a discrepancy for the eyes-closed condition.

      The three subjects that show the Glx/GABA+ ratio > 3.6 in subplot 2a are in the SC group, whereas the correlations plotted in figure 2c are only for the CC group, where the highest score is indeed ~3.6.

      (3.4) Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Apart from the modeling work from Gao et al., multiple papers that we cited used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflective of changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      In the revised manuscript, we have cited those studies not already included in the Introduction (Page 3, Lines 92-94).

      - Especially the aperiodic intercept is a very sensitive measure to many influences (e.g. skull thickness, electrode impedance...). As crucial results (correlation aperiodic intercept and MRS measures) are facing this problem, this needs to be reevaluated. It is safer to make statements on the aperiodic slope than intercept. In theory, some of the potentially confounding measures are available to the authors (e.g. skull thickness can be computed from T1w images; electrode impedances are usually acquired alongside the EEG data) and could be therefore controlled.

      All electrophysiological measures indeed depend on parameters such as skull thickness and electrode impedance. As in the extant literature using neurophysiological measures to compare brain function between patient and control groups, we used a control group matched in age/sex, recruited in the same region, tested with the same devices, and analyzed with the same analysis pipeline. For example, impedance was kept below 10 kOhm for all subjects.

      This is now mentioned in the Methods, Page 13, Line 344.

      There is no evidence available suggesting that congenital cataracts are associated with changes in skull thickness that would cause the observed pattern of group results. Moreover, we cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference e.g. in skull thickness.

      - The authors wrote: "Higher frequencies (such as 20-40 Hz) have been predominantly associated with local circuit activity and feedforward signaling (Bastos et al., 2018; Van Kerkoerle et al., 2014); the increased 20-40 Hz slope may therefore signal increased spontaneous spiking activity in local networks. We speculate that the steeper slope of the aperiodic activity for the lower frequency range (1-20 Hz) in CC individuals reflects the concomitant increase in inhibition." The authors confuse the interpretation of periodic and aperiodic signals. This section refers to the interpretation of the periodic signal (higher frequencies). This interpretation cannot simply be translated to the aperiodic signal (slope).

      Prior work has not always separated the aperiodic and periodic components, making it unclear what might have driven these effects in our data. The interpretation of the higher frequency range was intended to contrast with the interpretations of lower frequency range, in order to speculate as to why the two aperiodic fits might go in differing directions. Note that Ossandón et al. reported highly similar results (group differences for CC individuals and for permanently congenitally blind humans) for the aperiodic activity between 20-40 Hz and oscillatory activity in the gamma range.

      In the revised Discussion, we removed this section. We primarily interpret the increased offset and prior findings from fMRI-BOLD data (Raczy et al., 2023) as an increase in broadband neuronal firing.

      - The authors further wrote: We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018). This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies only allow an indirect interpretation by what the 1/f slope in EEG measurements is actually influenced.

      Note that Muthukumaraswamy and Liley (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity in humans, in addition to monkey ECoG. Further, Medel et al. (now published as Medel et al., 2023) compared EEG activity in addition to ECoG data after propofol administration. The interpretation of our results is in line with a number of recent studies in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations using EEG. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG from humans.

      In the introduction of the revised manuscript, we have made more explicit that this metric is indirect (Page 3, Line 91), (additionally see Discussion, Page 24, Lines 644-645, Page 25, Lines 650-657).

      While a full understanding of aperiodic activity is still outstanding, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first to assess both MRS-measured neurotransmitter levels and EEG aperiodic activity.

      (3.5) Problems with EEG preprocessing and analysis:

      - It seems that the authors did not identify bad channels nor address the line noise issue (even a problem if a low pass filter of below-the-line noise was applied).

      As pointed out in the Methods and Figure 1, we only analyzed data from two occipital channels, O1 and O2, neither of which was rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023). As control sites, we added the frontal channels Fp1 and Fp2 (see Supplementary Material S14).

      Neither Ossandón et al. (2023) nor Pant et al. (2023) considered frequency ranges above 40 Hz, in order to avoid any possible contamination with line noise. Here, we focused on activity between 0 and 20 Hz, which clearly excludes line noise contamination (Methods, Page 14, Lines 365-367). The low pass filter (FIR, 1-45 Hz) guaranteed that any spill-over effects of line noise would be restricted to frequencies just below the upper cutoff frequency.

      Additionally, a prior version of the analysis used spectrum interpolation to remove line noise; the group differences remained stable (Ossandón et al., 2023). We have reported this analysis in the revised manuscript (Page 14, Lines 364-357).

      Further, both groups were measured in the same lab, which makes line noise (~50 Hz) highly unlikely as an account of the observed group effects in the 1-20 Hz frequency range. Finally, any of the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters were contaminated with line noise.

      - What was the percentage of segments that needed to be rejected due to the 120μV criteria? This should be reported specifically for EO & EC and controls and patients.

      The mean percentage of 1-second segments rejected for each resting-state condition and the percentage of 6.25-second segments rejected in each group for the visual stimulation condition have been added to the revised manuscript (Supplementary Material S10) and are referred to in the Methods (Page 14, Lines 372-373).
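
      As an illustration of how such a rejection percentage can be computed, here is a minimal sketch. It assumes that a segment is rejected when any sample exceeds ±120 µV in absolute value; the exact criterion, the function name `percent_rejected`, and the toy data are assumptions for illustration and are not taken from the manuscript.

```python
def percent_rejected(segments, threshold_uv=120.0):
    """Percentage of segments containing at least one sample beyond +/- threshold_uv.

    `segments` is a list of segments, each a list of amplitudes in microvolts.
    The absolute-amplitude rule is an assumed reading of the 120 uV criterion.
    """
    if not segments:
        raise ValueError("no segments supplied")
    n_bad = sum(1 for seg in segments if max(abs(v) for v in seg) > threshold_uv)
    return 100.0 * n_bad / len(segments)


# Toy data (not study data): two of the four segments exceed the threshold.
toy_segments = [
    [5.0, -20.0, 30.0],
    [10.0, 150.0, -5.0],
    [-130.0, 0.0, 4.0],
    [1.0, 2.0, 3.0],
]
print(percent_rejected(toy_segments))  # prints 50.0
```

      Computed separately per condition (EO, EC, visual stimulation) and per group, this yields the kind of table the reviewer asked for.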

      - The authors downsampled the data to 60Hz to "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      These data were collected as part of a study designed to evoke alpha activity with visual white noise, which changed in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012); the analysis requires the same sampling rate for the presented stimulus sequence and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has now been explicitly stated in the revised manuscript (Page 14, Lines 379-380).
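
      The baseline removal described above amounts to subtracting each epoch's own mean from every sample. A minimal sketch follows; the function name `remove_epoch_mean` and the example values are hypothetical and only illustrate the operation.

```python
def remove_epoch_mean(epoch):
    """Subtract the mean over the whole epoch from every data point."""
    mean = sum(epoch) / len(epoch)
    return [v - mean for v in epoch]


# Toy epoch (not study data). In the study, an epoch spans 1 s (resting state)
# or 6.25 s (visual stimulation); here it is just three samples.
toy_epoch = [1.0, 2.0, 3.0]
print(remove_epoch_mean(toy_epoch))  # prints [-1.0, 0.0, 1.0]
```

      After this step, every epoch has zero mean, so the "baseline" is the full epoch rather than a pre-stimulus window.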

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the Methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values. Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).
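
      To make the procedure concrete, the following is a minimal sketch of a straight-line fit in log-log space over 1-20 Hz that leaves out the 8-14 Hz alpha range and then subtracts the fit from the spectrum. It assumes an ordinary least-squares fit and inclusive band edges; the function names and the synthetic spectrum are illustrative assumptions, not the analysis code used for the manuscript.

```python
import math


def fit_aperiodic(freqs, power, fit_range=(1.0, 20.0), exclude=(8.0, 14.0)):
    """Least-squares line through log10(power) versus log10(frequency).

    Frequencies inside `exclude` (the alpha band) are left out of the fit.
    Returns (slope, intercept) of the line in log-log space.
    """
    xs, ys = [], []
    for f, p in zip(freqs, power):
        in_fit_range = fit_range[0] <= f <= fit_range[1]
        in_excluded = exclude[0] <= f <= exclude[1]
        if in_fit_range and not in_excluded:
            xs.append(math.log10(f))
            ys.append(math.log10(p))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def corrected_spectrum(freqs, power, slope, intercept):
    """Subtract the fitted line from the log10 spectrum at every frequency."""
    return [
        math.log10(p) - (intercept + slope * math.log10(f))
        for f, p in zip(freqs, power)
    ]


# Synthetic spectrum (not study data): 1/f^1.5 background with a bump at 10 Hz.
freqs = [float(f) for f in range(1, 21)]
power = [100.0 * f ** -1.5 * (3.0 if f == 10.0 else 1.0) for f in freqs]

slope, intercept = fit_aperiodic(freqs, power)
residual = corrected_spectrum(freqs, power, slope, intercept)
print(round(slope, 3), round(intercept, 3), round(residual[9], 3))
```

      Because the 10 Hz bump falls inside the excluded band, it does not bias the fitted slope or intercept and appears only in the corrected spectrum, which is the motivation given above for excluding the alpha range.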

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox both in the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range as well as lower alpha power in CC vs SC individuals remained stable. In Ossandón et al., the comparison between the piecewise fits and FOOOF fits led the authors to use the former, as it outperformed the FOOOF algorithm for their data.

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group.

      In the revised manuscript, we added the fit quality metrics (average R² values > 0.91 for each group and condition) (Methods Page 15, Lines 395-396; Supplementary Material S11) and additionally show individual subjects’ fits (Supplementary Material S11).
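
      For readers unfamiliar with the metric, R² for a fit can be computed as one minus the ratio of the residual to the total sum of squares. The sketch below assumes this standard definition applied to the log-power values and the fitted line; whether the manuscript computed R² in exactly this way is an assumption, and the numbers are toy values.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy values (not study data).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(observed, predicted), 2))  # prints 0.98
```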

      (3.6) Validity of GABA measurements and results:

      - According the a newer study by the authors of the Gannet toolbox (https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/nbm.5076), the reliability and reproducibility of the gamma-aminobutyric acid (GABA) measurement can vary significantly depending on acquisition and modeling parameter. Thus, did the author address these challenges?

      We took care of data quality while acquiring MRS data by ensuring appropriate voxel placement and linewidth prior to scanning (Page 9, Lines 229-237). We now address this explicitly in the Methods in the “MRS Data Quality” section. Acquisition as well as modeling parameters were constant for both groups, so they cannot have driven group differences.

      The linked article compares the reproducibility of GABA measurement using Osprey (Oeltzschner et al., 2020), which was released in 2020 and uses linear combination modeling to fit the peak, as opposed to Gannet’s simple peak fitting (Hupfeld et al., 2024). The study finds better test-retest reliability for Osprey compared to Gannet’s method.

      As the present work was conceptualized in 2018, we used Gannet 3.0, which was the state-of-the-art edited-spectrum analysis toolbox at the time, and still is widely used.

      In the revised manuscript, we re-analyzed the data using linear combination modeling with Osprey (Oeltzschner et al., 2020), and reported that the main findings remained the same, i.e. the Glx/GABA+ concentration ratio was lower in the visual cortex of congenital cataract reversal individuals compared to normally sighted controls, regardless of whether participants were scanned with eyes open or with eyes closed. Further, NAA concentration did not differ between groups (Supplementary Material S3). Thus, we demonstrate that our findings were robust to analysis pipelines, and state this in the Methods (Page 9, Lines 242-246) and Results (Page 19, Lines 464-467).

      - Furthermore, the authors wrote: "We confirmed the within-subject stability of metabolite quantification by testing a subset of the sighted controls (n=6) 2-4 weeks apart. Looking at the supplementary Figure 5 (which would be rather plotted as ICC or Blant-Altman plots), the within-subject stability compared to between-subject variability seems not to be great. Furthermore, I don't think such a small sample size qualifies for a rigorous assessment of stability.

      Indeed, we did not intend to provide a rigorous assessment of within-subject stability. Rather, we aimed to confirm that data quality/concentration ratios did not systematically differ between the same subjects tested longitudinally; driven, for example, by scanner heating or time of day. As with the phantom testing, we attempted to give readers an idea of the quality of the data, as they were collected from a primarily clinical rather than a research site.

      In the revised manuscript, we have removed the statement regarding stability and the associated section.

      - "Why might an enhanced inhibitory drive, as indicated by the lower Glx/GABA ratio" Is this interpretation really warranted, as the results of the group differences in the Glx/GABA ratio seem to be rather driven by a decreased Glx concentration in CC rather than an increased GABA (see Figure 2).

      We used the Glx/GABA+ ratio as a measure, rather than individual Glx or GABA+ concentration, which did not significantly differ between groups. As detailed in Response 2.2, we think this metric aligns better with an underlying E/I balance hypothesis and has been used in many previous studies (Gao et al., 2024; Liu et al., 2015; Narayan et al., 2022; Perica et al., 2022).

      Our interpretation of an enhanced inhibitory drive additionally comes from the combination of aperiodic EEG (1-20 Hz) and MRS measures, which, when considered together, are consistent with a decreased E/I ratio.

      In the revised manuscript, we have rewritten the Discussion and removed this section.   

      - Glx concentration predicted the aperiodic intercept in CC individuals' visual cortices during ambient and flickering visual stimulation. Why specifically investigate the Glx concentration, when the paper is about E/I ratio?

      As stated in the methods, we exploratorily assessed the relationship between all MRS parameters (Glx, GABA+ and Glx/GABA+ ratio) with the aperiodic parameters (slope, offset), and corrected for multiple comparisons accordingly. We think this is a worthwhile analysis considering the rarity of the dataset/population (see 1.2, 1.6, 2.1 and Reviewer 1’s comments about future hypotheses). We only report the Glx – aperiodic intercept correlation in the main manuscript as it survived correction for multiple comparisons.

      (3.7) Interpretation of the correlation between MRS measurements and EEG aperiodic signal:

      - The authors wrote: "The intercept of the aperiodic activity was highly correlated with the Glx concentration during rest with eyes open and during flickering stimulation (also see Supplementary Material S11). Based on the assumption that the aperiodic intercept reflects broadband firing (Manning et al., 2009; Winawer et al., 2013), this suggests that the Glx concentration might be related to broadband firing in CC individuals during active and passive visual stimulation." These results should not be interpreted (or with very caution) for several reasons (see also problem with influences on aperiodic intercept and small sample size). This is a result of the exploratory analyses of correlating every EEG parameter with every MRS parameter. This requires well-powered replication before any interpretation can be provided. Furthermore and importantly: why should this be specifically only in CC patients, but not in the SC control group?

      We have indicated clearly in all parts of the manuscript that these correlations are presented as exploratory. Further, we interpret the Glx-aperiodic offset correlation, and none of the others, as it survived the Bonferroni correction for multiple comparisons. We offer a hypothesis in the Discussion as to why such a correlation might exist in the CC but not the SC group (see response 2.2), and do not speculate further.
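
      The Bonferroni logic referred to here can be sketched as follows. The family size of six (three MRS parameters crossed with two aperiodic parameters) is an assumption made for illustration, since the exact number of tests in each family is not stated in this response; the p-values are placeholders and are not results from the study.

```python
from itertools import product


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def survives(p_values, alpha, n_tests):
    """Flag each p-value that stays significant after Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [p < threshold for p in p_values]


# Assumed family: every MRS parameter paired with every aperiodic parameter.
mrs_measures = ["Glx", "GABA+", "Glx/GABA+"]
aperiodic_params = ["slope", "offset"]
n_tests = len(list(product(mrs_measures, aperiodic_params)))  # 3 x 2 = 6

# Placeholder p-values, NOT taken from the manuscript.
placeholder_p = [0.001, 0.02, 0.30]
print(n_tests, survives(placeholder_p, alpha=0.05, n_tests=n_tests))
```

      With six tests, the per-test threshold is 0.05 / 6 ≈ 0.0083, so a nominally significant p-value such as 0.02 would not survive correction.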

      (3.8) Language and presentation:

      - The manuscript requires language improvements and correction of numerous typos. Over-simplifications and unclear statements are present, which could mislead or confuse readers (see also interpretation of aperiodic signal).

      In the revised manuscript, we have checked that speculations are clearly marked and that typos have been removed.

      - The authors state that "Together, the present results provide strong evidence for experience-dependent development of the E/I ratio in the human visual cortex, with consequences for behavior." The results of the study do not provide any strong evidence, because of the small sample size and exploratory analyses approach and not accounting for possible confounding factors.

      We disagree with this statement and allude to convergent evidence of both MRS and neurophysiological measures. The latter link to corresponding results observed in a larger sample of CC individuals (Ossandón et al., 2023). In the revised manuscript, we have rephrased the statement as “to provide initial evidence” (Page 22, Line 676).

      - "Our results imply a change in neurotransmitter concentrations as a consequence of *restoring* vision following congenital blindness." This is a speculative statement to infer a causal relationship on cross-sectional data.

      As mentioned under 2.1, we conducted a cross-sectional study, which might justify future longitudinal work. To advance science, we put forward new testable hypotheses at the end of the manuscript.

      In the revised manuscript, we rephrased the sentence and added “might imply” to better indicate the hypothetical character of this idea (Page 22, Lines 586-587).

      - In the limitation section, the authors wrote: "The sample size of the present study is relatively high for the rare population , but undoubtedly, overall, rather small." This sentence should be rewritten, as the study is plein underpowered. The further justification "We nevertheless think that our results are valid. Our findings neurochemically (Glx and GABA+ concentration), and anatomically (visual cortex) specific. The MRS parameters varied with parameters of the aperiodic EEG activity and visual acuity. The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) (Ossandón et al., 2023), and effects of chronological age were as expected from the literature." These statements do not provide any validation or justification of small samples. Furthermore, the current data set is a subset of an earlier published paper by the same authors "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) " is a circular argument and should be avoided.

      Our intention was not to justify having a small sample, but to justify why we think the results might be valid as they align with/replicate existing literature.

      In the revised manuscript, we added a figure showing that the EEG results of the 10 subjects considered here correspond to those of the 28 other subjects of Ossandón et al. (2023) (Supplementary Material S18). We adapted the text accordingly, clearly stating that the pattern of EEG results of the ten subjects reported here replicates that of the 28 additional subjects of Ossandón et al. (2023) (Page 25, Lines 671-672).

      References (Public Review)

      Barnes, S. J., Sammons, R. P., Jacobsen, R. I., Mackie, J., Keller, G. B., & Keck, T. (2015). Subnetwork-specific homeostatic plasticity in mouse visual cortex in vivo. Neuron, 86(5), 1290–1303. https://doi.org/10.1016/J.NEURON.2015.05.010

      Bernabeu, A., Alfaro, A., García, M., & Fernández, E. (2009). Proton magnetic resonance spectroscopy (1H-MRS) reveals the presence of elevated myo-inositol in the occipital cortex of blind subjects. NeuroImage, 47(4), 1172–1176. https://doi.org/10.1016/j.neuroimage.2009.04.080

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189, 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Consideration of Sample Size in Neuroscience Studies. (2020). Journal of Neuroscience, 40(21), 4076–4077. https://doi.org/10.1523/JNEUROSCI.0866-20.2020

      Coullon, G. S. L., Emir, U. E., Fine, I., Watkins, K. E., & Bridge, H. (2015). Neurochemical changes in the pericalcarine cortex in congenital blindness attributable to bilateral anophthalmia. Journal of Neurophysiology. https://doi.org/10.1152/jn.00567.2015

      Fang, Q., Li, Y. T., Peng, B., Li, Z., Zhang, L. I., & Tao, H. W. (2021). Balanced enhancements of synaptic excitation and inhibition underlie developmental maturation of receptive fields in the mouse visual cortex. Journal of Neuroscience, 41(49), 10065–10079. https://doi.org/10.1523/JNEUROSCI.0442-21.2021

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, Y., Liu, Y., Zhao, S., Liu, Y., Zhang, C., Hui, S., Mikkelsen, M., Edden, R. A. E., Meng, X., Yu, B., & Xiao, L. (2024). MRS study on the correlation between frontal GABA+/Glx ratio and abnormal cognitive function in medication-naive patients with narcolepsy. Sleep Medicine, 119, 1–8. https://doi.org/10.1016/j.sleep.2024.04.004

      Haider, B., Duque, A., Hasenstaub, A. R., & McCormick, D. A. (2006). Neocortical network activity in vivo is generated through a dynamic balance of excitation and inhibition. Journal of Neuroscience. https://doi.org/10.1523/JNEUROSCI.5297-05.2006

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Hupfeld, K. E., Zöllner, H. J., Hui, S. C. N., Song, Y., Murali-Manohar, S., Yedavalli, V., Oeltzschner, G., Prisciandaro, J. J., & Edden, R. A. E. (2024). Impact of acquisition and modeling parameters on the test–retest reproducibility of edited GABA+. NMR in Biomedicine, 37(4), e5076. https://doi.org/10.1002/nbm.5076

      Hyvärinen, J., Carlson, S., & Hyvärinen, L. (1981). Early visual deprivation alters modality of neuronal responses in area 19 of monkey cortex. Neuroscience Letters, 26(3), 239–243. https://doi.org/10.1016/0304-3940(81)90139-7

      Juchem, C., & de Graaf, R. A. (2017). B0 magnetic field homogeneity and shimming for in vivo magnetic resonance spectroscopy. Analytical Biochemistry, 529, 17–29. https://doi.org/10.1016/j.ab.2016.06.003

      Keck, T., Hübener, M., & Bonhoeffer, T. (2017). Interactions between synaptic homeostatic mechanisms: An attempt to reconcile BCM theory, synaptic scaling, and changing excitation/inhibition balance. Current Opinion in Neurobiology, 43, 87–93. https://doi.org/10.1016/J.CONB.2017.02.003

      Kurcyus, K., Annac, E., Hanning, N. M., Harris, A. D., Oeltzschner, G., Edden, R., & Riedl, V. (2018). Opposite Dynamics of GABA and Glutamate Levels in the Occipital Cortex during Visual Processing. Journal of Neuroscience, 38(46), 9967–9976. https://doi.org/10.1523/JNEUROSCI.1214-18.2018

      Liu, B., Wang, G., Gao, D., Gao, F., Zhao, B., Qiao, M., Yang, H., Yu, Y., Ren, F., Yang, P., Chen, W., & Rae, C. D. (2015). Alterations of GABA and glutamate-glutamine levels in premenstrual dysphoric disorder: A 3T proton magnetic resonance spectroscopy study. Psychiatry Research - Neuroimaging, 231(1), 64–70. https://doi.org/10.1016/J.PSCYCHRESNS.2014.10.020

      Lunghi, C., Berchicci, M., Morrone, M. C., & Russo, F. D. (2015). Short‐term monocular deprivation alters early components of visual evoked potentials. The Journal of Physiology, 593(19), 4361. https://doi.org/10.1113/JP270950

      Maier, S., Düppers, A. L., Runge, K., Dacko, M., Lange, T., Fangmeier, T., Riedel, A., Ebert, D., Endres, D., Domschke, K., Perlov, E., Nickel, K., & Tebartz van Elst, L. (2022). Increased prefrontal GABA concentrations in adults with autism spectrum disorders. Autism Research, 15(7), 1222–1236. https://doi.org/10.1002/aur.2740

      Manning, J. R., Jacobs, J., Fried, I., & Kahana, M. J. (2009). Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. Journal of Neuroscience, 29(43), 13613–13620. https://doi.org/10.1523/JNEUROSCI.2041-09.2009

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Mukerji, A., Byrne, K. N., Yang, E., Levi, D. M., & Silver, M. A. (2022). Visual cortical γ−aminobutyric acid and perceptual suppression in amblyopia. Frontiers in Human Neuroscience, 16. https://doi.org/10.3389/fnhum.2022.949395

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/f electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179, 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Narayan, G. A., Hill, K. R., Wengler, K., He, X., Wang, J., Yang, J., Parsey, R. V., & DeLorenzo, C. (2022). Does the change in glutamate to GABA ratio correlate with change in depression severity? A randomized, double-blind clinical trial. Molecular Psychiatry, 27(9), 3833–3841. https://doi.org/10.1038/s41380-022-01730-4

      Nuijten, M. B., & Polanin, J. R. (2020). “statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses. Research Synthesis Methods, 11(5), 574–579. https://doi.org/10.1002/jrsm.1408

      Oeltzschner, G., Zöllner, H. J., Hui, S. C. N., Mikkelsen, M., Saleh, M. G., Tapper, S., & Edden, R. A. E. (2020). Osprey: Open-source processing, reconstruction & estimation of magnetic resonance spectroscopy data. Journal of Neuroscience Methods, 343, 108827. https://doi.org/10.1016/j.jneumeth.2020.108827

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Perica, M. I., Calabro, F. J., Larsen, B., Foran, W., Yushmanov, V. E., Hetherington, H., Tervo-Clemmens, B., Moon, C.-H., & Luna, B. (2022). Development of frontal GABA and glutamate supports excitation/inhibition balance from adolescence into adulthood. Progress in Neurobiology, 219, 102370. https://doi.org/10.1016/j.pneurobio.2022.102370

      Pitchaimuthu, K., Wu, Q. Z., Carter, O., Nguyen, B. N., Ahn, S., Egan, G. F., & McKendrick, A. M. (2017). Occipital GABA levels in older adults and their relationship to visual perceptual suppression. Scientific Reports, 7(1). https://doi.org/10.1038/S41598-017-14577-5

      Rideaux, R., Ehrhardt, S. E., Wards, Y., Filmer, H. L., Jin, J., Deelchand, D. K., Marjańska, M., Mattingley, J. B., & Dux, P. E. (2022). On the relationship between GABA+ and glutamate across the brain. NeuroImage, 257, 119273. https://doi.org/10.1016/J.NEUROIMAGE.2022.119273

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Sengpiel, F., Jirmann, K.-U., Vorobyov, V., & Eysel, U. T. (2006). Strabismic Suppression Is Mediated by Inhibitory Interactions in the Primary Visual Cortex. Cerebral Cortex, 16(12), 1750–1758. https://doi.org/10.1093/cercor/bhj110

      Steel, A., Mikkelsen, M., Edden, R. A. E., & Robertson, C. E. (2020). Regional balance between glutamate+glutamine and GABA+ in the resting human brain. NeuroImage, 220. https://doi.org/10.1016/J.NEUROIMAGE.2020.117112

      Takado, Y., Takuwa, H., Sampei, K., Urushihata, T., Takahashi, M., Shimojo, M., Uchida, S., Nitta, N., Shibata, S., Nagashima, K., Ochi, Y., Ono, M., Maeda, J., Tomita, Y., Sahara, N., Near, J., Aoki, I., Shibata, K., & Higuchi, M. (2022). MRS-measured glutamate versus GABA reflects excitatory versus inhibitory neural activities in awake mice. Journal of Cerebral Blood Flow & Metabolism, 42(1), 197. https://doi.org/10.1177/0271678X211045449

      Takei, Y., Fujihara, K., Tagawa, M., Hironaga, N., Near, J., Kasagi, M., Takahashi, Y., Motegi, T., Suzuki, Y., Aoyama, Y., Sakurai, N., Yamaguchi, M., Tobimatsu, S., Ujita, K., Tsushima, Y., Narita, K., & Fukuda, M. (2016). The inhibition/excitation ratio related to task-induced oscillatory modulations during a working memory task: A multimodal-imaging study using MEG and MRS. NeuroImage, 128, 302–315. https://doi.org/10.1016/J.NEUROIMAGE.2015.12.057

      Tao, H. W., & Poo, M. M. (2005). Activity-dependent matching of excitatory and inhibitory inputs during refinement of visual receptive fields. Neuron, 45(6), 829–836. https://doi.org/10.1016/J.NEURON.2005.01.046

      VanRullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      Vreeswijk, C. V., & Sompolinsky, H. (1996). Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293), 1724–1726. https://doi.org/10.1126/SCIENCE.274.5293.1724

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4

      Weaver, K. E., Richards, T. L., Saenz, M., Petropoulos, H., & Fine, I. (2013). Neurochemical changes within human early blind occipital cortex. Neuroscience. https://doi.org/10.1016/j.neuroscience.2013.08.004

      Wu, Y. K., Miehl, C., & Gjorgjieva, J. (2022). Regulation of circuit organization and function through inhibitory synaptic plasticity. Trends in Neurosciences, 45(12), 884–898. https://doi.org/10.1016/J.TINS.2022.10.006

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for The Authors):

      Thank you for the interesting submission. I have inserted my comments to the authors here. Some of them will be more granular comments related to the concerns raised in the public review.

      (1) Introduction:

      Could you please justify the rationale for using eyes open and eyes closed in the MRS condition, and the use of the three different conditions in the EEG experiment? If these resulted in negative findings, then the implications should be discussed.

      Previous work with MRS in sighted individuals has suggested that eye opening in darkness results in a decrease of visual cortex GABA+ concentration, while visual stimulation results in an increase of Glx concentration, compared to a baseline concentration at eye closure (Kurcyus et al., 2018). Moreover, visual stimulation/eye opening is known to result in an alpha desynchronization (Adrian & Matthews, 1934).

      While previous work of our group has shown significantly reduced alpha oscillatory activity in congenital cataract reversal individuals, desynchronization following eye opening was indistinguishable when compared to normally sighted controls (Ossandón et al., 2023; Pant et al., 2023).

      Thus, we decided to include both conditions to test whether a similar pattern of results would emerge for GABA+/Glx concentration.

      We added our motivation to the Introduction of the revised manuscript (Page 4, Lines 122-125) along with the Methods (Page 9, Lines 219-223).

      It does not become clear from the introduction why a higher intercept is predicted in the EEG measure. The rationale for this hypothesis needs to be explained better.

      Given the prior findings suggesting an increased E/I ratio in CC individuals and the proposed link between neuronal firing (Manning et al., 2009) and the aperiodic intercept, we expected a higher intercept for the CC compared to the SC group.

      We have now added this explanation to the Introduction (Page 4, Lines 126-128).

      (2) Participants

      Were participants screened for common MRS exclusion criteria such as history of psychiatric conditions or antidepressant medication, which could alter neurochemistry? If not, then this needs to be pointed out.

      All participants were clinically screened at the LV Prasad Eye Institute, and additionally self-reported no neurological or psychiatric conditions or medications. Moreover, all subjects were screened based on the exclusion criteria for scanning, using the standard questionnaire of the radiology center.

      We have now made this clear in the Methods (Page 7, Lines 168-171).

      Table 1 needs to show the age of the participant, which can only be derived by adding the columns 'duration of deprivation' and 'time since surgery'. Table 1 also needs to include the controls.

      We have accordingly modified Table 1 in the revised manuscript and added age for the patients as well as the controls (Table 1, Pages 6-7).

      The control cohort is not specific enough to exclude reduced visual acuity, or co-morbidities, as the primary driver of the differences between groups. Ideally, a cohort with developmental cataracts is recruited. Normally sighted participants as a control cohort cannot distinguish between different types of sight loss, or stages of plasticity.

      The goal of this study was not to distinguish between different types of sight loss or stages of plasticity. We aimed to assess whether the most extreme forms of visual deprivation (i.e. congenital and total patterned vision loss) affected the E/I ratio. Low visual acuity and nystagmus are genuine diagnostic criteria (Methods, Page 5, Lines 142-145). Visual acuity alone cannot explain the current findings, since the MRS data were acquired both with eyes closed and with diffuse visual stimulation in a dimly lit room, without any visual task.

      With the awareness of the present results, we consider it worthwhile for the future to investigate additional groups such as developmental cataract-reversal individuals, to narrow down the contribution of the age of onset and degree of visual deprivation to the observed group differences.

      (3) Data collection and analysis

      - More detail is needed: how long were the sessions, how long was each part?

      We have added this information on Page 7, Lines 178-181 of the Methods. MRS scanning took between 45 and 60 minutes, EEG testing took 20 minutes excluding the time for capping, and visual acuity testing took 3-5 minutes.

      - It should be mentioned here that the EEG data is a reanalysis of a subset of legacy data, published previously in Ossandón et al., 2023; Pant et al., 2023.

      In the revised manuscript, we explicitly state at the beginning of the “Electrophysiology recordings” section of the Methods (Page 13, Lines 331-334) that the EEG datasets were a subset of previously published data.

      (4) MRS Spectroscopy

      - Please fill out the minimum reporting standards form (Lin et al., 2021), or report all the requested measures in the main document https://pubmed.ncbi.nlm.nih.gov/33559967/

      We have now filled out this form and added it as Supplementary Material (Supplementary Excel File 1). Additionally, all the requested information has been moved to the Methods section of the main document (MRS Data Quality, Pages 10-12).

      - Information on how the voxels were placed is missing. The visual cortex voxel is not angled parallel to the calcarine, as is a common way to capture processing in the early visual cortex. Describe in the paper what the criteria for successful placement were, and how was it ensured that non-brain tissue was avoided in a voxel of this size.

      Voxel placement was optimized in each subject to avoid the meninges, ventricles, skull and subcortical structures; this was ensured by examining the voxel region across slices in the acquired T1 volume for each subject. Saturation bands were placed to nullify the skull signal during MRS acquisition, at the anterior (frontal) and posterior (visual) edge of the voxel for every subject. Due to limitations of the clinical scanner, rotated/skewed voxels were not possible, and thus voxels were not always located precisely parallel to the calcarine.

      We have added this information to Page 9 (Lines 229-237) of the revised manuscript.

      - Figure 1. shows voxels that are very close to the edge of the brain (frontal cortex) or to the tentorium (visual cortex). Could the authors please calculate the percentage overlap between the visual cortex MRS voxel and the visual cortex, and compare them across groups to ensure that there is no between-group bias from voxel placement?

      We have now added the requested analysis to Supplementary Material S2 and referred to it in the main manuscript on Page 9, Lines 236-237.

      Briefly, the percentage overlap with areas V1-V6 in every individual subject’s visual cortex voxel was 60% or more; the mean overlap was 67% in the CC group and 70% in the SC group. The percentage overlap did not differ between groups (t-test: t(18) = -1.14, p = 0.269).
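
      As an illustration of the overlap measure described above, the following minimal sketch computes the percentage of an MRS voxel that falls inside a region of interest. It assumes both regions are available as sets of voxel-grid coordinates; the function names and the toy box-shaped masks are hypothetical and are not taken from the analysis pipeline used in the study.

```python
def percent_overlap(voxel_mask, roi_mask):
    """Percentage of MRS-voxel elements that also lie inside the ROI.

    Both arguments are sets of (x, y, z) grid coordinates (an assumption
    made for this sketch; real masks are usually 3D image volumes).
    """
    if not voxel_mask:
        raise ValueError("voxel mask is empty")
    return 100.0 * len(voxel_mask & roi_mask) / len(voxel_mask)


def box(x_range, y_range, z_range):
    """Hypothetical helper: build a box-shaped mask from half-open ranges."""
    return {
        (x, y, z)
        for x in range(*x_range)
        for y in range(*y_range)
        for z in range(*z_range)
    }


# Toy data, not study data: a 4 x 3 x 2 voxel and an ROI shifted by one step in x.
mrs_voxel = box((0, 4), (0, 3), (0, 2))    # 24 elements
visual_roi = box((1, 6), (0, 3), (0, 2))   # shares x = 1..3 with the voxel

print(percent_overlap(mrs_voxel, visual_roi))  # 75.0
```

      Per-subject values obtained this way could then be compared between groups with an independent-samples t-test, as reported above.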

      - Figure 1. I would recommend displaying data on a skull-stripped image to avoid identifying information from the participant's T1 profile.

      We have now replaced the images in Figure 1 with skull-stripped images. Note that images from SPM12 were used instead of GannetCoregister, as GannetCoregister only displays images with the skull.

      - Please show more rigor with the MRS quality measures. Several examples of inconsistency and omissions are below.

      • SNR was quantified and shows a difference in SNR between voxel positions, with lower SNR in the frontal cortex. No explanation or discussion of the difference was provided.

      • Looking at S1, the linewidth of NAA seems to be a lot broader in the frontal cortex than in the visual cortex. The figures suggest that acquisition quality was very different between voxel locations, making the comparison difficult.

      • Linewidth of NAA is a generally agreed measure of shim quality in megapress acquisitions (Craven et al., 2022).

      The data quality difference between the frontal and visual cortices has been observed in the literature (Juchem & Graaf, 2017; Rideaux et al., 2022). We nevertheless chose a frontal cortex voxel as control site instead of the often-chosen sensorimotor cortex. The main motivation was to avoid any cortical region linked to sensory processing since crossmodal compensation as a consequence of visual deprivation is a well-documented phenomenon.

      We now make this clearer in the Methods (Page 11, Lines 284-299) and in the Discussion/Limitations (Page 25, Lines 662-665).

      - To get a handle on the data quality, I would recommend that the authors display their MRS quality measures in a separate section 'MRS quality measure', including NAA linewidth, NAA SNR, GABA+ CRLB, Glx CRLB, and test for the main effects and interaction of voxel location (VC, FC) and group (SC, CC) and discuss any discrepancies.

      We have moved all the quality metric values for GABA+, Glx and NAA from the supplement to the Methods section (see Table 2), and added the requested section titled “MRS Data quality.”

      We have conducted the requested analyses and reported them in Supplementary Material S6: there was a strong effect of region confirming that data quality was better in the visual than frontal region. We have referred to this in the main manuscript on Page 11, Line 299.

      In the revised manuscript, we discuss the data quality in the frontal cortex, and how we ensured it was comparable to prior work. Moreover, there were no significant group effects, or group-by-region interactions, suggesting that group differences observed for the visual cortex voxel cannot be accounted for by differences in data quality. We now included a section on data quality, both in the Methods (Page 11, Lines 284 – 299), and the limitations section of the Discussion (Page 25, Lines 662 - 665).

      Please clarify the MRS acquisition, "Each MEGA-PRESS scan lasted for 8 minutes and was acquired with the following specifications: TR = 2000 ms, TE = 68 ms, Voxel size = 40 mm x 30 mm x 25 mm, 192 averages (each consists of two TRs)." 192 averages x 2 TRs x 2 s TR = 12.8 min, not 8 min. Apologies if I have misunderstood these details.

      We have corrected this error in the revised manuscript and stated the parameters more clearly: there were a total of 256 averages, each with 1 TR of 2 s, resulting in an 8.5-minute scan (256 × 2 s / 60 ≈ 8.5 min) (Page 8, Lines 212-213).
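
      The arithmetic behind both figures can be checked with a short sketch. The function name is hypothetical; the inputs are the parameter values quoted in the reviewer's comment and in the corrected description.

```python
def scan_minutes(n_averages, trs_per_average, tr_seconds):
    """Total acquisition time in minutes for a given number of averages."""
    return n_averages * trs_per_average * tr_seconds / 60.0


# Reviewer's reading of the original text: 192 averages, 2 TRs each, TR = 2 s.
reviewer_estimate = scan_minutes(192, 2, 2.0)

# Corrected parameters: 256 averages, 1 TR each, TR = 2 s.
corrected = scan_minutes(256, 1, 2.0)

print(round(reviewer_estimate, 1))  # 12.8
print(round(corrected, 1))          # 8.5
```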

      - What was presented to participants in the eyes open MRS? Was it just normal room illumination or was it completely dark? Please add details to your methods.

      The scans were conducted in regular room illumination, with no visual stimulation.

      We have now clarified this on Page 9 (Lines 223-224) of the Methods.

      (5) MRS analysis

      How was the tissue fraction correction performed? Please add or refer to the exact equation from Harris et al., 2015.

      We have clarified that the reported GABA+/Glx values are water-normalized alpha corrected values (Page 10, Line 249), and cited Harris et al., 2015 on Page 10 (Line 251) of the Methods.

      (6) Statistical approach

      How was the sample size determined? Please add your justification for the sample size

      We included as many qualifying patients as we were able to recruit for this study within 2.5 years of data collection (commencing August 2019, ending February 2022), given the constraints of the patient population and the pandemic. We have now made this clear in the Discussion (Page 25, Lines 650-652).

      Please report the tests for normality.

      We have now reported the Shapiro-Wilk test results for normality as well as Levene’s test for homogeneity of variance between groups for every dependent variable in our dataset in Supplementary Material S9, and added references to it in the descriptions of the statistical analyses (Methods, Page 13, Lines 326-329 and Page 15, Lines 400-402).

      Calculate the Bayes Factor where possible.

      As our analyses are all frequentist, instead of re-analyzing the data within a Bayesian framework, we added partial eta squared values (ηp²) for all the reported ANOVAs to give readers an idea of the effect size (Results).
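
      For readers less familiar with this effect size, the sketch below shows the standard definition of partial eta squared, both from sums of squares and from an F statistic with its degrees of freedom. The function names and the numerical inputs are hypothetical illustrations and are not values from the manuscript.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared from the effect and error sums of squares."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Equivalent form using the F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Hypothetical example: SS_effect = 4, SS_error = 18, df = (1, 18),
# so F = (4 / 1) / (18 / 18) = 4.
from_ss = partial_eta_squared(4.0, 18.0)
from_f = partial_eta_squared_from_f(4.0, 1, 18)

print(round(from_ss, 3), round(from_f, 3))
```

      Both forms give the same value because F is the ratio of the effect and error mean squares.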

      I recommend partial correlations to control for the influence of age, duration, and time of surgery, rather than separate correlations.

      Given the combination of small sample size and the expected multicollinearity in our variables (duration of blindness, for example, would be expected to correlate with age, as well as visual acuity post-surgery), partial correlations could not be calculated on this data.

      We are aware of the limits of correlational analyses. Given the unique data set of a rare population we had exploratorily planned to relate behavioral, EEG and MRS parameters by calculating correlations. Since no similar data existed when we started (and to the best of our knowledge our data set is still unique), these correlation analyses were explorative, but the most transparent to run.

      We have now clearly outlined these limitations in our Introduction (Page 5, Lines 133-135), Methods (Page 15, Lines 408-410) and Discussion section (Page 24, Line 634, Page 25, Lines 652-65) to ensure that the results are interpreted with appropriate caution.

      (7) Visual acuity

      Is the VA monocular average, from the dominant eye, or bilateral?

      We have now clarified that the VA reported here is bilateral (Methods, Page 7, Line 165 and Page 15, Line 405). Bilateral visual acuity in congenital cataract-reversal individuals typically corresponds to the visual acuity of the best eye.

      It is mentioned here that correlations with VA are exploratory, please be consistent as the introduction mentions that there was a hypothesis that you sought to test.

      We have now accordingly modified the Introduction (Page 5, Lines 133-135) and added the appropriate caveats in the discussion with regards to interpretations (Page 25, Lines 652-665).

      (8) Correlation analyses between MRS and EEG

      It is mentioned here that correlations between EEG and MRS are exploratory, please consistently point out the exploratory nature, as these results are preliminary and should not be overinterpreted ("We did not have prior hypotheses as to the best of our knowledge no extant literature has tested the correlation between aperiodic EEG activity and MRS measures of GABA+,Glx and Glx/GABA+." ).

      In the revised manuscript, we explicitly state that the reported associations between EEG (aperiodic component) and MRS parameters allow more specific, directed hypotheses to be put forward for future studies (Introduction, Page 5, Lines 133-135; Methods, Page 15, Line 415; Discussion, Page 25, Lines 644-645 and Lines 652-665).

      (9) Results

      Figure 2 uses the same y-axis for the visual cortex and frontal cortex to facilitate a comparison between the two locations. Comparing Figure 2 a with b demonstrates poorer spectral peaks and reduced amplitudes. Lower spectral quality in the frontal cortex voxel could contribute to the absence of a group effect in the control voxel location. The major caveat that spectral quality differs between voxels needs to be pointed out and the limitations thereof discussed.

      We have now explicitly pointed out this issue in the Methods (MRS Data Quality, Supplementary Material S6) and Discussion in the Limitations section (Page 25, Lines 662-665). While data quality was lower for the frontal compared to the visual cortex voxels, as has been observed previously (Juchem & Graaf, 2017; Rideaux et al., 2022), this was not an issue for the EEG recordings. Thus, lower sensitivity of frontal measures cannot easily explain the lack of group differences for frontal measures. Crucially, data quality did not differ between groups.

      The results in 2c are the result of multiple correlations with metabolite values ("As in previous studies, we ran a number of exploratory correlation analyses between GABA+, Glx, and Glx/GABA+ concentrations, and visual acuity at the date of testing, duration of visual deprivation, and time since surgery respectively in the CC group"), it seems at least six for the visual acuity measure (VA vs Glx, VA vs GABA+, VA vs Glx/GABA+ x 2 conditions). While the trends are interesting, they should be interpreted with caution because of the exploratory nature, small sample size, the lack of multiple comparison correction, and the influence of two extreme data points. The authors should not overinterpret these results and should point out the need for replication.

      See response to (6) last section, which we copy here for convenience:

      We are aware of the limits of correlational analyses. Given the unique data set of a rare population we exploratorily related behavioral, EEG and MRS parameters by calculating correlations. Since no similar data existed when we started (and to the best of our knowledge our data set is still unique), these correlation analyses were explorative, but the most transparent to run.

      We have now clearly outlined these limitations in our Discussion section to ensure that the results are interpreted with appropriate caution (Discussion, Page 25, Lines 644-645 and Lines 652-665).

      (10) Discussion:

      Please explain the decrease in E/I balance from MRS in view of recent findings on an increase in E/I balance in CC using RSN-fMRI (Raczy et al., 2022) and EEG (Ossandon et al. 2023).

      We have edited our Abstract (Page 1-2, Lines 31-35) and Discussion (Page 23, Lines 584-590; Page 24, Lines 613-620). In brief, we think our results reflect a homeostatic regulation of E/I balance, that is, an increase in inhibition due to an increase in stimulus driven excitation following sight restoration.

      Names limitations but does nothing to mitigate concerns about spatial specificity. The limitations need to be rewritten to include differences in SNR between the visual cortex and frontal lobe. Needs to include caveats of small samples, including effect inflation.

      We have now discussed the data quality differences between the visual and frontal cortex voxel in MRS data quality, which we find irrespective of group (MRS Data Quality, Supplementary Material S6). We also reiterate why this might not explain our results; data quality was comparable to prior studies which have found group differences in frontal cortex (Methods Page 11, Lines 284 – 299), and data quality did not differ between groups. Further, EEG data quality did not differ across frontal and occipital regions, but group differences in EEG datasets were localized to the occipital cortex.

      Reviewer #2 (Recommendations for The Authors):

      To address the main weakness, the authors could consider including data from a third group, of congenitally blind individuals. Including this would go a very long way towards making the findings interpretable and relating them to the rest of the literature.

      Unfortunately, recruitment of this group was not possible due to the pandemic. Indeed, we would consider a pre- vs. post-surgery approach the most suitable design in the future, which, however, will require several years to be completed. Such time- and resource-intensive longitudinal studies are justified by the present cross-sectional results.

      We have explicitly stated our contribution and need for future studies in the Limitations section of the Discussion (Page 25, Lines 650-657).

      Analysing the amplitude of alpha rhythms, as well as the other "aperiodic" components, would be useful to relate the profile of the tested patients with previous studies. Visual inspection of Figure 3 suggests that alpha power with eyes closed is not reduced in the patients' group compared to the controls. This would be inconsistent with previous studies (including research from the same group) and it could suggest that the small selected sample is not really representative of the sight-recovery population - certainly one of the most heterogeneous study populations. This further highlights the difficulty of drawing conclusions on the effects of visual experience merely based on this N=10 set of patients.

      Alpha power was indeed reduced in the present subsample of 10 CC individuals (Supplementary Material S19). A likely source of the confusion (that the graphs of the CC and SC groups look so similar for the EC condition in Figure 3) is that the spectra are shown with aperiodic components not yet removed and are scaled to accommodate very different alpha power values. As documented in Supplementary Material S18 and S19, alpha power and the aperiodic intercept/slope results of the resting state data in the present 10 CC individuals correspond to the results from a larger sample of CC individuals (n = 28) in Ossandón et al., 2023. We explicitly highlight this “replication” in the main manuscript (Pages 25-26, Lines 671-676). Thus, the present subsample of CC individuals is representative of its population.

      To further characterise the MRS results, the authors may consider an alternative normalisation scheme. It is not clear whether the lack of significant GABA and GLX differences in the face of a significant group difference in the GLX/GABA ratio is due to the former measures being noisier since taking the ratio between two metabolites often helps reduce inter-individual variability and thereby helps revealing group differences. It remains an open question whether the GABA or GLX concentrations would show significant group differences after appropriate normalisation (e.g. NAA?).

      We repeated the analysis with Creatine-normalized values of GABA+ and Glx, and the main results i.e. reduced Glx/GABA+ concentration in the visual cortex of CC vs SC individuals, and no such difference in the frontal cortex, remained the same (Supplementary Material S5).

      Further, we re-analyzed the data using Osprey, an open-source toolbox that uses linear combination modeling, and found once more that our results did not change (Supplementary Material S3). We refer to these findings in the Methods (Page 10, Lines 272-275) and Results (Page 10, Lines 467-471) of the main manuscript.

      In fact, the Glx concentration in the visual cortex of CC vs SC individuals was significantly decreased when Cr-normalized values were used (which was not significant in the original analysis). However, we do not interpret this result as it was not replicated with the water-normalized values from Gannet or Osprey.

      I suggest revising the discussion to present a more balanced picture of the existent evidence of the relation between E/I and EEG indices. Although there is evidence that the 1/f slope changes across development, in a way that could be consistent with a higher slope reflecting more immature and excitable tissue, the link with cortical E/I is far from established, especially when referring to specific EEG indices (intercept vs. slope, measured in lower vs. higher frequency ranges).

      We have revised the Introduction (Page 4, Line 91, Lines 101-102) and Discussion (Page 22, Lines 568-569, Page 24, Lines 645-647 and Lines 654-657) in the manuscript accordingly; we allude to the fact that the links between cortical E/I and aperiodic EEG indices have not yet been unequivocally established in the literature.

      Minor:

      - The authors estimated NAA concentration with different software than the one used to estimate GLX and GABA; this examined the OFF spectra only; I suggest that the authors consider running their analysis with LCModel, which would allow a straightforward approach to estimate concentrations of all three metabolites from the same edited spectrum and automatically return normalised concentrations as well as water-related ones.

      We re-analyzed all of the MRS datasets using Osprey, which uses linear combination modelling and has shown quantification results similar to LCModel for NAA (Oeltzschner et al., 2020). The results of a lower Glx/GABA+ concentration in the visual cortex of CC vs SC individuals, and no difference in NAA concentration, were replicated using this pipeline.

      We have now added these analyses to the Supplementary Material S3 and referred to them in the Methods (Page 9, Lines 242-246) and Results (Page 18, Lines 464-467).

      - Of course the normalisation used to estimate GABA and GLX values is completely irrelevant when the two values are expressed as ratio GLX/GABA - this may be reflected in the text ("water normalised GLX/GABA concentration" should read "GLX/GABA concentration" instead).

      We have adapted the text on Page 16 (Line 431) and have ensured that throughout the manuscript the use of “water-normalized” is in reference to Glx or GABA+ concentration, and not the ratio.

      - Please specify which equation was used for tissue correction - is it alpha-correction?

      We have clarified that the reported GABA+/Glx values are water-normalized alpha corrected values (Page 10, Line 249), and cited Harris et al., 2015 on Page 10 (Line 251) of the Methods.

      - Since ANOVA was used, the assumption is that values are normally distributed. Please report evidence supporting this assumption.

      We have now reported the Shapiro-Wilk test results for normality as well as Levene’s test for homogeneity of variance between groups for every dependent variable in our dataset in Supplementary Material S9, and added references to it in the Methods (Page 13, Lines 326-329 and Page 15, Lines 400-402).

      Reviewer #3 (Recommendations for The Authors):

      In addition to addressing major comments listed in my Public Review, I have the following, more granular comments, which should also be addressed:

      (1) The paper's structure could be improved by presenting visual acuity data before diving into MRS and EEG results to better contextualize the findings.

      We now explicitly state in the Methods (Page 5, Line 155) that lower visual acuity is expected in a cohort of CC individuals with long lasting congenital visual deprivation.

      We have additionally included a plot of visual acuities of the two groups (Supplementary Material S1).

      (2) The paper should better explain the differences between CC for which sight is restored and congenitally blind patients. The authors write in the introduction that there are sensitive periods/epochs during the lifespan for the development of local inhibitory neural circuits. and "Human neuroimaging studies have similarly demonstrated that visual experience during the first weeks and months of life is crucial for the development of visual circuits. If human infants born with dense bilateral cataracts are treated later than a few weeks from birth, they suffer from a permanent reduction of not only visual acuity (Birch et al., 1998; Khanna et al., 2013) and stereovision (Birch et al., 1993; Tytla et al., 1993) but additionally from impairments in higher-level visual functions, such as face perception (Le Grand et al., 2001; Putzar et al., 2010; Röder et al., 2013)...".

      Thus it seems that the current participants (sight restored after a sensitive period) seem to be similarly affected by the development of the local inhibitory circuits as congenitally blind. To assess the effect of plasticity and sight restoration longitudinal data would be necessary.

      In the Introduction (Page 2, Lines 59-64; Page 3, Lines 111-114) we added that in order to identify sensitive periods e.g. for the elaboration of visual neural circuits, sight recovery individuals need to be investigated. The study of permanently blind individuals allows for investigating the role of experience (whether sight is necessary to introduce the maturation of visual neural circuits), but not whether visual input needs to be available at early epochs in life (i.e. whether sight restoration following congenital blindness could nevertheless lead to the development of visual circuits).

      This is indeed the conclusion we make in the Discussion section. We have now highlighted the need for longitudinal assessments in the Discussion (Page 25, Lines 654-656).

      (3) What's the underlying idea of analyzing two separate aperiodic slopes (20-40 Hz and 1-19 Hz)? It is very unusual to compute the slope between 20-40 Hz, where the SNR is rather low.

      "Ossandón et al. (2023), however, observed that in addition to the flatter slope of the aperiodic power spectrum in the high frequency range (20-40 Hz), the slope of the low frequency range (1-19 Hz) was steeper in both, congenital cataract-reversal individuals, as well as in permanently congenitally blind humans."

      The present manuscript computed the slope between 1-20 Hz. Ossandón et al. as well as Medel et al. (2023) found a “knee” of the 1/f distribution at 20 Hz and describe further the motivations for computing both slope ranges. For example, Ossandón et al. used a data driven approach and compared single vs. dual fits and found that the latter fitted the data better. Additionally, they found the best fit if a knee at 20 Hz was used. We would like to point out that no standard range exists for the fitting of the 1/f component across the literature and, in fact, very different ranges have been used (Gao et al., 2017; Medel et al., 2023; Muthukumaraswamy & Liley, 2018).
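
      To make the role of the fitting range concrete, the following minimal sketch estimates an aperiodic intercept and slope by ordinary least squares in log-log space over a chosen frequency range. This is a simplified stand-in and not the fitting procedure used in the manuscript or in Ossandón et al. (2023); the function name, the synthetic spectrum and its parameters are assumptions made for illustration only.

```python
import math


def fit_aperiodic(freqs, power, f_min, f_max):
    """Least-squares line through log10(power) vs. log10(frequency).

    Returns (intercept, slope) using only frequencies in [f_min, f_max].
    """
    pts = [
        (math.log10(f), math.log10(p))
        for f, p in zip(freqs, power)
        if f_min <= f <= f_max
    ]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two frequencies in the fitting range")
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic, noise-free 1/f spectrum (not study data):
# power = 10**offset / f**exponent, with offset = 1.0 and exponent = 1.5.
freqs = [float(f) for f in range(1, 41)]
power = [10 ** 1.0 / f ** 1.5 for f in freqs]

intercept, slope = fit_aperiodic(freqs, power, 1.0, 20.0)
print(round(intercept, 3), round(slope, 3))
```

      With real spectra that contain a knee, fitting the ranges below and above the knee separately yields different slopes, which is the motivation for the dual-range approach described above.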

      (4) "For this scan, participants were instructed to keep their eyes closed and stay as still as possible." Why should it be important to have the eyes closed during a T1w data acquisition? This statement at this location does not make sense.

      To avoid misunderstandings, we removed this statement in this context.

      (5) "Two SC subjects did not complete the frontal cortex scan for the EO condition and were excluded from the statistical comparisons of frontal cortex neurotransmitter concentrations." Why did the authors not conduct whole-brain MRS, which seems to be on the market for quite some time (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3590062/)?

      Similar to previous work (Coullon et al., 2015; Weaver et al., 2013) our hypothesis was related to the visual cortex, and we chose the frontal cortex voxel as a control. This has now been clarified in the Introduction (Page 4, Lines 103-114), Methods (Page 9, Lines 225-227) and Discussion (Page 25, Lines 662-665).

      (6) In "....during visual stimulation with stimuli that changed in luminance (LU) (Pant et al., 2023)." the authors should provide a link on the visual stimulation, which is provided further below

      In the revised manuscript, we have moved up the description of the visual stimulation (Page 13, Line 336).

      (7) "During the EO condition, participants were asked to fixate on a blank screen." This is not really possible. Typically, resting state EO conditions include a fixation cross, as the participants would not be able to fixate on a blank screen and move their eyes, which would impact the recordings.

      We have now rephrased this as “look towards” with the goal of avoiding eye movements (Page 14, Line 347).

      (8) "Components corresponding to horizontal or vertical eye movements were identified via visual inspection and removed (Plöchl et al., 2012)." It is unclear what the Plöchl reference should serve for. Is the intention of the authors to state that manual (and subjective) visual inspection of the ICA components is adequate? I would recommend removing this reference.

      The intention was to provide the basis for classification during the visual inspection, as opposed to an automated method such as ICLabel.

      We stated this clearly in the revised manuscript (Page 14, Lines 368-370).

      (9) "The datasets were divided into 6.25 s long epochs corresponding to each trial." This is a bit inaccurate, as the trial also included some motor response task. Thus, I assume the 6.25 s are related to the visual stimulation.

      We have modified the sentence accordingly (Page 15, Line 378).

      (10) Figure 2. a & b. Just an esthetic suggestion: I would recommend removing the lines between the EC and EO conditions, as they suggest some longitudinal changes. Unless it is important to highlight the changes between EC and EO within each subject.

      In fact, EC vs. EO was a within-subject factor with expected changes for the EEG and possible changes in the MRS parameters. To allow the reader to track changes due to EC vs. EO for individual subjects (rather than just comparing the change in the mean scores), we use lines.  

      (11) Figure 3A: I would plot the same y-axis range for both groups to make it more comparable.

      We have changed Figure 3A accordingly.

      (12) "flattening of the intercept": replace "flattening", as it is too closely related to slope.

      We have replaced “flattening” with “reduction” (Page 20, Line 517).

      (13) The plotting of only the significant correlation between MRS measures and EEG measures seems to be rather selective reporting. For this type of exploratory analysis, I would recommend plotting all of the scatter plots and moving the entire exploratory analysis to the supplementary (as this provides the smallest evidence of the results).

      We have made clear in the Methods (Page 16, Lines 415-426), Results and Discussion (Page 24, Lines 644-645), as well as in the Supplementary Material, that we reported only the significant correlation because it was the only one that survived correction for multiple comparisons. We additionally refer explicitly to the Supplementary Material, where the plots for all correlations are shown (Results, Page 21, Lines 546-552).

      (14) "Here, we speculate that due to limited structural plasticity after a phase of congenital blindness, the neural circuits of CC individuals, which had adapted to blindness after birth, employ available, likely predominantly physiological plasticity mechanisms (Knudsen, 1998; Mower et al., 1985; Röder et al., 2021), in order to re-adapt to the newly available visual excitation following sight restoration."

      I don't understand the logic here. The CC individuals are congenitally blind, thus why should there be any physiological plasticity mechanism to adapt to blindness, if they were blind at birth?

      With “adapt to blindness” we mean adaptation of a brain to an atypical or unexpected condition when taking an evolutionary perspective (i.e. the lack of vision). We have made this clear in the revised manuscript (Introduction, Page 4, Lines 111-114; Discussion, Page 23, Lines 584-591).

      (15) "An overall reduction in Glx/GABA ratio would counteract the aforementioned adaptations to congenital blindness, e.g. a lower threshold for excitation, which might come with the risk of runaway excitation in the presence of restored visually-elicited excitation."

      This could be tested by actually investigating the visual excitation by visual stimulation studies.

      The visual stimulation condition in the EEG experiment of the present study found a higher aperiodic intercept in CC compared to SC individuals. Given the proposed link between the intercept and spontaneous neural firing (Manning et al., 2009), we interpreted the higher intercept in CC individuals as increased broadband neural firing during visual stimulation (Results Figure 3; Discussion Page 24, Lines 635-640). This idea is compatible with enhanced BOLD responses during an EO condition in CC individuals (Raczy et al., 2022). Future work should systematically manipulate visual stimulation to test this idea.
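
      For readers unfamiliar with the aperiodic intercept, the following minimal sketch shows how an intercept and exponent can be read off a power spectrum by a straight-line fit in log-log space. It is an illustration under simplifying assumptions, not the pipeline used in the manuscript: oscillatory peaks (e.g. alpha) are ignored, the spectra are synthetic, and all names and values are hypothetical.

```python
import math


def fit_aperiodic(freqs, power):
    """Least-squares fit of log10(power) = intercept - exponent * log10(freq).

    Returns (intercept, exponent). Assumes a purely aperiodic 1/f-like
    spectrum without oscillatory peaks.
    """
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in power]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The exponent is reported as a positive number for a decaying spectrum.
    return intercept, -slope


# Two synthetic spectra with the same exponent but different broadband offsets.
freqs = [float(f) for f in range(1, 41)]
spectrum_low = [10 ** 1.0 / f ** 1.5 for f in freqs]
spectrum_high = [10 ** 1.5 / f ** 1.5 for f in freqs]

intercept_low, exponent_low = fit_aperiodic(freqs, spectrum_low)
intercept_high, exponent_high = fit_aperiodic(freqs, spectrum_high)
```

      The two synthetic spectra share the same exponent and differ only in the intercept, that is, in a broadband power offset across all frequencies. A group difference of this kind is what is interpreted above as a difference in broadband neural firing.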

      (16) As the authors also collected T1w images, was the hypothesis of increased visual cortex thickness in CC investigated?

      This hypothesis was investigated in a separate publication which included this subset of participants (Hölig et al., 2023), and found increased visual cortical thickness in the CC group. We refer to this publication, and related work (Feng et al., 2021) in the present manuscript.

      (17) The entire discussion of age should be omitted, as the current data set is too small to assess age effects.

      We have removed this section and just allude to the fact that we replicated typical age trends to underline the validity of the present data (Page 26, Lines 675-676).

      (18) Table1: should include the age and the age at the time point of surgery.

      We added age to the revised Table 1. We clarified that in CC individuals, duration of blindness is the same as age at the time point of surgery (Page 6, Line 163).

      (19) Why are no group comparisons of visual acuity reported?

      Lower visual acuity in CC than SC individuals is a well-documented fact.

      We have now added the visual acuity plots for readers (Supplementary Material S1, referred to in the Methods, Page 5, Line 155) which highlight this common finding.

      References (Recommendations to the Authors)

      Adrian, E. D., & Matthews, B. H. C. (1934). The Berger rhythm: Potential changes from the occipital lobes in man. Brain. https://doi.org/10.1093/brain/57.4.355

      Coullon, G. S. L., Emir, U. E., Fine, I., Watkins, K. E., & Bridge, H. (2015). Neurochemical changes in the pericalcarine cortex in congenital blindness attributable to bilateral anophthalmia. Journal of Neurophysiology. https://doi.org/10.1152/jn.00567.2015

      Feng, Y., Collignon, O., Maurer, D., Yao, K., & Gao, X. (2021). Brief postnatal visual deprivation triggers long-lasting interactive structural and functional reorganization of the human cortex. Frontiers in Medicine, 8, 752021. https://doi.org/10.3389/FMED.2021.752021

      Gao, R., Peterson, E. J., & Voytek, B. (2017). Inferring synaptic excitation/inhibition balance from field potentials. NeuroImage, 158(March), 70–78. https://doi.org/10.1016/j.neuroimage.2017.06.078

      Hölig, C., Guerreiro, M. J. S., Lingareddy, S., Kekunnaya, R., & Röder, B. (2023). Sight restoration in congenitally blind humans does not restore visual brain structure. Cerebral Cortex, 33(5), 2152–2161. https://doi.org/10.1093/CERCOR/BHAC197

      Juchem, C., & Graaf, R. A. de. (2017). B0 magnetic field homogeneity and shimming for in vivo magnetic resonance spectroscopy. Analytical Biochemistry, 529, 17–29. https://doi.org/10.1016/j.ab.2016.06.003

      Kurcyus, K., Annac, E., Hanning, N. M., Harris, A. D., Oeltzschner, G., Edden, R., & Riedl, V. (2018). Opposite Dynamics of GABA and Glutamate Levels in the Occipital Cortex during Visual Processing. Journal of Neuroscience, 38(46), 9967–9976. https://doi.org/10.1523/JNEUROSCI.1214-18.2018

      Manning, J. R., Jacobs, J., Fried, I., & Kahana, M. J. (2009). Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. The Journal of Neuroscience : The Official Journal of the Society for Neuroscience, 29(43), 13613–13620. https://doi.org/10.1523/JNEUROSCI.2041-09.2009

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/F electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179(November 2017), 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Oeltzschner, G., Zöllner, H. J., Hui, S. C. N., Mikkelsen, M., Saleh, M. G., Tapper, S., & Edden, R. A. E. (2020). Osprey: Open-source processing, reconstruction & estimation of magnetic resonance spectroscopy data. Journal of Neuroscience Methods, 343, 108827. https://doi.org/10.1016/j.jneumeth.2020.108827

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Raczy, K., Hölig, C., Guerreiro, M. J. S., Lingareddy, S., Kekunnaya, R., & Röder, B. (2022). Typical resting-state activity of the brain requires visual input during an early sensitive period. Brain Communications, 4(4). https://doi.org/10.1093/BRAINCOMMS/FCAC146

      Rideaux, R., Ehrhardt, S. E., Wards, Y., Filmer, H. L., Jin, J., Deelchand, D. K., Marjańska, M., Mattingley, J. B., & Dux, P. E. (2022). On the relationship between GABA+ and glutamate across the brain. NeuroImage, 257, 119273. https://doi.org/10.1016/J.NEUROIMAGE.2022.119273

      Weaver, K. E., Richards, T. L., Saenz, M., Petropoulos, H., & Fine, I. (2013). Neurochemical changes within human early blind occipital cortex. Neuroscience. https://doi.org/10.1016/j.neuroscience.2013.08.004

    1. Author response:

      We thank the editor and the three reviewers for the positive assessment and constructive feedback on how to improve our manuscript. We greatly appreciate that our work is considered valuable to the field, the recognition of the high-resolution model we presented, and the comments on our investigation of CisA’s role in the attachment and firing mechanism of the extended assembly. It is truly gratifying to know that our study contributes to expanding the current understanding of the biology of Streptomyces and the role of these functionally diverse and fascinating bacterial nanomachines.

      We have provided specific responses to each reviewer's comments below. In summary, we intend to address the following requested revisions:

      We will expand our bioinformatic analysis of CisA and provide additional information on the oligomeric state of CisA. We will also modify the text, figures, and figure legends to improve the clarity of our work and experimental procedures.

      Some reviewer comments would require additional experimental work, some of which would involve extensive optimization of experimental conditions. Because both lead postdoctoral researchers involved in this work have now left the lab, we currently do not have the capability to perform additional experimental work.

      Reviewer #1 (Public review):

      Contractile Injection Systems (CIS) are versatile machines that can form pores in membranes or deliver effectors. They can act extra or intracellularly. When intracellular they are positioned to face the exterior of the cell and hence should be anchored to the cell envelope. The authors previously reported the characterization of a CIS in Streptomyces coelicolor, including significant information on the architecture of the apparatus. However, how the tubular structure is attached to the envelope was not investigated. Here they provide a wealth of evidence to demonstrate that a specific gene within the CIS gene cluster, cisA, encodes a membrane protein that anchors the CIS to the envelope. More specifically, they show that:

      - CisA is not required for assembly of the structure but is important for proper contraction and CIS-mediated cell death

      - CisA is associated to the membrane (fluorescence microscopy, cell fractionation) through a transmembrane segment (lacZ-phoA topology fusions in E. coli)

      - Structural prediction of interaction between CisA and a CIS baseplate component

      - In addition they provide a high-resolution model structure of the >750-polypeptide Streptomyces CIS in its extended conformation, revealing new details of this fascinating machine, notably in the baseplate and cap complexes.

      All the experiments are well controlled, including trans-complementation of all tested phenotypes.

      One important piece of information that is missing is the oligomeric state of CisA.

      While it would have been great to test the interaction between CisA and Cis11, or to perform cryo-electron microscopy assays of detergent-extracted CIS structures to maintain the interaction with CisA, I believe that the toxicity of CisA upon overexpression or upon expression in E. coli renders these studies difficult and will require a significant amount of time and optimization to be performed. It is worth mentioning that this study is of significant novelty in the CIS field because, except for Type VI secretion systems, very few membrane proteins or complexes responsible for CIS attachment have been identified and studied.

      We thank this reviewer for their highly supportive and positive comments on our manuscript. We are grateful for this reviewer’s recognition of the novelty of our study, particularly in the context of membrane proteins and complexes involved in CIS attachment.

      We agree that further experimental evidence on the direct interaction between CisA and Cis11 would have strengthened our model of CisA function. However, as noted by this reviewer, this additional work is technically challenging and currently beyond the scope of this study.

      We thank Reviewer #1 for suggesting that we discuss the potential oligomeric state of CisA. We will perform additional AlphaFold modelling of CisA and discuss the result of this analysis in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The overall question that is addressed in this study is how the S. coelicolor contractile injection system (CISSc) works and affects both cell viability and differentiation, which it has been implicated to do in previous work from this group and others. The CISSc system has been enigmatic in the sense that it is free-floating in the cytoplasm in an extended form and is seen in contracted conformation (i.e. after having been triggered) mainly in dead and partially lysed cells, suggesting involvement in some kind of regulated cell death. So, how do the structure and function of the CISSc system compare to those of related CIS from other bacteria, does it interact with the cytoplasmic membrane, how does it do that, and is the membrane interaction involved in the suggested role in stress-induced, regulated cell death? The authors address these questions by investigating the role of a membrane protein, CisA, that is encoded by a gene in the CIS gene cluster in S. coelicolor. Further, they analyse the structure of the assembled CISSc, purified from the cytoplasm of S. coelicolor, using single-particle cryo-electron microscopy.

      Strengths:

      The beautiful visualisation of the CIS system both by cryo-electron tomography of intact bacterial cells and by single-particle electron microscopy of purified CIS assemblies are clearly the strengths of the paper, both in terms of methods and results. Further, the paper provides genetic evidence that the membrane protein CisA is required for the contraction of the CISSc assemblies that are seen in partially lysed or ghost cells of the wild type. The conclusion that CisA is a transmembrane protein and the inferred membrane topology are well supported by experimental data. The cryo-EM data suggest that CisA is not a stable part of the extended form of the CISSc assemblies. These findings raise the question of what CisA does.

      We thank Reviewer #2 for the overall positive evaluation of our manuscript and the constructive criticism. 

      Weaknesses:

      The investigations of the role of CisA in function, membrane interaction, and triggering of contraction of CIS assemblies, are important parts of the paper and are highlighted in the title. However, the experimental data provided to answer these questions appear partially incomplete and not as conclusive as one would expect.

      We acknowledge that some aspects of our work have not been fully answered. We believe that providing additional experimental data is currently beyond the scope of this study. To improve this study, we will modify the text and clarify experimental procedures and figures where possible in the revised version of our manuscript.

      The stress-induced loss of viability is only monitored with one method: an in vivo assay where cytoplasmic sfGFP signal is compared to FM5-95 membrane stain. Addition of a sublethal level of nisin led to loss of sfGFP signal in individual hyphae in the WT, but not in the cisA mutant (similarly to what was previously reported for a CIS-negative mutant). Technically, this experiment and the example images that are shown give rise to some concern. Only individual hyphal fragments are shown that do not look like healthy and growing S. coelicolor hyphae. Under the stated growth conditions, S. coelicolor strains would normally have grown as dense hyphal pellets. It is therefore surprising that only these unbranched hyphal fragments are shown in Fig. 4ab.

      We thank Reviewer #2 for their thoughtful criticism regarding our stress-induced viability assay and the data presented in Figure 4. We acknowledge the importance of ensuring that the presented images reflect the physiological state of S. coelicolor under the stated growth conditions and recognize that the hyphal fragments shown in Figure 4 do not fully capture the typical morphology of S. coelicolor. As pointed out by this reviewer, S. coelicolor grows in large hyphal clumps when cultured in liquid media, making the quantification of fluorescence intensities in hyphae expressing cytoplasmic GFP and stained with the membrane dye FM5-95 particularly challenging. To improve the image analysis and quantification of GFP and FM5-95 fluorescence intensities across the three S. coelicolor strains (wildtype, cisA deletion mutant and the complemented cisA mutant), we vortexed the cell samples briefly before imaging to break up hyphal clumps, which increased the number of hyphal fragments. The hyphae shown in our images were selected as representative examples across three biological replicates.

      Further, S. coelicolor would likely be in a stationary phase when grown 48 h in the rich medium that is stated, giving rise to concern about the physiological state of the hyphae that were used for the viability assay. It would be valuable to know whether actively growing mycelium is affected in the same way by the nisin treatment, and also whether the cell death effect could be detected by other methods.

      The reasoning behind growing S. coelicolor for 48 h before performing the fluorescence-based viability assay was that we (DOI: 10.1038/s41564-023-01341-x) and others (e.g. DOI: 10.1038/s41467-023-37087-7) previously showed that the levels of CIS particles peak at the transition from vegetative to reproductive/stationary growth, thus indicating that CIS activity is highest during this growth stage. The results obtained in this manuscript are in agreement with our previous study, in which we showed a similar effect on the viability of wildtype versus cis-deficient S. coelicolor strains (DOI: 10.1038/s41564-023-01341-x) using nisin, the protonophore CCCP and UV light, supported by biological replicate experiments and appropriate controls. Furthermore, our results are in agreement with the findings reported in a complementary study by Vladimirov et al. (DOI: 10.1038/s41467-023-37087-7) that used a different approach (SYTO9/PI staining of hyphal pellets) to demonstrate that CIS-deficient mutants exhibit decreased hyphal death. We agree that it would be interesting to test if actively growing hyphae respond differently to nisin treatment, and such experiments will be considered in future work.

      Taken together, we believe that the results obtained from our fluorescence-based viability assay are consistent with data reported by others and provide strong experimental evidence that functional CIS mediate hyphal cell death. 

      The model presented in Fig. 5 suggests that stress leads to a CisA-dependent attachment of CIS assemblies to the cytoplasmic membrane, and then triggering of contraction, leading to cell death. This model makes testable predictions that have not been challenged experimentally. Given that sublethal doses of nisin seem to trigger cell death, there appear to be possibilities to monitor whether activation of the system (via CisA?) indeed leads to at least temporally increased interaction of CIS with the membrane.

      We thank this reviewer for their suggestions on how to test our model further. In the meantime, we have performed co-immunoprecipitation experiments using S. coelicolor cells that produced CisA-FLAG as bait and were treated with a sub-lethal nisin concentration for 0/15/45 min. Mass spectrometry analysis of co-eluted peptides did not show the presence of CIS-associated peptides. While we cannot exclude the possibility that our experimental assay requires further optimization to successfully demonstrate a CisA-CIS interaction (e.g. optimization of the use of detergents to improve the solubilization of CisA from Streptomyces membranes, which is currently not an established method), an alternative and equally valid hypothesis is that the interaction between CIS particles and CisA is transient and therefore difficult to capture. We would like to mention that we did detect CisA peptides in crude purifications of CIS particles from nisin-stressed cells (Supplementary Table 2, manuscript: line 265/266), supporting our model that CisA associates with CIS particles in vivo.

      Further, would not the model predict that stress leads to an increased number of contracted CIS assemblies in the cytoplasm? No clear difference in length of the isolated assemblies in Fig. S7 is seen between untreated and nisin-exposed cells, and also no difference between assemblies from WT and cisA mutant hyphae.

      The reviewer is correct that there is no clear difference in length in the isolated CIS particles shown in Figure S7. This is in line with our results, which show that CisA is not required for the correct assembly of CIS particles and their ability to contract in the presence and absence of nisin treatment. The purpose of Figure S7 was to support this statement. We would like to note that the particles shown in Figure S7 were purified from cell lysates using a crude sheath preparation protocol, during which CIS particles generally contract irrespective of the presence or absence of CisA. Thus, we cannot comment on whether there is an increased number of contracted CIS assemblies in the cytoplasm of nisin-exposed cells. To answer this point, we would need to acquire additional cryo-electron tomograms (cryoET) of the different strains treated with nisin. We appreciate this reviewer's suggestions. However, cryoET is an extremely time- and labour-intensive task, and given that we currently don’t know the exact dynamics of the CIS-CisA interaction following exogenous stress, we believe this experiment is beyond the scope of this work.

      The interaction of CisA with the CIS assembly is critical for the model but is only supported by Alphafold modelling, predicting interaction between cytoplasmic parts of CisA and Cis11 protein in the baseplate wedge. An experimental demonstration of this interaction would have strengthened the conclusions.

      We agree that direct experimental evidence of this interaction would have further strengthened the conclusions of our study, and we have extensively tried to provide additional experimental evidence. Unfortunately, due to the toxicity of CisA expression in E. coli and the transient nature of the interaction under our experimental conditions, we were unable to pursue direct biochemical or biophysical validation methods, such as co-purification or bacterial two-hybrid assays. While these challenges limited our ability to experimentally confirm the interaction, the AlphaFold predictions provided a valuable hypothesis and mechanistic insight into the role of CisA.

      The cisA mutant showed a similarly accelerated sporulation as was previously reported for CIS-negative strains, which supports the conclusion that CisA is required for function of CISSc. But the results do not add any new insights into how CIS/CisA affects the progression of the developmental life cycle and whether this effect has anything to do with the regulated cell death that is caused by CIS. The same applies to the effect on secondary metabolite production, with no further mechanistic insights added, except reporting similar effects of CIS and CisA inactivations.

      We thank this reviewer for their thoughtful feedback and for highlighting the connections between CisA, CIS function, and their effects on the developmental life cycle and secondary metabolite production in S. coelicolor. The main focus of this study was to provide further insight into how CIS contraction and firing are mediated in Streptomyces, and we used the analysis of accelerated sporulation and secondary metabolite production to assess the functionality of CIS in the presence or absence of CisA.

      We agree that we still don’t fully understand the nature of the signals that trigger CIS contraction, but we do know that the production of CIS assemblies seems to be an integral part of the Streptomyces multicellular life cycle as demonstrated in two independent previous studies (DOI: 10.1038/s41564-023-01341-x and DOI: 10.1038/s41467-023-37087-7). We propose that the assembly and firing of Streptomyces CIS particles could present a molecular mechanism to sacrifice only a part of the mycelium to either prevent the spread of local cellular damage or to provide additional nutrients for the rest of the mycelium and delay the terminal differentiation into spores and affect the production of secondary metabolites.

      We recognize the importance of understanding the regulation and mechanistic details underpinning the proposed CIS-mediated regulated cell death model. This will be further explored in future studies.

      Concluding remarks:

      The work will be of interest to anyone interested in contractile injection systems, T6SS, or similar machineries, as well as to people working on the biology of streptomycetes. There is also a potential impact of the work on the understanding of how such molecular machineries could have been co-opted during evolution to become a mechanism for regulated cell death. However, this latter aspect still remains poorly understood. Even though this paper adds excellent new structural insights and identifies a putative membrane anchor, it remains elusive how the Streptomyces CIS may lead to cell death. It is also unclear what the advantage would be to trigger death of hyphal compartments in response to stress, as well as how such cell death may impact (or accelerate) the developmental progression. Finally, it is inescapable to wonder whether the Streptomyces CIS could have any role in protection against phage infection.

      We thank Reviewer #2 for their supportive assessment of our work. In the revised manuscript, we will briefly discuss the impact of functional CIS assemblies on Streptomyces development. We previously tested if Streptomyces could defend against phages but have not found any experimental evidence to support this idea. The analysis of phage defense mechanisms is an underdeveloped area in Streptomyces research, partly due to the currently limited availability of a diverse phage panel.

      Reviewer #3 (Public review):

      Summary:

      In this work, Casu et al. have reported the characterization of a previously uncharacterized membrane protein, CisA, encoded in a non-canonical contractile injection system of Streptomyces coelicolor, CISSc, which is a cytosolic CIS significantly distinct from both intracellular membrane-anchored T6SSs and extracellular CISs. The authors have presented the first high-resolution structure of the extended CISSc, which revealed important structural insights into this conformational state. To explore how CISSc interacts with the cytoplasmic membrane, they set out to investigate CisA, which was previously hypothesized to be the membrane adaptor. However, the structure revealed that it was not associated with CISSc. Using fluorescence microscopy and a cell fractionation assay, the authors verified that CisA is indeed a membrane-associated protein. They further determined experimentally that CisA has a cytosolic N-terminal domain and a periplasmic C-terminus. The functional analysis of the cisA mutant revealed that CisA is not required for CISSc assembly but is essential for contraction; as a result, the deletion significantly affects CISSc-mediated cell death upon stress, timely differentiation, and secondary metabolite production. Although the work did not resolve the mechanistic detail of how CisA interacts with the CISSc structure, it provides solid data and a strong foundation for future investigation toward understanding the mechanism of CISSc contraction and, potentially, the relation between the membrane association of CISSc, the sheath contraction and the cell death.

      Strengths:

      The paper is well-structured, and the conclusions of the study are supported by solid data and careful data interpretation. The authors provided strong evidence on (1) the high-resolution structure of extended CISSc determined by cryo-EM, and the subsequent comparison with known eCIS structures, which sheds light on both its similarities to and differences from other subtypes of eCISs in detail; (2) the topological features of CisA using fluorescence microscopic analysis, cell fractionation and PhoA-LacZα reporter assays; (3) functions of CisA in CISSc-mediated cell death and secondary metabolite production, likely via the regulation of sheath contraction.

      Weaknesses:

      The data presented are not sufficient to provide mechanistic details of CisA-mediated CISSc contraction, as the authors are not able to experimentally demonstrate the direct interaction between CisA and the baseplate complex of CISSc (hypothesized to be via Cis11 by structural modeling), since they could not express cisA in E. coli due to its potential toxicity. Therefore, there is a lack of biochemical analysis of direct interaction between CisA and the baseplate wedge. In addition, there is no direct evidence showing that CisA is responsible for tethering CISSc to the membrane upon stress, and the spatial and temporal relation between membrane association and contraction remains unclear. Further investigation will be needed to address these questions in future.

      We thank Reviewer #3 for the supportive evaluation and constructive criticism of our study in the public and non-public review. We appreciate your recognition of the technical limitations of experimentally demonstrating a direct interaction between CisA and the CIS baseplate complex, and we agree that further investigations in the future will hopefully provide a full mechanistic understanding of the spatiotemporal interaction of CisA and CIS particles and the subsequent CIS firing.

      To further improve the manuscript, we will revise the text and clarify figures and figure legends as suggested in the non-public review.

      Discussion:

      Overall, the work provides a valuable contribution to our understanding of the structure of a much less understood subtype of CISs, which is unique compared to both membrane-anchored T6SSs and host-membrane targeting eCISs. Importantly, the work serves as a good foundation to further investigate how the sheath contraction works here. The work contributes to expanding our understanding of the diverse CIS superfamilies.

      Thank you.

    1. Author response:

      Both reviewers made thoughtful and constructive comments, suggesting improvements that we are keen to provide. The comments fall under three headings: (1) further validation of the design, regarding both optical performance and utility, for both education and research; (2) further description and facilitation of the build process; and (3) further description of future plans, in particular plans for dissemination and long-term support. We think these requirements will be best served by adding new content to our GitHub site and our YouTube channel. We will create this new content and provide a revised manuscript in which these materials are linked from our existing narrative.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors showed the presence of Mtb in human liver biopsy samples of TB patients and reported that chronic infection of Mtb causes immune-metabolic dysregulation. Authors showed that Mtb replicates in hepatocytes in a lipid rich environment created by up regulating transcription factor PPARγ. Authors also reported that Mtb protects itself from anti-TB drugs by inducing drug metabolising enzymes.

      Strengths:

      It has been shown that Mtb induces storage of triacylglycerol in macrophages by induction of WNT6/ACC2 which helps in its replication and intracellular survival, however, creation of favorable replicative niche in hepatocytes by Mtb is not reported. It is known that Mtb infects macrophages and induces formation of lipid-laden foamy macrophages which eventually causes tissue destruction in TB patients. In a recent article it has been reported that "A terpene nucleoside from M. tuberculosis induces lysosomal lipid storage in foamy macrophages" that shows how Mtb manipulates host defense mechanisms for its survival. In this manuscript, authors reported the enhancement of lipid droplets in Mtb infected hepatocytes and convincingly showed that fatty acid synthesis and triacylglycerol formation is important for growth of Mtb in hepatocytes. The authors also showed the molecular mechanism for accumulation of lipid and showed that the transcription factor associated with lipid biogenesis, PPARγ and adipogenic genes were upregulated in Mtb infected cells.

      The comparison of gene expression data between macrophages and hepatocytes by authors is important which indicates that Mtb modulates different pathways in different cell type as in macrophages it is related to immune response whereas, in hepatocytes it is related to metabolic pathways.

      Authors also reported that Mtb residing in hepatocytes showed drug tolerance phenotype due to up regulation of enzymes involved in drug metabolism and showed that cytochrome P450 monooxygenase that metabolize rifampicin and NAT2 gene responsible for N-acetylation of isoniazid were up regulated in Mtb infected cells.

      We thank the reviewer for the positive feedback and for highlighting the strengths of our study.

      Weaknesses:

      There are reports of hepatic tuberculosis in pulmonary TB patients especially in immune-compromised patients, therefore finding granuloma in human liver biopsy samples is not surprising.

      Mtb infected hepatic cells showed induced DME and NAT, which could lead to enhanced metabolism of the drug by hepatic cells; as a result, Mtb inside HepG2 cells is exposed to a reduced drug concentration and shows higher tolerance to the drug. The authors mentioned that "hepatocyte resident Mtb may display higher tolerance to rifampicin". In my opinion, higher tolerance to drugs is possible only when the DME of the Mtb inside is up regulated or the target is modified. However, in the end the authors mentioned that the drug tolerance phenotype can be better attributed to host intrinsic factors rather than Mtb efflux pumps. It may be better if the drug tolerant phenotype section is rewritten to clarify the facts.

      We agree that several case studies regarding liver infection in pulmonary TB patients have been reported in the literature, however this report is the first comprehensive study that establishes hepatocytes to be a favourable niche for Mtb survival and growth.

      Drug tolerance is a phenomenon that is exhibited by the bacteria and, in the course of host-pathogen interactions, can be influenced by both intrinsic (bacterial) and extrinsic (host-mediated) factors. Multiple examples of tolerance being attributed to host driven factors can be found in the literature (PMID: 32546788, PMID: 28659799, PMID: 32846197). Our studies demonstrate that Mtb infected hepatocytes create a drug tolerant environment by modulating the expression of drug modifying enzymes (DMEs) in the hepatocytes.

      As suggested by the reviewer we will rewrite the drug tolerant phenotype section.

      Reviewer #2 (Public review):

      The manuscript by Sarkar et al has demonstrated the infection of liver cells/hepatocytes with Mtb and the significance of liver cells in the replication of Mtb by reprogramming lipid metabolism during tuberculosis. Besides, the present study shows that similar to Mtb infection of macrophages (reviewed in Chen et al., 2024; Toobian et al., 2021), Mtb infects liver cells but with a greater multiplication owing to consumption of enhanced lipid resources mediated by PPARg that could be cleared by its inhibitors. The strength of the study lies in the clinical evaluation of the presence of Mtb in human autopsied liver samples from individuals with miliary tuberculosis and the presence of a clear granuloma-like structure. The interesting observation is of granuloma-like structure in liver which prompts further investigations in the field.

      The modulation of lipid synthesis during Mtb infection, such as PPARg upregulation, appears generic to different cell types including both liver cells and macrophage cells. It is also known that infection affects PPARγ expression and activity in hepatocytes, and that this can lead to lipid droplet accumulation in the liver and the development of fatty liver disease (as shown for HCV). This study follows a similar line for M.tb infection. As the liver is the main site for lipid regulation, the availability of lipid resources is greater and the replication rate is higher. In short, the observations from the study confirm the earlier studies with these additional cell types. It is known that the higher the lipid content, the greater the number of Lipid Droplet-positive Mtb and the higher the drug resistance (Mekonnen et al., 2021). The DMEs of liver cells add further to the phenotype.

      We thank the reviewer for emphasizing the strengths of our study and how it can lead to further investigations in the field.

      Reviewer #3 (Public review):

      This manuscript by Sarkar et al. examines the infection of the liver and hepatocytes during M. tuberculosis infection. They demonstrate that aerosol infection of mice and guinea pigs leads to appreciable infection of the liver as well as the lung. Transcriptomic analysis of HepG2 cells showed differential regulation of metabolic pathways including fatty acid metabolic processing. Hepatocyte infection is assisted by fatty acid synthesis in the liver and inhibiting this caused reduced Mtb growth. The nuclear receptor PPARg was upregulated by Mtb infection and inhibition or agonism of its activity caused a reduction or increase in Mtb growth, respectively, supporting data published elsewhere about the role of PPARg in lung macrophage Mtb infection. Finally, the authors show that Mtb infection of hepatocytes can cause upregulation of enzymes that metabolize antibiotics, resulting in increased tolerance of these drugs by Mtb in the liver.

      Overall, this is an interesting paper on an area of TB research where we lack understanding. However, some additions to the experiments and figures are needed to improve the rigor of the paper and further support the findings. Most importantly, although the authors show that Mtb can infect hepatocytes in vitro, they fail to describe how bacteria get from the lungs to the liver in an aerosolized infection. They also claim that "PPARg activation resulting in lipid droplets formation by Mtb might be a mechanism of prolonging survival within hepatocytes" but do not show a direct interaction between PPARg activation and lipid droplet formation and lipid metabolism, only that PPARg promotes Mtb growth. Thus, the correlations with PPARg appear to be there but causation, implied in the abstract and discussion, is not proven.

      The human photomicrographs are important and overall, well done (lung and liver from the same individuals is excellent). However, in lines 120-121, the authors comment on the absence of studies on the precise involvement of different cells in the liver. In this study there is no attempt to immunophenotype the nature of the cells harboring Mtb in these samples (esp. hepatocytes). Proving that hepatocytes specifically harbor the bacteria in these human samples would add significant rigor to the conclusions made.

      We thank the reviewer for nicely summarizing our manuscript.

      Our study establishes the involvement of liver and hepatocytes in pulmonary TB infection in mice. Understanding the mechanism of bacterial dissemination from the lung to the liver in aerosol infections demands a detailed separate study.

      Figures 6E and 6F show how a PPARγ agonist and antagonist modulate (increase and decrease, respectively) bacterial growth in hepatocytes (further supported by the CFU data in Supplementary Figure 9B). Likewise, the number of lipid droplets in hepatocytes increases and decreases with the application of the PPARγ agonist and antagonist, respectively, as shown in Figures 6G and 6H. Collectively, these studies provide strong evidence that PPARγ activation leads to more lipid droplets that support better Mtb growth.

      We thank the reviewer for finding our human photomicrographs convincing. In the manuscript, we provide evidence for the direct involvement of the hepatocytes (and liver) in Mtb infection. We performed detailed immunophenotyping of hepatocytes in the mouse model with ASPGR1 (asialoglycoprotein receptor 1), and in the revised version of record we will further stain the infected hepatocytes with anti-albumin antibody.

  2. Dec 2024
    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #3:

      Concerns and comments on current version:

      The revision has improved the manuscript but, in my opinion, remains inadequate. While most of my requested changes have been made, I do not see an expansion of Fig1A legend to incorporate more details about the analysis. Lacking details of methodology was a concern from all reviewers.

      To address this concern, we expanded Fig.1A legend, and also significantly expanded the text describing experimental design, to also include the description of the data analysis approach.

      “BCR repertoires libraries were obtained using the 5’-RACE (Rapid Amplification of cDNA Ends) protocol as previously described21 and sequenced with 150+150 bp read length. This approach allowed us to achieve high coverage for the obtained libraries (Table S1) to reveal information on clonal composition, CDR-H3 properties, IgM/IgG/IgA isotypes and somatic hypermutation load within CDR-H3. For B cell clonal lineage reconstruction and phylogenetic analysis, however, 150+150 bp read length is suboptimal because it does not cover V-gene region outside CDR-H3, where hypermutations also occur. Therefore, to verify our conclusions based on the data obtained by 150+150 bp sequencing (“short repertoires”), for some of our samples we also generated BCR libraries by IG RNA Multiplex protocol (See Materials and Methods) and sequenced them at 250+250 bp read length (“long repertoires”). Libraries obtained by this protocol cover V gene sequence starting from CDR-H1 and capture most of the hypermutations in the V gene. Conclusions about clonal lineage phylogeny were drawn only when they were corroborated by “long repertoire” analysis.

      For BCR repertoire reconstruction from sequencing data, we first performed unique molecular identifier (UMI) extraction and error correction (reads/UMI threshold = 3 for 5’-RACE and 4 for IG Multiplex libraries). Then, we used MIXCR58 software to assemble reads into clonotypes, determine germline V, D, and J genes, isotypes, and find the boundaries of target regions, such as CDR-H3. Only UMI counts, and not read counts, were used for quantitative analysis. Clonotypes derived from only one UMI were excluded from the analysis of individual clonotype features but were used to analyze clonal lineages and hypermutation phylogeny, where sample size was crucial. Samples with 50 or less clonotypes left after preprocessing were excluded from the analysis.”
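      The filtering steps quoted above can be sketched roughly as follows. This is only an illustration and not the authors' pipeline or MiXCR's implementation: the function names and the input layout (one `(umi, clonotype)` pair per read) are hypothetical, and the sketch assumes that "reads/UMI threshold = 3" means a UMI is kept when it is supported by at least 3 reads.

```python
from collections import Counter, defaultdict

# Thresholds as stated in the quoted text; the dictionary keys are hypothetical labels.
MIN_READS_PER_UMI = {"5RACE": 3, "IG_Multiplex": 4}
# The text excludes samples with 50 or fewer clonotypes, so 51 is the minimum kept.
MIN_CLONOTYPES_PER_SAMPLE = 51


def umi_counts_per_clonotype(reads, protocol):
    """Count supporting UMIs per clonotype.

    reads: iterable of (umi, clonotype) pairs, one pair per sequencing read.
    A UMI contributes only if it has at least the protocol's read threshold
    (assumed interpretation of "reads/UMI threshold").
    """
    reads_per_umi = Counter(reads)  # number of reads for each (umi, clonotype)
    threshold = MIN_READS_PER_UMI[protocol]
    counts = defaultdict(int)
    for (_umi, clonotype), n_reads in reads_per_umi.items():
        if n_reads >= threshold:
            counts[clonotype] += 1  # quantify by UMIs, never by raw reads
    return dict(counts)


def without_singletons(counts):
    """Drop clonotypes supported by a single UMI (used for clonotype-feature analysis)."""
    return {clonotype: n for clonotype, n in counts.items() if n > 1}


def sample_passes(counts):
    """Return True if enough clonotypes remain after preprocessing."""
    return len(counts) >= MIN_CLONOTYPES_PER_SAMPLE
```

      In this sketch the single-UMI clonotypes are removed only by `without_singletons`, so the unfiltered counts remain available for lineage and phylogeny analysis, mirroring the distinction drawn in the quoted text.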

      Similarly, the 'fragmented' narrative was a concern of all reviewers. These matters have not been dealt with adequately enough - there are parts of the manuscript which remain fragmented and confusing.

      Unfortunately, the reviewers do not give us a hint as to which parts of the text are the most problematic in their opinion. We identified the parts describing physicochemical properties of CDR3s, intratumoral heterogeneity and intra-LN heterogeneity as the most problematic, and edited these parts significantly. We also significantly edited the Discussion section (please see the Comparison file for details). Other sections were also edited to improve readability and clarity.

      The narrative and analysis does not explain how the plasma cell bias has been dealt with adequately and in fact is simply just confusing. There is a paragraph at the beginning of the discussion re the plasma cell bias, which should be re-written to be clearer and moved to have a prominent place early in the results. Why are these results not properly presented? They are key for interpretation of the manuscript. Furthermore, the sorted plasma cell sequencing analysis also has only been performed on two patients.

      In response to this concern, we moved the section describing plasma cell bias in the bulk BCR repertoires to the main text.

      Another issue is that some disease cohorts are entirely composed of patients with metastasis, some without but metastasis is not mentioned. Metastasis has been shown to impact the immune landscape.

      Intrinsic heterogeneity of the cohort is indeed one of the weaknesses of our work, which could negatively impact the statistical significance of our results and, as a consequence, mask certain observations or make them less statistically significant. We mention this in the discussion section. It should not, in our understanding, lead to any false conclusions. We did not, however, pool data from primary and metastatic tumor samples, and all tumor samples that we mention are primary tumors.

      The following part of a sentence was added to the discussion:

      “...which could negatively impact the statistical significance of our results and, as a consequence, mask certain observations or make them less statistically significant.”

      A reviewer brought up a concern about the overlap analysis and I also asked for an explanation on why this F2 metric was chosen. Part of the rebuttal argues that another metric was explored showing similar results, thus the conclusion reached is reasonable. Remarkably, these data are not only omitted from the manuscript, but are not even provided for the reviewers.

      We did not intend to conceal any data from the reviewers, and we now added the panel for D metric to the S1 figure. We would also like to point out that the panel describing R metric for repertoire overlaps (a measure of similarity of overlapping clonotype frequencies), was included in the first version of the S2 Figure (now S1 Figure), and it also showed a similar trend. We hope that now the data are fully conclusive.

      This manuscript certainly includes some interesting and useful work. Unfortunately, a comprehensive re-write was required to make the work much clearer and easier to understand and this has not been realized.

      Again, we thank the reviewers for their thorough evaluation, and hopefully we could make the text clearer in the second reviewed version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, a screening platform is presented for rapid and cost-effective screening of candidate genes involved in Fragile Bone Disorders. The authors validate the approach of using crispants, generating F0 mosaic mutants, to evaluate the function of specific target genes in this particular condition. The design of the guide RNAs is convincingly described, while the effectiveness of the method is evaluated at 60% to 92% of the respective target genes being presumably inactivated. Thus, injected F0 larvae can be directly used to investigate the consequences of this inactivation.

      Skeletal formation is then evaluated at 7dpf and 14dpf, first using a transgenic reporter line revealing fluorescent osteoblasts, and second using alizarin-red staining of mineralized structures. In general, it appears that the osteoblast-positive areas are more often affected in the crispants compared to the mineralized areas, an observation that appears to correlate with the observed reduced expression of bglap, a marker for mature osteoblasts, and the increased expression of col1a1a in more immature osteoblasts.

      Finally, the injected fish (except two lines that revealed high mortality) are also analyzed at 90dpf, using alizarin red staining and micro-CT analysis, revealing an increased incidence of skeletal deformities in the vertebral arches, fractures, as well as vertebral fusions and compressions for all crispants except those for daam2. Finally, the Tissue Mineral Density (TMD) as determined by micro-CT is proposed as an important marker for investigating genes involved in osteoporosis.

      Taken together, this manuscript is well presented, the data are clear and well analyzed, and the methods are well described. It makes a compelling case for using the crispant technology to screen the function of candidate genes in a specific condition, as shown here for bone disorders.

      Strengths:

      Strengths are the clever combination of existing technologies from different fields to build a screening platform. All the required methods are comprehensively described.

      We would like to thank the reviewer for highlighting the strengths of our paper.  

      Weaknesses:

      One may have wished to bring one or two of the crispants to the stage of bona fide mutants, to confirm the results of the screening, however, this is done for some of the tested genes as laid out in the discussion.

      We thank the reviewer for their comment. We would like to point out that indeed similar phenotypes have been observed in existing models, as mentioned in the discussion section.

      Reviewer #2 (Public review):

      Summary:

      More and more genes and genetic loci are being linked to bone fragility disorders like osteoporosis and osteogenesis imperfecta through GWAS and clinical sequencing. In this study, the authors seek to develop a pipeline for validating these new candidate genes using crispant screening in zebrafish. Candidates were selected based on GWAS bone density evidence (4 genes) or linkage to OI cases plus some aspect of bone biology (6 genes). NGS was performed on embryos injected with different gRNAs/Cas9 to confirm high mutagenic efficacy and off-target cutting was verified to be low. Bone growth, mineralization, density, and gene expression levels were carefully measured and compared across crispants using a battery of assays at three different stages.

      Strengths:

      (1) The pipeline would be straightforward to replicate in other labs, and the study could thus make a real contribution towards resolving the major bottleneck of candidate gene validation.

      (2) The study is clearly written and extensively quantified.

      (3) The discussion attempts to place the phenotypes of different crispant lines into the context of what is already known about each gene's function.

      (4) There is added value in seeing the results for the different crispant lines side by side for each assay.

      We would like to thank the reviewer for highlighting the strengths of our paper.  

      Weaknesses:

      (1) The study uses only well-established methods and is strategy-driven rather than question/hypothesis-driven.

      We thank the reviewer for this correct remark. The major aim of this study was to establish a workflow for rapid in vivo functional screening of candidate genes across a broad range of FBDs.

      (2) Some of the measurements are inadequately normalized and not as specific to bone as suggested:

      (a) The measurements of surface area covered by osteoblasts or mineralized bone (Figure 1) should be normalized to body size. The authors note that such measures provide "insight into the formation of new skeletal tissue during early development" and reflect "the quantity of osteoblasts within a given structure and [is] a measure of the formation of bone matrix." I agree in principle, but these measures are also secondarily impacted by the overall growth and health of the larva. The surface area data are normalized to the control but not to the size/length of each fish - the esr1 line in particular appears quite developmentally advanced in some of the images shown, which could easily explain the larger bone areas. The fact that the images in Figure S5 were not all taken at the same magnification further complicates this interpretation.

      We thank the reviewer for this detailed and insightful remark. We agree with the reviewer and recognize that the results may be influenced by size differences. However, we do not normalize for size, as variations in growth were considered as part of the phenotypic outcome. This consideration has been addressed in the discussion section.

      Line 335-338: ‘Although the measurements of osteoblast-positive and mineralized surface areas may be influenced by size differences among some of the crispants, normalization to size parameters was not conducted, as variations in growth were considered integral to the phenotypic outcome.’

      Line 369: ‘Phenotypic variability in these zebrafish larvae can be attributed to several factors, including crispant mosaicism, allele heterogeneity, environmental factors, differences in genomic background and development, and slightly variable imaging positioning.’
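      To make the difference between the two normalizations discussed in this exchange concrete, here is a minimal sketch. It is not part of the authors' analysis, which deliberately normalizes only to the control. The function names and the numbers are hypothetical, and dividing the area by the square of the standard length is an assumption (isometric scaling) chosen purely for illustration.

```python
def relative_to_control(areas, control_areas):
    """Express each area as a fraction of the mean control area."""
    control_mean = sum(control_areas) / len(control_areas)
    return [area / control_mean for area in areas]


def size_normalized(areas, standard_lengths):
    """Divide each area by the squared standard length of the same fish.

    Assumption: a surface area scales with the square of a linear body
    dimension, so area / length**2 is roughly size-independent.
    """
    return [area / length ** 2 for area, length in zip(areas, standard_lengths)]


# Hypothetical example: one crispant larva that is 20% longer than the controls.
control_areas = [100.0, 100.0]
control_lengths = [4.0, 4.0]
crispant_areas = [144.0]
crispant_lengths = [4.8]

# Normalized to control only, the crispant shows a larger area.
area_ratio = relative_to_control(crispant_areas, control_areas)[0]

# After size normalization, the same larva is indistinguishable from the controls.
crispant_index = size_normalized(crispant_areas, crispant_lengths)[0]
control_index = size_normalized(control_areas, control_lengths)[0]
```

      With these made-up values the control-normalized area is larger in the crispant, while the size-normalized index matches the controls. This is the kind of effect the reviewer describes, and the authors instead treat it as part of the phenotype.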

      (b) Some of the genes evaluated by RT-PCR in Figure 2 are expressed in other tissues in addition to bone (as are the candidate genes themselves); because whole-body samples were used for these assays, there is a nonzero possibility that observed changes may be rooted in other, non-skeletal cell types.

      We thank the reviewer for this valuable comment. We acknowledge that the genes assessed by RT-PCR are expressed in other tissues beyond bone. This consideration has been addressed in the discussion section.

      Line 362-365: “However, it is important to note that the genes evaluated by RT-PCR are not exclusively expressed in bone tissue. Since whole-body samples were used for expression analysis, there is a possibility that the observed changes in gene expression may be influenced by other non-skeletal cell types”.

      (3) Though the assays evaluate bone development and quality at several levels, it is still difficult to synthesize all the results for a given gene into a coherent model of its requirement.

      We appreciate the reviewer’s remark. We acknowledge that the results for the larval stages exhibit variability, making it challenging to synthesize them into a coherent model. However, it is important to emphasize that all adult crispants consistently display a skeletal phenotype. Consequently, our assessment of the feasibility and reproducibility of this screening method focuses primarily on the adult stages. This consideration has been addressed in the discussion section of the manuscript.

      Line 391-399: ‘In adult crispants, the skeletal phenotype was generally more penetrant. All crispants showed malformed arches, a majority displayed vertebral fractures and fusions and some crispants exhibited distinct quantitative variations in vertebral body measurements. This confirmed the role of the selected genes in skeletal development and homeostasis and their involvement in skeletal disease and established the crispant approach as a valid approach for rapidly providing in vivo gene function data to support candidate gene identification.’

      (4) Several additional caveats to crispant analyses are worth noting:

      (a) False negatives, i.e. individual fish may not carry many (or any!) mutant alleles. The crispant individuals used for most assays here were not directly genotyped, and no control appears to have been used to confirm successful injection. The authors therefore cannot rule out that some individuals were not, in fact, mutagenized at the loci of interest, potentially due to human error. While this doesn't invalidate the results, it is worth acknowledging the limitation.

      We thank the reviewer for this valuable remark. We recognize the fact that working with crispants has certain limitations, including the possibility that some individuals may carry few or no mutant alleles. To address this issue, we use 10 individual crispants during the larval stage and 5 during the adult stage. Although some individuals may lack the mutant alleles, using multiple fish helps reduce the risk of false negatives.

      Furthermore, we perform NGS analysis on pools of 10 embryos from the same injection clutch as the fish used in the various assays to assess the indel efficiency. While there remains a possibility of false negatives, the overall indel efficiency, as indicated by our NGS analysis,  is high (>90%), thereby reducing the likelihood of having crispants with very low indel efficiency. We included this in the discussion.

      Line 387-390: ‘While there remains a possibility of false negatives, the overall indel efficiency, as indicated by our NGS analysis,  is high (>90%), thereby reducing the likelihood of having crispants with very low indel efficiency.’
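      The argument that using several fish reduces the risk of false negatives can be illustrated with simple arithmetic. The sketch below is a hypothetical model, not a calculation from the manuscript: it assumes each fish independently has the same probability of carrying too few mutant alleles, and the value 0.2 is an arbitrary placeholder that is not derived from the reported NGS efficiencies.

```python
def prob_all_unedited(p_unedited, n_fish):
    """Probability that every fish in a group is effectively unedited.

    Assumes each fish is unedited independently with probability p_unedited
    (a simplifying assumption made only for illustration).
    """
    if not 0.0 <= p_unedited <= 1.0:
        raise ValueError("p_unedited must be between 0 and 1")
    if n_fish < 1:
        raise ValueError("n_fish must be at least 1")
    return p_unedited ** n_fish


# Hypothetical per-fish failure probability; group sizes as stated in the response.
P_UNEDITED = 0.2
adult_group = prob_all_unedited(P_UNEDITED, 5)    # 5 fish at the adult stage
larval_group = prob_all_unedited(P_UNEDITED, 10)  # 10 fish at the larval stage
```

      Under these assumptions the chance that an entire group is unedited shrinks geometrically with group size, although individual unedited fish can still add variability to the measurements.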

      (b) Many/most loci identified through GWAS are non-coding and not easily associated with a nearby gene. The authors should discuss whether their coding gene-focused pipeline could be applied in such cases and how that might work.

      The authors thank the reviewer for this insightful comment. Our study is focused on strong candidate genes rather than non-coding variants. We recognize that the use of this workflow poses challenges for analyzing non-coding variants, which represents a limitation of the crispant approach. We have addressed this issue in the discussion section of the manuscript.

      Line 131: ‘Gene-based’

      Line 453: ‘Gene-based’

      Line 311-314: ‘It is important to note that this study focused on candidate genes for osteoporosis, not on the role of specific variants identified in GWAS studies. Non-coding variants for instance, which are often identified in GWAS studies,  present significant challenges in terms of functional validation and interpretation.’

      Reviewer #3 (Public review):

      Summary:

      The manuscript "Crispant analysis in zebrafish as a tool for rapid functional screening of disease-causing genes for bone fragility" describes the use of CRISPR gene editing coupled with phenotyping mosaic zebrafish larvae to characterize functions of genes implicated in heritable fragile bone disorders (FBDs). The authors targeted six high-confident candidate genes implicated in severe recessive forms of FBDs and four Osteoporosis GWAS-implicated genes and observed varied developmental phenotypes across all crispants, in addition to adult skeletal phenotypes.

      A major strength of the paper is the streamlined method that produced significant phenotypes for all candidate genes tested.

      We would like to thank the reviewer for highlighting the strengths of our paper.  

      A major weakness is a lack of new insights into underlying mechanisms that may contribute to disease phenotypes, nor any clear commonalities across gene sets. This was most evident in the qRT-PCR analysis of select skeletal developmental genes, which all showed varied changes in fold and direction, but with little insight into the implications of the results.

      We thank the reviewer for this insightful remark. We want to emphasize that this study focusses on establishing a new screening method for candidate genes involved in FBDs, rather than investigating the underlying mechanisms contributing to disease phenotypes. However, to investigate the underlying mechanisms in these crispants, the creation of bona fide mutants is necessary. We have included this consideration in the discussion.

      Furthermore, we acknowledge that the results for the larval stages exhibit variability, which can complicate the interpretation of these findings. This is particularly true for the RT-PCR analysis, where whole-body samples were used, raising the possibility that other tissues may influence the expression results. Therefore, our primary focus is on the adult stages, as all crispants display a skeletal phenotype at this age. We have elaborated on this point in the discussion.

      Line 462-463: ‘Moreover, to explore the underlying mechanisms contributing to disease phenotypes, it is essential to establish stable knockout mutants derived from the crispants’.

      Line 391-399: ‘In adult crispants, the skeletal phenotype was generally more penetrant. All crispants showed malformed arches, a majority displayed vertebral fractures and fusions and some crispants exhibited distinct quantitative variations in vertebral body measurements. This confirmed the role of the selected genes in skeletal development and homeostasis and their involvement in skeletal disease and established the crispant approach as a valid approach for rapidly providing in vivo gene function data to support candidate gene identification.’

      Ultimately, the authors were able to show their approach is capable of connecting candidate genes with perturbation of skeletal phenotypes. It was surprising that all four GWAS candidate genes (which presumably were lower confidence) also produced a result.

      We appreciate the reviewer’s comment. We would like to direct attention to the discussion section, where we offer a possible explanation for the observation that all four GWAS candidate genes produce a skeletal phenotype.

      Line 460-410: 'The more pronounced and earlier phenotypes in these zebrafish crispants are most likely attributed to the quasi knock-out state of the studied genes, while more common less impactful variants in the same genes result in typical late-onset osteoporosis (Laine et al., 2013) . This phenomenon is also observed in knock-out mouse models for these genes (Melville et al., 2014)(Coughlin et al., 2019).’

      These authors have previously demonstrated that crispants recapitulate skeletal phenotypes of stable mutant lines for a single gene, somewhat reducing the novelty of the study.

      We thank the reviewer for this comment and appreciate their concern. We have indeed demonstrated that crispants can recapitulate the skeletal phenotypes observed in stable mutant lines for the osteoporosis gene LRP5. However, we would like to highlight that the current study represents the first large-scale screening of candidate genes associated with bone disorders, including genes related to both OI and osteoporosis. We have included this information in both the abstract and the discussion.

      Line 60-62: ‘We advocate for a novel comprehensive approach that integrates various techniques and evaluates distinct skeletal and molecular profiles across different developmental and adult stages.’

      Line 456-457: ‘While this work represents a pioneering effort in establishing a screening platform for skeletal diseases, it offers opportunities for future improvement and refinement.’

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1a: what does the differential shading of the bone elements represent? Explain in the legend.

      The differential shading doesn't represent anything specific. It's simply used to enhance the visual appeal and to help distinguish between the different structures. We removed the shading in the figure.

      (2) Supplementary Figures 2-5: should the numbering of these figures be also in order of appearance in the text? I understand that the authors prefer to associate the transgenic and the alizarin red-stained specimens, however, the reading would be easier that way.

      We changed this accordingly.

      (3) Lines 275-276: "no significant differences in standard length (Figure 4a)": should be Figure 4b.

      The suggested changes are incorporated in the manuscript.

      Line 276-277: ‘Among the eight crispants that successfully matured into adulthood, none exhibited significant differences in standard length and head size (n=5 fish per crispant) (Figure 4b).’

      (4) Line 277 "larger eye diameter": should be Figure 4b.

      The suggested changes are incorporated in the manuscript.

      Line 378: ‘However, esr1 crispants were observed to have notably larger eye diameters (Figure 4b).’

      (5) Line 280: "no obvious abnormalities were detected (Figure 4b,c)": should be Figure 4a, c. Note that the authors may reconsider the a, b, c numbering in Figure 4 by inverting a and b.

      The suggested changes are incorporated in the manuscript.

      Line 278-281: ‘All these crispants demonstrated various abnormalities in the caudal part of the vertebral column such as fusions, compressions, fractures, or arch malformations, except for daam2 crispants where no obvious abnormalities were detected (Figure 4a,c; Supplementary Figure 6).’

      (6) Table 2: This table, which recapitulates all the results presented in the manuscript, is in the end the centerpiece of the work. It is however difficult to read in its present form. Three suggestions:

      - Transpose it such that each gene has its own column, and the lines give the results for the different measurements

      - Place the measurements that result in "ns" for all crispants at the end (bottom) of the table.

      - Maybe bring the measurements at 7dpf, 14dpf, and 90 dpf together.

      We agree with the reviewer and have added a new table where we transposed the data. However, we chose not to place the measurements that resulted in 'ns' for all crispants at the end of the table, as we believe it is important to track the evolution of the phenotype over time. Where possible, we have grouped the measurements for 7 dpf and 14 dpf together.

      Reviewer #2 (Recommendations for the authors):

      (1) It would help to justify why these particular area measurements are appropriate for this set of candidate genes, which were selected based on putative links to bone quality rather than bone development.

      The selected methods are among the most commonly used to evaluate bone phenotypes. They are straightforward to reproduce, as well as cost- and time-effective. The strength of this approach lies in its use of simple, reproducible techniques that form the foundation for characterizing bone development.  Although the candidate genes were chosen based on their putative links to bone quality, early skeletal phenotypes can already be observed during bone development.

      The mineralized surface area of the total head and specific head structures was selected to evaluate the degree of mineralization in early skeletal development, as mineralization is a direct indicator of bone formation. Additionally, the osteoblast-positive surface areas were measured to provide insight into the formation of new skeletal tissue during early development. Osteoblasts, as active bone-forming cells, are essential for understanding bone growth and the dynamics of skeletal phenotypes.

      Examples in the manuscript:

      Line 212-214: ‘The osteoblast-positive areas in both the total head and the opercle were then quantified to gain insight into the formation of new skeletal tissue during early development.’

      Line 221-223: ‘Subsequently, Alizarin Red S (ARS) staining was conducted on the same 7 and 14 dpf crispant zebrafish larvae in order to evaluate the degree of mineralization in the early skeletal structures.’

      (2) Reword: The opercle bone is the earliest forming bone of the opercular series, and appears to be what the authors are referring to as the "operculum" at 7-14 dpf. The operculum is the larger structure (gill cover) in which the opercle is embedded. It would be more accurate to simply refer to the opercle at these stages.

      We agree with this comment and changed the text accordingly.

      (3) Define BMD and TMD at first usage.

      BMD and TMD are now defined in the manuscript.

      Line 41-43: ‘Six genes associated with severe recessive forms of Osteogenesis Imperfecta (OI) and four genes associated with bone mineral density (BMD), a key osteoporosis indicator, identified through genome-wide association studies (GWAS) were selected.’

      Line 286-288: ‘For each of the vertebral centra, the length, tissue mineral density (TMD), volume, and thickness were determined and tested for statistical differences between groups using a regression-based statistical test (Supplementary Figure 7).’

      (4) It would be helpful to note the grouping of candidates into OI vs. BMD GWAS throughout the figures.

      We agree with this comment and added this to all figure legends.

      ‘The first four genes are associated with the pathogenesis of osteoporosis, while the last six are linked to osteogenesis imperfecta’

      Reviewer #3 (Recommendations for the authors):

      Major points:

      (1) For the Results, it would be useful to the Reader to justify the selection of human candidate genes and their associated zebrafish orthologs to model skeletal functions. For example, what are variants identified from human studies, and do they impact functional domains? Are these domains and/or proteins conserved between humans/zebrafish? Is there evidence of skeletal expression in humans/zebrafish?

      Supplementary Table 4 lists the selected human candidate genes with reported mutations and/or polymorphisms associated with both skeletal and non-skeletal phenotypes. The table also includes additional findings from studies in mice and zebrafish. An extra column was now added to indicate gene conservation between human and zebrafish. We consulted UniProt (https://www.uniprot.org) and ZFIN (https://zfin.org) to assess the skeletal expression of these genes in human and zebrafish. All genes showed expression in the trabecular bone and/or bone marrow in humans, as well as in bone elements in zebrafish. We added this in the discussion.

      Line 309: ‘All selected genes show skeletal expression in both human and zebrafish.’

      Supplemental table 4 legend: ‘The conservation between human and zebrafish is reported in the last column.’

      As part of this, some version of Supplementary Table 4 might be included as a main display to introduce the targeted genes, ideally separated by rare (recessive OI) vs. common disease (osteoporosis). In the case of common disease and GWAS hits, how did authors narrow in on candidate genes (which often have Mbp-scale associated regions spanning multiple genes)? Further, what is the evidence that the mechanism of action of the GWAS variant is haploinsufficiency modeled by their crispant zebrafish?

      We have kept Supplementary Table 4 in the supplementary material but have referred to it earlier in the manuscript’s introduction. Consequently, the table has been renumbered from ‘Supplementary Table 4’  to ‘Supplementary Table 1’.

      The selection of genes potentially involved in the pathogenesis of osteoporosis is based on the data from the GWAS catalog, which annotates SNPs using the Ensembl mapping pipeline. The available annotation on their online search interface includes any Ensembl genes to which a SNP maps, or the closest upstream and downstream gene within a 50 kb window. Four genes were selected for this screening method based on the criteria outlined in the results section. In this study, we aim to evaluate the general involvement of specific genes in bone metabolism, rather than to model a specific variant.
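
      The annotation rule described above (genes a SNP maps to, otherwise the closest upstream and downstream gene within 50 kb) can be illustrated with a minimal sketch. This is not the GWAS catalog's actual pipeline: the function name `annotate_snp`, the gene names, and the coordinates are hypothetical, and "upstream"/"downstream" are simplified here to lower/higher chromosomal coordinates, ignoring strand.

```python
WINDOW = 50_000  # 50 kb window, as stated in the response


def annotate_snp(pos, genes, window=WINDOW):
    """Annotate a SNP position against genes on the same chromosome.

    genes: list of (name, start, end) tuples (hypothetical input format).
    Returns the genes overlapping the SNP; if there are none, returns the
    closest gene on each side whose nearest boundary lies within `window`.
    """
    overlapping = [name for name, start, end in genes if start <= pos <= end]
    if overlapping:
        return {"mapped": overlapping, "upstream": None, "downstream": None}

    # Simplification: "upstream" = lower coordinate, "downstream" = higher.
    upstream = [(pos - end, name) for name, start, end in genes
                if end < pos and pos - end <= window]
    downstream = [(start - pos, name) for name, start, end in genes
                  if start > pos and start - pos <= window]
    return {
        "mapped": [],
        "upstream": min(upstream)[1] if upstream else None,
        "downstream": min(downstream)[1] if downstream else None,
    }


# Hypothetical gene intervals for illustration only.
example_genes = [
    ("geneA", 1_000, 2_000),
    ("geneB", 60_000, 70_000),
    ("geneC", 200_000, 210_000),
]
```

      A SNP inside a gene is assigned to that gene, whereas an intergenic SNP receives at most one flanking gene per side, which is why a single association can nominate more than one candidate.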

      Line 135-136 and 309-311: ‘An overview of the selected genes with observed mutant phenotypes in human, mice and zebrafish is provided in Supplementary Table 1.’

      (2) Using the crispant approach does not impact maternally-deposited RNAs that would dampen early developmental phenotypes. Considering the higher variability in larval phenotypes, perhaps the maternal effect plays a role. The authors might investigate developmental expression profiles of their genes using existing RNA-seq datasets such as from White et al (doi: 10.7554/eLife.30860).

      We thank the reviewer for this comment and agree with the possibility that maternally-deposited RNAs might have an impact on early developmental phenotypes. We included this in the discussion.

      Line 369-372: ‘Phenotypic variability in these zebrafish larvae can be attributed to several factors, including crispant mosaicism, allele heterogeneity, environmental factors, differences in genomic background and development, maternally-deposited RNAs, and slightly variable imaging positioning.’

      (3) While making comparisons within a clutch of mutant vs scrambled control is crucial, it is also important to ensure phenotypes are not specific to a single clutch. Do phenotypes remain consistent across different crosses/clutches?

      Yes, phenotypes remain consistent across different crosses and clutches. We included images from a second clutch in the Supplementary material (Supplementary Figure 8) and referred to it in the discussion.

      Line 394-397: ‘Additionally, these skeletal malformations were consistently observed in a second clutch of crispants (Supplementary Figure 8), underscoring the reproducibility of these phenotypic features across independent clutches.’

      (4) Understanding that antibodies may not exist for many of the selected genes for zebrafish, authors should verify haploinsufficiency using an RT-qPCR of targeted genes in crispants vs. controls.

      We appreciate the reviewer’s suggestion to use RT-qPCR to examine expression levels of the targeted genes in crispants. However, previous experience suggests that relying on RNA expression to verify haploinsufficiency in zebrafish can be challenging. In zebrafish KO mutants, RT-qPCR often still detects gene transcripts, potentially due to incomplete nonsense-mediated decay (NMD) of the mutated mRNA, which may allow residual expression even in the absence of functional protein. As a more definitive approach, we prefer to use antibodies to confirm haploinsufficiency at the protein level. However, as the reviewer noted, generating and applying specific antibodies in zebrafish remains challenging.

      (5) Please indicate how parametric vs. non-parametric statistical tests were selected for datasets.

      We initially selected the parametric unpaired t-test, assuming the data were normally distributed with similar variances between groups. We verified the assumption of equal variances using the F-test, which was not significant across all assays. However, we did not assess the normality of the data directly, meaning we cannot confirm the normality assumption required for the t-test. Given this, we have opted to use the non-parametric Mann-Whitney U test, which does not require assumptions of normality, to ensure the robustness of our statistical analyses. We changed the Figures, the figure legends and the text accordingly.
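
      As an illustration of the non-parametric comparison described above, the following is a minimal sketch of a two-sided exact Mann-Whitney U test for small groups (for example, n=5 fish per group). It is not the authors' analysis code: the function names and the example values are hypothetical, and the exact p-value is obtained by enumerating all possible group labelings, which is only practical for small sample sizes.

```python
from itertools import combinations


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u_exact(a, b):
    """Return (U for group a, two-sided exact p-value)."""
    pooled = list(a) + list(b)
    ranks = rank_with_ties(pooled)
    n1, n2 = len(a), len(b)
    mean_u = n1 * n2 / 2  # expected U under the null hypothesis

    def u_from(indices):
        return sum(ranks[i] for i in indices) - n1 * (n1 + 1) / 2

    observed = u_from(range(n1))
    extreme = total = 0
    # Enumerate every way of assigning n1 of the pooled values to group a.
    for combo in combinations(range(n1 + n2), n1):
        total += 1
        if abs(u_from(combo) - mean_u) >= abs(observed - mean_u) - 1e-12:
            extreme += 1
    return observed, extreme / total


# Hypothetical measurements for a control and a crispant group.
control = [1.0, 2.0, 3.0, 4.0, 5.0]
crispant = [6.0, 7.0, 8.0, 9.0, 10.0]
u_stat, p_value = mann_whitney_u_exact(control, crispant)
```

      Because the test uses only ranks, it does not rely on the normality assumption that could not be confirmed for the t-test. With five observations per group, the smallest attainable two-sided p-value is 2/252.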

      (6) In the figures and tables, I recommend adding notation showing the grouping of the first four genes as GWAS osteoporosis, the next three genes as osteoblast differentiation, the next two genes as bone mineralization, and the final gene as collagen transport to orient the reader. One might expect there to be a clustering of phenotypic outcomes based on the selection of genes, and it would be easier to follow this. This would be particularly useful to include in Table 2.

      Our primary objective is to assess the feasibility and reproducibility of the crispant screen rather than performing an in-depth pathway analysis or categorizing genes by biological processes. For this purpose, we have organized candidate genes based on their relevance to osteoporosis and Osteogenesis Imperfecta, without subdividing them further. We have clarified this focus in the figure legends, as suggested in an earlier recommendation.

      (7) For Figure 1, consider adding a smaller zoomed version of 1a embedded in each sub-figure with each measured element highlighted to improve readability.

      We agree with this comment and changed the figure accordingly.

      Minor points:

      (1) Table 2 could be simplified to improve readability. The headers have redundancies across columns with varied time points and could be merged.

      The suggested changes are incorporated in the manuscript (see earlier comment about this).

      (2) "BMD" is not defined in the Abstract. This is a personal preference, but there were numerous abbreviations in the text that made it difficult to follow at times.

      The suggested changes are incorporated in the manuscript (see earlier comment about this).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study reveals how a rhizobial effector protein cleaves and inhibits a key plant receptor for symbiosis signaling, while the host plant counters by phosphorylating the effector. The molecular evidence for the protein-protein interaction and modification is solid, though biological evidence directly linking effector cleavage to rhizobial infection is incomplete. With additional functional data, this work could have implications for understanding intricate plant-microbe dynamics during mutualistic interactions.

      Thank you for this positive comment. Our data strongly support the view that NFR5 cleavage by NopT impairs Nod factor signaling resulting in reduced rhizobial infection. However, other mechanisms may also have an effect on the symbiosis, as NopT targets other proteins in addition to NFR5. In our revised manuscript version, we discuss the possibility that negative NopT effects on symbiosis could be due to NopT-triggered immune responses. As mentioned in our point-by-point answers to the Reviewers, we included additional data into our manuscript. We would also like to point out that we are generally more cautious in our revised version in order to avoid over-interpreting the data obtained.

      Public Reviews:

      Reviewer #1 (Public Review):

      Bacterial effectors that interfere with the inner molecular workings of eukaryotic host cells are of great biological significance across disciplines. On the one hand they help us to understand the molecular strategies that bacteria use to manipulate host cells. On the other hand they can be used as research tools to reveal molecular details of the intricate workings of the host machinery that is relevant for the interaction/defence/symbiosis with bacteria. The authors investigate the function and biological impact of a rhizobial effector that interacts with and modifies, and curiously is modified by, legume receptors essential for symbiosis. The molecular analysis revealed a bacterial effector that cleaves a plant symbiosis signaling receptor to inhibit signaling and the host counterplay by phosphorylation via a receptor kinase. These findings have potential implications beyond bacterial interactions with plants.

      Thank you for highlighting the broad significance of rhizobial effectors in understanding legume-rhizobia interactions. We fully agree with your assessment and have expanded our Discussion (and Abstract) regarding the potential implications of our findings beyond bacterial interactions with plants. We mention the prospect of developing specific kinase-interacting proteases to fine-tune cellular signaling processes in general.

      Bao and colleagues investigated how rhizobial effector proteins can regulate the legume root nodule symbiosis. A rhizobial effector is described to directly modify symbiosis-related signaling proteins, altering the outcome of the symbiosis. Overall, the paper presents findings that will have a wide appeal beyond its primary field.

      Out of 15 identified effectors from Sinorhizobium fredii, they focus on the effector NopT, which exhibits proteolytic activity and may therefore cleave specific target proteins of the host plant. They focus on two Nod factor receptors of the legume Lotus japonicus, NFR1 and NFR5, both of which were previously found to be essential for the perception of rhizobial nod factor, and the induction of symbiotic responses such as bacterial infection thread formation in root hairs and root nodule development (Madsen et al., 2003, Nature; Tirichine et al., 2003; Nature). The authors present evidence for an interaction of NopT with NFR1 and NFR5. The paper aims to characterize the biochemical and functional consequences of these interactions and the phenotype that arises when the effector is mutated.

      Thank you for your positive feedback.  We have now emphasized the interdisciplinary significance of our work in the Introduction and Discussion of our revised manuscript. We highlight how the insights gained from our study can contribute to a better understanding of microbial interactions with eukaryotic hosts in general, and hope that our findings could benefit future research in the fields of pathogenesis, immunity, and symbiosis.

      We appreciate your detailed summary of our work, which is focused on NopT and its interaction with Nod factor receptors. To ensure that the readers can easily follow the rationale behind our work, we have included a more detailed explanation of how NopT was identified to target Nod factor receptors. In particular, we now better describe the test system (Nicotiana benthamiana cells co-expressing NFR1/NFR5 with a given effector of Sinorhizobium fredii NGR234). In addition, we provide now a more thorough background on the roles of NFR1 and NFR5 in symbiotic signaling and refer to the two Nature papers from 2003 on NFR1 and NFR5 (Madsen et al., 2003; Radutoiu et al., 2003).

      Evidence is presented that in vitro NopT can cleave NFR5 at its juxtamembrane region. NFR5 appears also to be cleaved in vivo, and NFR1 appears to inhibit the proteolytic activity of NopT by phosphorylating NopT. When NFR5 and NFR1 are ectopically over-expressed in leaves of the non-legume Nicotiana benthamiana, they induce cell death (Madsen et al., 2011, Plant Journal). Bao et al. found that this cell death response is inhibited by the coexpression of nopT. Mutation of nopT alters the outcome of rhizobial infection in L. japonicus. These conclusions are well supported by the data.

      We appreciate your recognition of the robustness of our conclusions. In the context of your comments, we made the following improvements to our manuscript:

      We included a more detailed description of the experimental conditions under which the cleavage of NFR5 by NopT was observed in vitro and in vivo. Furthermore, additional experiments were added to strengthen the evidence for NFR5 cleavage by NopT (Fig 3, S3, S6, and S14).

      We provided more comprehensive data on the phosphorylation of NopT by NFR1, including phosphorylation assays (Fig. 4) and mass spectrometry results (Fig. S7 and Table S1). These data provide additional information on the mechanism by which NFR1 inhibits the proteolytic activity of NopT.

      We expanded the discussion on the cell death response induced by ectopic expression of NFR1 and NFR5 in Nicotiana benthamiana. We also included further details from Madsen et al. (2011) to contextualize our findings within the known literature.

      We believe that these additions and clarifications have improved the significance and impact of our study.

      The authors present evidence supporting the interaction of NopT with NFR1 and NFR5. In particular, there is solid support for cleavage of NFR5 by NopT (Figure 3) and the identification of NopT phosphorylation sites that inhibit its proteolytic activity (Figure 4C). Cleavage of NFR5 upon expression in N. benthamiana (Figure 3A) requires appropriate controls (inactive mutant versions) that have been provided, since Agrobacterium as a closely rhizobia-related bacterium, might increase defense related proteolytic activity in the plant host cells.

      We appreciate your recognition of the importance of appropriate controls in our experimental design. In response to your comments, we revised our manuscript to ensure that the figures and legends provide a clear description of the controls used. We also included a more detailed description of our experimental design at several places. In particular, we have highlighted the use of the protease-dead version of NopT as a control (NopT<sup>C93S</sup>). Therefore, NFR5-GFP cleavage in N. benthamiana clearly depended on protease activity of NopT and not on Agrobacterium (Fig. 3A). In the revised text, we are now more cautious in our wording and don’t conclude at this stage that NopT proteolyzes NFR5. However, our subsequent experiments, including in vitro experiments, clearly show that NopT is able to proteolyze NFR5.

      We are convinced that these changes have improved the quality of our work.

      Key results from N. benthamiana appear consistent with data from recombinant protein expression in bacteria. For the analysis in the host legume L. japonicus transgenic hairy roots were included. To demonstrate that the cleavage of NFR5 occurs during the interaction in plant cells the authors build largely on western blots. Regardless of whether Nicotiana leaf cells or Lotus root cells are used as the test platform, the Western blots indicate that only a small proportion of NFR5 is cleaved when co-expressed with nopT, and most of the NFR5 persists in its full-length form (Figures 3A-D). It is not quite clear how the authors explain the loss of NFR5 function (loss of cell death, impact on symbiosis), as a vast excess of the tested target remains intact. It is also not clear why a large proportion of NFR5 is unaffected by the proteolytic activity of NopT. This is particularly interesting in Nicotiana in the absence of Nod factor that could trigger NFR1 kinase activity.

      Thank you for your comments regarding the cleavage of NFR5 by NopT and its functional implications. We acknowledge that our immunoblots indicate only a relatively small proportion of the NFR5 cleavage product. Possible explanations are as follows:

      (1) The presence of full-length NFR5 does not preclude a significant impact of NopT on the function of NFR5, as NopT is able to bind to NFR5. In other words, the NopT-NFR5 and NopT-NFR1 interactions at the plasma membrane might influence the function of the NFR1/NFR5 receptor without proteolytic cleavage of NFR5. In fact, protease-dead NopT<sup>C93S</sup> expressed in NGR234Δ_nopT_ showed certain effects in L. japonicus (fewer infection foci were formed compared to NGR234Δ_nopT_; Fig. 5E). In this context, it is worth mentioning that the non-acylated NopT<sup>C93S</sup> (Fig. 1B) and NopT<sub>USDA257</sub> (Fig. 6B) proteins were unable to suppress NFR1/NFR5-induced cell death in N. benthamiana, but this could be explained by the lack of acylation and altered subcellular localization.

      (2) The cleaved NFR5 fraction, although small, may be sufficient to disrupt signaling pathways, leading to the observed phenotypic changes  (loss of cell death in N. benthamiana; altered infection in L. japonicus).

      (3) The expression systems used produce high levels of proteins in the cell. This may not reflect the natural situation in L. japonicus cells.

      (4) Cellular conditions could impair cleavage of NFR5 by NopT.  Expression of proteins in E. coli may partially result in formation of protein aggregates (inactive NopT; NFR5 resistant to proteolysis).

      (5) In N. benthamiana co-expressing NFR1/NFR5, the NFR1 kinase activity is constitutively active (i.e., does not require Nod factors), suggesting an altered protein conformation of the receptor complex, which may influence the proteolytic susceptibility of NFR5.

      (6) The proteolytic activity of NopT may be reduced by the interaction of NopT with other proteins such as NFR1, which phosphorylates NopT and inactivates its protease activity.

      In our revised manuscript version, we now provide quantitative data on the efficiency of NFR5 cleavage by NopT in the different expression systems used (Supplemental Fig. 14). We have also improved our Discussion in this context. Future research will be necessary to better understand loss of NFR5 function by NopT.

      It is also difficult to evaluate how the ratios of cleaved and full-length protein change when different versions of NopT are present without a quantification of band strengths normalized to loading controls (Figure 3C, 3D, 3F). The same is true for the blots supporting NFR1 phosphorylation of NopT (Figure 4A).

      Thank you for pointing this out. Following your suggestions, we quantified the band intensities for cleaved and full-length NFR5 in our different expression systems (N. benthamiana, L. japonicus and E. coli). The protein bands were normalized to loading controls. The data are shown in the new Supplemental Fig. 14. Similarly, the bands of immunoblots supporting phosphorylation of NopT by NFR1 were quantified. The data on band intensities are shown in Fig. 4B of our revised manuscript. These improvements provide a clearer understanding of how the ratios of cleaved to full-length proteins change in different protein expression systems, and to what extent NopT was phosphorylated by NFR1.
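
      The kind of quantification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used for Supplemental Fig. 14: the function name `quantify_lane` and all intensity values are hypothetical, and band intensities are assumed to be background-subtracted densitometry readings in arbitrary units.

```python
def quantify_lane(full_length, cleaved, loading_control):
    """Normalize band intensities from one lane to its loading control.

    Returns the normalized full-length and cleaved signals (useful for
    comparing lanes with unequal loading) and the cleaved fraction.
    """
    if loading_control <= 0:
        raise ValueError("loading control intensity must be positive")
    full_norm = full_length / loading_control
    cleaved_norm = cleaved / loading_control
    total = full_norm + cleaved_norm
    # Within a single lane the loading control cancels out of this ratio;
    # normalization matters when absolute signals are compared across lanes.
    fraction = cleaved_norm / total if total > 0 else 0.0
    return {
        "full_length": full_norm,
        "cleaved": cleaved_norm,
        "cleaved_fraction": fraction,
    }


# Hypothetical intensities (arbitrary units) for a single lane.
lane = quantify_lane(full_length=800.0, cleaved=200.0, loading_control=2.0)
```

      Reporting both the normalized signals and the cleaved fraction makes it possible to distinguish a change in the proportion of cleaved protein from a change in overall protein abundance between lanes.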

      Nodule primordia and infection threads are still formed when L. japonicus plants are inoculated with ∆nopT mutant bacteria, but it is not clear if these primordia are infected or develop into fully functional nodules (Figure 5). A quantification of the ratio of infected and non-infected nodules and primordia would reveal whether NopT is only active at the transition from infection focus to thread or perhaps also later in the bacterial infection process of the developing root nodule.

      Thank you for highlighting this aspect of our study. In response to your comment, we have conducted additional inoculation experiments with L. japonicus plants inoculated with NGR234 and the NGR234_ΔnopT_ mutant. The new data are shown in Fig 5A, 5E, and 5G. However, we could not find any uninfected (empty) nodules when roots were inoculated with these strains and mention this observation in the Results section of our revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript presents data demonstrating NopT's interaction with Nod Factor Receptors NFR1 and NFR5 and its impact on cell death inhibition and rhizobial infection. The identification of a truncated NopT variant in certain Sinorhizobium species adds an interesting dimension to the study. These data try to bridge the gaps between classical Nod-factor-dependent nodulation and T3SS NopT effector-dependent nodulation in legume-rhizobium symbiosis. Overall, the research provides interesting insights into the molecular mechanisms underlying symbiotic interactions between rhizobia and legumes.

      Strengths:

      The manuscript nicely demonstrates NopT's proteolytic cleavage of NFR5, regulated by NFR1 phosphorylation, promoting rhizobial infection in L. japonicus. Intriguingly, authors also identify a truncated NopT variant in certain Sinorhizobium species, maintaining NFR5 cleavage but lacking NFR1 interaction. These findings bridge the T3SS effector with the classical Nod-factor-dependent nodulation pathway, offering novel insights into symbiotic interactions.

      Weaknesses:

      (1) In the previous study, when transiently expressed NopT alone in Nicotiana tobacco plants, proteolytically active NopT elicited a rapid hypersensitive reaction. However, this phenotype was not observed when expressing the same NopT in Nicotiana benthamiana (Figure 1A). Conversely, cell death and a hypersensitive reaction were observed in Figure S8. This raises questions about the suitability of the exogenous expression system for studying NopT proteolysis specificity.

      We appreciate your attention to these plant-specific differences. Previous studies showed that NopT expressed in tobacco (N. tabacum) or in specific Arabidopsis ecotypes (with PBS1/RPS5 genes) causes rapid cell death (Dai et al. 2008; Khan et al. 2022). Khan et al. 2022 reported recently that cell death does not occur in N. benthamiana unless the leaves were transformed with PBS1/RPS5 constructs. Our data shown in Fig. S15 confirm these findings. As cell death (effector-triggered immunity) is usually associated with induction of plant protease activities, we considered N. tabacum and A. thaliana plants as not suitable for testing NFR5 cleavage by NopT. In fact, no NopT/NFR5 experiments were performed with these plants in our study. In response to your comment, we now better describe the N. benthamiana expression system and cite the previous articles. Furthermore, we have revised the Discussion section to better emphasize effector-induced immunity in non-host plants and the negative effect of rhizobial effectors during symbiosis. Our revisions provide a clearer understanding of the advantages and limitations of the N. benthamiana expression system.

      (2) NFR5 Loss-of-function mutants do not produce nodules in the presence of rhizobia in lotus roots, and overexpression of NFR1 and NFR5 produces spontaneous nodules. In this regard, if the direct proteolysis target of NopT is NFR5, one could expect the NGR234's infection will not be very successful because of the Native NopT's specific proteolysis function of NFR5 and NFR1. Conversely, in Figure 5, authors observed the different results.

      Thank you for this comment, which points out that we did not address this aspect precisely enough in the original manuscript version. We improved our manuscript and now write that nfr1 and nfr5 mutants do not produce nodules (Madsen et al., 2003; Radutoiu et al., 2003) and that over-expression of either NFR1 or NFR5 can activate NF signaling, resulting in formation of spontaneous nodules in the absence of rhizobia (Ried et al., 2014). In fact, compared to the nopT knockout mutant NGR234_ΔnopT_, wildtype NGR234 (with NopT) is less successful in inducing infection foci in root hairs of L. japonicus (Fig. 5). With respect to formation of nodule primordia, we repeated our inoculation experiments with NGR234_ΔnopT_ and wildtype NGR234 and also included a nopT over-expressing NGR234 strain into the analysis. Our data clearly showed that nodule primordium formation was negatively affected by NopT. The new data are shown in Fig. 5 of our revised version. Our data show that NGR234's infection is not really successful, especially when NopT is over-expressed. This is consistent with our observations that NopT targets Nod factor receptors in L. japonicus and inhibits NF signaling (NIN promoter-GUS experiments). Our findings indicate that NopT is an “Avr effector” for L. japonicus. However, in other host plants of NGR234, NopT possesses a symbiosis-promoting role (Dai et al. 2008; Kambara et al. 2009). Such differences could be explained by different NopT targets in different plants (in addition to Nod factor receptors), which may influence the outcome of the infection process. Indeed, our work shows that NopT can interact with various kinase-dead LysM domain receptors, suggesting a role of NopT in suppression or activation of plant immunity responses depending on the host plant.
      We discuss such alternative mechanisms in our revised manuscript version and emphasize the need for further investigation to elucidate the precise mechanisms underlying the observed infection phenotype and the role of NopT in modulating symbiotic signaling pathways. In this context, we would also like to mention the two new figures of our manuscript, which show (i) the efficiency of NFR5 cleavage by NopT in different expression systems, (ii) the interaction between NopT<sup>C93S</sup> and His-SUMO-NFR5<sup>JM</sup>-GFP, and (iii) cleavage of His-SUMO-NFP<sup>JM</sup>-GFP by NopT (Supplementary Figs. S8 and S9).

      (3) In Figure 6E, the model illustrates how NopT digests NFR5 to regulate rhizobia infection. However, it raises the question of whether it is reasonable for NGR234 to produce an effector that restricts its own colonization in host plants.

      Thank you for mentioning this point. We are aware of the possible paradox that the broad-host-range strain NGR234 produces an effector that appears to restrict its infection of host plants. As mentioned in our answer to the previous comment, NopT could have additional functions beyond the regulation of Nod factor signaling. In our revised manuscript version, we have modified our text as follows:

      (1) We mention the potential evolutionary aspects of NopT-mediated regulation of rhizobial infection and discuss the possibility that interactions between NopT and Nod factor receptors may have evolved to fine-tune Nod factor signaling to avoid rhizobial hyperinfection in certain host legumes.

      (2) We also emphasize that the presence of NopT may confer selective advantages in other host plants than L. japonicus due to interactions with proteins related to plant immunity. Like other effectors, NopT could suppress activation of immune responses (suppression of PTI) or cause effector-triggered immunity (ETI) responses, thereby modulating rhizobial infection and nodule formation. Interactions between NopT and proteins related to the plant immune system may represent an important evolutionary driving force for host-specific nodulation and explain why the presence of NopT in NGR234 has a negative effect on symbiosis with L. japonicus but a positive one with other legumes.

      (4) The failure to generate stable transgenic plants expressing NopT in Lotus japonicus is surprising, considering the manuscript's claim that NopT specifically proteolyzes NFR5, a major player in the response to nodule symbiosis, without being essential for plant development.

      We also thank you for this comment. We have revised the Discussion section of our manuscript and now discuss our failure to generate stable transgenic L. japonicus plants expressing NopT. We observed that the protease activity of NopT in aerial parts of L. japonicus had a negative effect on plant development, whereas NopT expression in hairy roots was possible. Such differences may be explained by different NopT substrates in roots and aerial parts of the plant. In this context, we also discuss our finding that NopT not only cleaves NFR5 but is also able to proteolyze other proteins of L. japonicus such as LjLYS11, suggesting that NopT not only suppresses Nod factor signaling, but may also interfere with signal transduction pathways related to plant immunity. We speculate that, depending on the host legume species, NopT could suppress PTI or induce ETI, thereby modulating rhizobial infection and nodule formation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Overall the text and figure legends must be double-checked for correctness of scientific statements. The few listed here are just examples. There are more that are potentially damaging the perception by the readers and thus the value of the manuscript.

      The nopT mutant leads to more infections. In line 358 the statement: "...and the proteolysis of NFR5 are important for rhizobial infection", is wrong, as the infection works even better without it. It is, according to my interpretation of the results, important for the regulation of infection. Sounds a small difference, but it completely changes the meaning.

      We appreciate your thorough review and have taken the opportunity to correct this error. Following your suggestions, we carefully rephrased the whole text and figure legends to ensure that the scientific statements accurately reflect the findings of our study. We are convinced that these changes have increased the value of this study.

      In line 905 the authors state that NopTC indicates the truncated version of NopT after autocleavage by releasing about 50 a.a. at its N-terminus.

      They do not analyse this cleavage product to support this claim. So better rephrase.

      According to Dai et al. (2008), NopT expressed in E. coli is autocleaved. The N-terminal sequence GCCA obtained by Edman sequencing suggests that NopT was cleaved between M49 and G50.  We improved our manuscript and now write:

      (1) “A previous study has shown that NopT is autocleaved at its N-terminus to form a processed protein that lacks the first 49 amino acid residues (Dai et al., 2008)”

      (2) “However, NopT<sup>ΔN50</sup>, which is similar to autocleaved NopT, retained the ability to interact with NFR5 but not with NFR1 (Fig. S2D).”

      In line 967: "Both NopT and NopTC after autocleavage exert proteolytic activities" This is confusing as it was suggested earlier that NopTc is a product of the autocleavage. There is no indication of another round of NopTc autocleavage or did I miss something?

      Thank you for bringing this inaccuracy to our attention. There is no second round of NopT autocleavage. We have corrected the text and now write: “NopT and NopT<sup>C</sup> (autocleaved NopT) proteolytically cleave NFR5 at the juxtamembrane domain to release the intracellular domain of NFR5”

      Given the amount of work that went into the research, the presentation of the figures should be considerably improved. For example, in Figure 3F the mutant is not correctly annotated. In figure 5 the term infection foci and IT occur but it is not explained in the legend what these are, where they can be seen in the figure and how the researchers discriminated between the two events.

      In general, the labeling of the figure panels should be improved to facilitate the understanding. For example, in Figure 3 the panels switch between different host plant systems. The plant could be clarified for each panel to aid the reader. The asterisks are not in line with the signal that is supposed to be marked. And so on. I strongly advise to improve the figures.

      Thank you for your valuable suggestions. We acknowledge the importance of clear and informative figure presentation to enhance the understanding of our research findings. In response to your comments, we made a comprehensive revision of the figures to address the mentioned issues:

      (1) We corrected annotations of the mutant in Figure 3F to accurately represent the experimental conditions.

      (2) We revised the legend of Figure 5 and provide clear explanations of the terms "infection foci" and "IT" (infection threads) in the Methods section.

      (3) We improved the labeling of figure panels and the wording of the figure legend, specifying the protein expression system (N. benthamiana, L. japonicus and E. coli, respectively). We ensured that the asterisks indicating statistically significant results are properly aligned.

      Furthermore, we carefully reviewed each figure to enhance clarity and readability, including optimizing font size and line thickness. Captions and annotations were also revised.

      Figure 1

      • To verify that the lack of observed cell death is not linked to differential expression levels, an expression control Western blot is essential. In the expression control Western blot given in the supplemental materials (Supplemental fig. 1E), NFR5 is not visible in the first lane.

      We appreciate your comments on the control immunoblot, which was made to verify the presence of NFR1, NFR5 and NopT in N. benthamiana. However, as shown in Supplemental Fig. 1E, the intact NFR5 could not be immuno-detected when co-expressed with NFR1 and NopT. To ensure co-expression of NFR1/NFR5, A. tumefaciens carrying a binary vector with both NFR1 and NFR5 was used. In the revised version, we modified the figure legend accordingly and also included a detailed description of the procedure at lines 165-166.

      • Labeling of NFR1/LjNFR1 should be kept consistent between the text and the figures. Currently, the text refers to both NFR1 and LjNFR1 and figures are labelled NFR1. The same is true for NFR5.

      Thank you for pointing out this inconsistency. We revised our manuscript and now consistently use NFR1 and NFR5 without a prefix to avoid any confusion.

      • A clearer description of how cell death was determined would be useful. In the selected pictures in panel D, leaves coexpressing nopT with Bax1 or Cerk1 appear very different from the pictures selected for NopM and AVr3a/R3a.

      We agree that a clearer description of our cell death experiments with N. benthamiana was necessary. We have re-worded the figure legend to provide more detailed information on the criteria used for assessing cell death. Additionally, we now show our images at higher resolution.

      • In panel D, the "Death/Total" ratio is only shown for leaf discs where nopT was coexpressed with the cell-death triggering proteins. Including the ratio for leaf discs where only the cell-death triggering protein (without nopT ) was expressed would make the figure more clear.

      Thank you for this suggestion. To provide a more comprehensive comparison, we included the "Cell death/Total" ratio for all leaf disc images shown in Fig. 1D. 

      Figure 2:

      • A: Split-YFP is not ideal as evidence for colocalization because of the chemical bond formed between the YFP fragments that may lead to artificial trapping/accumulation outside the main expression domains. Overall, the authors should revise if this figure aims to show colocalization or interaction. In the current text, both terms are used, but these are different interpretations.

      We appreciate your concern regarding the use of Split-YFP for colocalization analysis. We carefully reviewed the figure and corresponding text to ensure clarity in the interpretation of the results. The primary aim of this figure was to explore protein-protein interactions rather than strict colocalization. Protein-protein interactions have also been validated by other experiments of our work. We have revised the text accordingly and no longer emphasize “co-localization”.

      • Given the focus on proteolytic activity in this paper, all blots need to be clearly labeled with size markers, and it would be good to include a supplemental figure with all other bands produced in the Western blot, regardless of their size. Without this, the results in panel 2D seem inconsistent with results presented in figure 3A, since NFR5 does not appear to be cleaved in the Western blot in 2D, but 3A shows cleavage when the same proteins (with different tags) are coexpressed in the same system.

      Thank you for bringing up this point. We ensured that all immunoblots are clearly labeled with size markers in our revised manuscript. We also carefully checked the consistency of the results presented in Figures 2D and 3A and included appropriate clarifications in the revised manuscript. In Figure 2D, we show the bands at around 75 kDa (multiple bands would be detected below this size, including NFR5 cleaved by NopT, but also other non-specific bands).

      Figure 3:

      • In panel E, NopTC93S cannot cleave His-Sumo-NFR5JM-GFP, but it would be interesting to also show if NopTC93S can bind the NFR5JM fragment. It would also be useful to see this experiment done with the JM of NFP.

      Thank you for the suggestion. We agree that investigating the binding of NopT<sup>C93S</sup> to the NFR5<sup>JM</sup> fragment provides valuable insights into the interaction between NopT and NFR5. In our revised version, we show in the new Supplemental Fig. S4 that NopT interacts with NFR5<sup>JM</sup> and cleaves NFP<sup>JM</sup>. The Results section has been modified accordingly.

      • The panels in this figure require better labeling. In many panels, asterisks are misplaced relative to the bands they should highlight, and not all blots have size markers or loading controls.

      Thank you for bringing this to our attention. We carefully reviewed the labeling of all panels in Figure 3 to ensure accuracy and clarity. We ensured that asterisks are correctly placed in the figures. We also included size markers and loading controls to improve the quality of the shown immunoblots.

      • Since there is no clear evidence in this figure that the smear in the blot in panel C is phosphorylated NopT, it is recommended to provide a less interpretative label on the blot, and explain the label in the text.

      We appreciate your suggestion regarding the labeling of the blot in panel C of Fig. 3. We revised the label and provided a less interpretative designation in Fig. 3C. We also rephrased the figure legend and the text in the Results section as recommended.

      Figure 4

      • In B, a brief introduction in the text to the function of the Zn-phostag would make the figure easier to understand for more readers.

      Thank you for the suggestion. We agree and have provided a brief explanation in the Results section: “On such gels, a Zn<sup>2+</sup>-Phos-tag bound phosphorylated protein migrates slower than its unbound nonphosphorylated form.” Furthermore, we have included the reference (Kato & Sakamoto, 2019) in the Methods section.

      Figure 5:

      • Change "Scar bar" to "Scale bar" in the figure captions

      Thank you for spotting that typo. We have corrected it.

      • Correct the references to the figures in the text

      We carefully reviewed Figure 5 and made the corresponding corrections to improve the quality of our manuscript. Please check lines 394-451.

      • It should be clarified what was quantified as "infection foci" (C, F, G)

      We revised the legend of Figure 5 and now provide explanations of the terms "infection foci" and "IT" (infection threads) in the Methods section. Please check lines 399-451.

      • It is recommended to use pictures that are from the same region of the plant root (the susceptible zone). The pictures in panel A appear to be from different regions, since the density of root hairs is different.

      Thank you for bringing this to our attention. We ensured that the images selected for panel A were from the same region of the plant root to guarantee consistency and accuracy of the comparison.

      • Panel G should be labeled so it is clearer that nopT is being expressed in L. japonicus transgenic roots.

      We have labeled this panel more clearly to help the reader understand that nopT was expressed in transgenic L. japonicus roots.

      • Panel F is missing statistical tests for ITs

      We apologize for this omission and have included the results of our statistical tests for ITs.

      Figure 6:

      • The model presented in panel E misrepresents the role of NFR5 according to the results in the paper. From the evidence presented, it is not clear if the observed rhizobial infection phenotype is due to reduced abundance of full-length NFR5, or if the cleaved NFR5 fragment is suppressing infection. Additionally, S. fredii should not be drawn so close to the plasma membrane, since the bacteria are located outside the cell wall when the T3SS is active.

      We appreciate your comment which helps us to improve the interpretation of our results. We agree that the model should accurately reflect the uncertainties regarding the role of NFR5. We revised the model (positioning of S. fredii etc.) and write in the Discussion:

      “NopT impairs the function of the NFR1/NFR5 receptor complex. Cleavage of NFR5 by NopT reduces its protein levels. Possible inhibitory effects of NFR5 cleavage products on NF signaling are unknown but cannot be excluded.”

      Reviewer #2 (Recommendations For The Authors):

      (1) Some minor weaknesses need addressing: In Figure 5A, the root hair density in the two images appears significantly different. Are these images representative of each treatment?

      We appreciate your attention to detail and the importance of ensuring that the images in Figure 5A are representative. We carefully reviewed our image selection process and confirm that the shown images are indeed representative of each treatment group. In our revised version, we show additional images and have also improved the text in the figure legend. Furthermore, we performed additional GUS staining tests, and the new data are shown in Fig. 5A and 5B.

      (2) Additionally, please ensure consistency in the format of genotype names throughout the manuscript. For instance, in Line 897, italics should be used for "N. benthamiana."

      We thank you for pointing out the format of genotype names and have corrected our manuscript as requested.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      The authors introduced their previous paper with the concise statement that "the relationships between lineage-specific attributes and genotypic differences of tumors are not understood" (Chen et al., JEM 2019, PMID: 30737256). For example, it is not clear why combined loss of RB1 and TP53 is required for tumorigenesis in SCLC or other aggressive neuroendocrine (NE) cancers, or why the oncogenic mutations in KRAS or EGFR that drive NSCLC tumorigenesis are found so infrequently in SCLC. This is the main question addressed by the previous and current papers. 

      One approach to this question is to identify a discrete set of genetic/biochemical manipulations that are sufficient to transform non-malignant human cells into SCLC-like tumors. One group reported the transformation of primary human bronchial epithelial cells into NE tumors through a complex lentiviral cocktail involving the inactivation of pRB and p53 and activation of AKT, cMYC, and BCL2 (PARCB) (Park et al., Science 2018, PMID: 30287662). The cocktail previously reported by Chen and colleagues to transform human pluripotent stem-cell (hPSC)-derived lung progenitors (LPs) into NE xenografts was more concise: DAPT to inactivate NOTCH signaling combined with shRNAs against RB1 and TP53. However, the resulting RP xenografts lacked important characteristics of SCLC. Unlike SCLC, these tumors proliferated slowly and did not metastasize, and although small subpopulations expressed MYC or MYCL, none expressed NEUROD1. 

      MYC is frequently amplified or expressed at high levels in SCLC, and here, the authors have tested whether inducible expression of MYC could increase the resemblance of their hPSC-derived NE tumors to SCLC. These RPM cells (or RPM T58A with stabilized cMYC) engrafted more consistently and grew more rapidly than RP cells, and unlike RP cells, formed liver metastases when injected into the renal capsule. Gene expression analyses revealed that RPM tumor subpopulations expressed NEUROD1, ASCL1, and/or YAP1. 

      The hPSC-derived RPM model is a major advance over the previous RP model. This may become a powerful tool for understanding SCLC tumorigenesis and progression and for discovering gene dependencies and molecular targets for novel therapies. However, the specific role of cMYC in this model needs to be clarified. 

      cMYC can drive proliferation, tumorigenesis, or apoptosis in a variety of lineages depending on concurrent mutations. For example, in the Park et al., study, normal human prostate cells could be reprogrammed to form adenocarcinoma-like tumors by activation of cMYC and AKT alone, without manipulation of TP53 or RB1. In their previous manuscript, the authors carefully showed the role of each molecular manipulation in NE tumorigenesis. DAPT was required for NE differentiation of LPs to PNECs, shRB1 was required for expansion of the PNECs, and shTP53 was required for xenograft formation. cMYC expression could influence each of these steps, and importantly, could render some steps dispensable. For example, shRB1 was previously necessary to expand the DAPT-induced PNECs, as neither shTP53 nor activation of KRAS or EGFR had any effect on this population, but perhaps cMYC overexpression could expand PNECs even in the presence of pRB, or even induce LPs to become PNECs without DAPT. Similarly, both shRB1 and shTP53 were necessary for xenograft formation, but maybe not if cMYC is overexpressed. If a molecular hallmark of SCLC, such as loss of RB1 or TP53, has become dispensable with the addition of cMYC, this information is critically important in interpreting this as a model of SCLC tumorigenesis.

      The reviewer’s suggestion may be possible; indeed, in a recent report from our group (Gardner EE, et al., Science 2024) we have shown, using genetically engineered mouse modeling coupled with lineage tracing, that the cMyc oncogene can selectively expand Ascl1+ PNECs in the lung.

      We agree with the reviewer that not having a better understanding of the individual components necessary and/or sufficient to transform hESC-derived LPs is an important shortcoming of this current work. However, we would like to stress three important points about the comments: 1) tumors were reviewed and the histological diagnoses were certified by a practicing pulmonary pathologist at WCM (our co-author, C. Zhang); 2) the observed transcriptional programs were consistent with primary human SCLC; and 3) RB1-proficient SCLC is now recognized as a rare presentation of SCLC (Febres-Aldana CA, et al., Clin. Can. Res. 2022. PMID: 35792876).

      To interpret the role of cMYC expression in hPSC-derived RPM tumors, we need to know what this manipulation does without manipulation of pRB, p53, or NOTCH, alone or in combination. Seven relevant combinations should be presented in this manuscript: (1) cMYC alone in LPs, (2) cMYC + DAPT, (3) cMYC + shRB1, (4) cMYC + DAPT + shRB1, (5) cMYC + shTP53, (6) cMYC + DAPT + shTP53, and (7) cMYC + shRB1 + shTP53. Wildtype cMYC is sufficient; further exploration with the T58A mutant would not be necessary. 

      We respectfully disagree that an interrogation of the differences between the phenotypes produced by wildtype and Myc(T58A) would not be informative. (Our view is confirmed by the second reviewer; see below.) It is well established that Myc gene or protein dosage can have profound effects on in vivo phenotypes (Murphy DJ, et al., Cancer Cell 2008. PMID: 19061836). The “RPM” model of variant SCLC developed by Trudy Oliver’s lab relied on the conditional T58A point mutant of cMyc, originally made by Rob Wechsler-Reya. While we do not discuss the differences between Myc and Myc(T58A), it is nonetheless important to present our results with both the WT and mutant MYC constructs, as we are aware of others actively investigating differences between them in GEMM models of SCLC tumor development.

      We agree with the reviewer about the virtues of trying to identify the effects of individual gene manipulations; indeed our original paper (Chen et al., J. Expt. Med. 2019), describing the RUES2-derived model of SCLC, did just that, carefully dissecting events required to transform LPs towards a SCLC-like state. The central purpose of the current study was to determine the effects of adding cMyc on the behavior of weakly tumorigenic SCLC-like cells. Presenting data with these two alleles to seek effects of different doses of MYC protein seems reasonable.

      This reviewer considers that there should be a presentation of the effects of these combinations on LP differentiation to PNECs, expansion of PNECs as well as other lung cells, xenograft formation and histology, and xenograft growth rate and capacity for metastasis. If this could be clarified experimentally, and the results discussed in the context of other similar approaches such as the Park et al., paper, this study would be a major addition to the field.  

      Reviewer #2 (Public Review): 

      Summary: 

      Chen et al use human embryonic stem cells (ESCs) to determine the impact of wildtype MYC and a point mutant stable form of MYC (MYC-T58A) in the transformation of induced pulmonary neuroendocrine cells (PNEC) in the context of RB1/P53 (RP) loss (tumor suppressors that are nearly universally lost in small cell lung cancer (SCLC)). Upon transplant into immune-deficient mice, they find that RP-MYC and RP-MYC-T58A cells grow more rapidly, and are more likely to be metastatic when transplanted into the kidney capsule, than RP controls. Through single-cell RNA sequencing and immunostaining approaches, they find that these RPM tumors and their metastases express NEUROD1, which is a transcription factor whose expression marks a distinct molecular state of SCLC. While MYC is already known to promote aggressive NEUROD1+ SCLC in other models, these data demonstrate its capacity in a human setting that provides a rationale for further use of the ESC-based model going forward. Overall, these findings provide a minor advance over the previous characterization of this ESC-based model of SCLC published in Chen et al, J Exp Med, 2019. 

      We consider the findings more than a “minor” advance in the development of the model, since any useful model for SCLC would need to form aggressive and metastatic tumors.

      The major conclusion of the paper is generally well supported, but some minor conclusions are inadequate and require important controls and more careful analysis. 

      Strengths:

      (1) Both MYC and MYC-T58A yield similar results when RP-MYC and RP-MYCT58A PNEC ESCs are injected subcutaneously, or into the renal capsule, of immune-deficient mice, leading to the conclusion that MYC promotes faster growth and more metastases than RP controls. 

      (2) Consistent with numerous prior studies in mice with a neuroendocrine (NE) cell of origin (Mollaoglu et al, Cancer Cell, 2017; Ireland et al, Cancer Cell, 2020; Olsen et al, Genes Dev, 2021), MYC appears sufficient in the context of RB/P53 loss to induce the NEUROD1 state. Prior studies also show that MYC can convert human ASCL1+ neuroendocrine SCLC cell lines to a NEUROD1 state (Patel et al, Sci Advances, 2021); this study for the first time demonstrates that RB/P53/MYC from a human neuroendocrine cell of origin is sufficient to transform a NE state to aggressive NEUROD1+ SCLC. This finding provides a solid rationale for using the human ESC system to better understand the function of human oncogenes and tumor suppressors from a neuroendocrine origin. 

      Weaknesses:

      (1) There is a major concern about the conclusion that MYC "yields a larger neuroendocrine compartment" related to Figures 4C and 4G, which is inadequately supported and likely inaccurate. There is overwhelming published data that while MYC can promote NEUROD1, it also tends to correlate with reduced ASCL1 and reduced NE fate (Mollaoglu et al, Cancer Cell, 2017; Zhang et al, TLCR, 2018; Ireland et al, Cancer Cell, 2020; Patel et al, Sci Advances, 2021). Most importantly, there is a lack of in vivo RP tumor controls to make the proper comparison to judge MYC's impact on neuroendocrine identity. RPM tumors are largely neuroendocrine compared to in vitro conditions, but since RP control tumors (in vivo) are missing, it is impossible to determine whether MYC promotes more or less neuroendocrine fate than RP controls. It is not appropriate to compare RPM tumors to in vitro RP cells when it comes to cell fate. Upon inspection of the sample identity in S1B, the fibroblast and basal-like cells appear to only grow in vitro and are not well represented in vivo; it is, therefore, unclear whether these are transformed or even lack RB/P53 or express MYC. Indeed, a close inspection of Figure S1B shows that RPM tumor cells have little ASCL1 expression, consistent with lower NE fate than expected in control RP tumors. 

      We would like to clarify two points related to the conclusions that we draw about MYC’s ability to promote an increase in the neuroendocrine fraction in hESC-derived cultures: 1) The comparisons in Figure 4C were made between cells isolated in culture following the standard 50-day differentiation protocol, where, following generation of LPs around day 25, MYC was added to the other factors previously shown to enrich for a PNEC phenotype (shRB1, shTP53, and DAPT). Therefore, the argument that MYC increased the frequency of “neuroendocrine cells” (which we define by a gene expression signature) is a reasonable conclusion in the system we are using; and 2) following injection of these cells into immunocompromised mice, an ASCL1-low / NEUROD1-high presentation is noted (Supplemental Figures 1F-G). In the few metastases that we were able to use to sequence bulk RNA, there is an even more pronounced increase in expression of NEUROD1 with a decrease in ASCL1.

      Some confusion may have arisen from our previous characterization of neuroendocrine (NE) cells using either ASCL1 or NEUROD1 as markers. To clarify, we have now designated cells positive for ASCL1 as classical NE cells and those positive for NEUROD1 as the NE variant. According to this revised classification, our findings indicate that MYC expression leads to an increase in the NEUROD1+ NE variant and a decrease in ASCL1+ classical NE cells. This adjustment is reflected in the Results section of the manuscript titled “Inoculation of the renal capsule facilitates metastasis of the RUES2-derived RPM tumors”.

      From the limited samples in hand, we compared the expression of ASCL1 and NEUROD1 in the weakly tumorigenic hESC RP cells after successful primary engraftment into immunocompromised mice. As expected, the RP tumors were distinguished by the lack of expression of NEUROD1, compared to levels observed in the RPM tumors.

      In addition, since MYC appears to require Notch signaling to induce  NE fate (cf Ireland et al), the presence of DAPT in culture could enrich for NE fate despite MYC's presence. It's important to clarify in the legend of Fig 4A which samples are used in the scRNA-seq data and whether they were derived from in vitro or in vivo conditions (as such, Supplementary Figure S1B should be provided in the main figure). Given their conclusion is confusing and challenges robustly supported data in other models, it is critical to resolve this issue properly. I suspect when properly resolved, MYC actually consistently does reduce NE fate compared to RP controls, even though tumors are still relatively NE compared to completely distinct cellular identities such as fibroblasts.

      We have clarified the source of tumor sequencing data and the platform (single cell or bulk) in Figure 4 and Supplemental Figure 1. To reiterate – the RNA sequencing results from paired metastatic and primary tumors from the RPM model are derived from bulk RNA;  the single cell RNA data in RP or RPM datasets are from cells in culture.  These distinctions are clarified in the legend to Supplemental Figure 1.

      (2) The rigor of the conclusions in Figure 1 would be strengthened by comparing an equivalent number of RP animals in the renal capsule assay, which is n = 6 compared to n = 11-14 in the MYC conditions.

      As we did not perform a power calculation to determine a sample size required to draw a level of statistical significance from our conclusions, this comment is not entirely accurate. Our statistical rigor was limited by the availability of samples from the RP tumor model.

      (3) Statistical analysis is not provided for Figures 2A-2B, and while the results are compelling, may be strengthened by additional samples due to the variability observed. 

      We acknowledge that the cohorts are relatively small but we have added statistical comparisons in Figure 2B. 

      (4a) Related to Figure 3, primary tumors and liver metastases from RPM or RPM-T58A-expressing cells express NEUROD1 by immunohistochemistry (IHC) but the putative negative controls (RP) are not shown, and there is no assessment of variability from tumor to tumor, ie, this is not quantified across multiple animals. 

      The results of H&E and IF staining for ASCL1, NEUROD1, CGRP, and CD56 in negative control (RP tumors) are presented in the updated Figure 3F-G.

      (4b) Relatedly, MYC has been shown to be able to push cells beyond NEUROD1 to a double-negative or YAP1+ state (Mollaoglu et al, Cancer Cell, 2017; Ireland et al, Cancer Cell, 2020), but the authors do not assess subtype markers by IHC. They do show subtype markers by mRNA levels in Fig 4B, and since there is expression of ASCL1, and potentially expression of YAP1 and POU2F3, it would be valuable to examine the protein levels by IHC in control RP vs. RPM samples.

      YAP1 positive SCLC is still somewhat controversial, so it is not clear what value staining for YAP1 offers beyond showing the well-established markers, ASCL1 and NEUROD1.  

      (5) Given that MYC has been shown to function distinctly from MYCL in SCLC models, it would have raised the impact and value of the study if MYC was compared to MYCL or MYCL fusions in this context since generally, SCLC expresses a MYC family member. However, it is quite possible that the control RP cells do express MYCL, and as such, it would be useful to show. 

      We now include Supplemental Figure S2 to illustrate four important points raised by this reviewer and others: 1) expression of MYC family members in the merged dataset (RP and RPM) is low or undetectable in the basal/fibroblast cultures; 2) MYC does have a weak correlation with EGFP in the neuroendocrine cluster when either WT MYC or T58A MYC is overexpressed; 3) MYCL and MYCN are detectable, but at low levels compared to CMYC; and 4) expression of ASCL1 is anticorrelated with MYC expression across the merged single-cell datasets from the RP and RPM models.

      Reviewer #3 (Public Review): 

      Summary: 

      The authors continue their study of the experimental model of small cell lung cancer (SCLC) they created from human embryonic stem cells (hESCs) using a protocol for differentiating the hESCs into pulmonary lineages followed by NOTCH signaling inactivation with DAPT, and then knockdown of TP53 and RB1 (RP models) with DOX-inducible shRNAs. To this published model, they now add DOX-controlled activation of expression of a MYC or T58A MYC transgene (RPM and RPMT58A models) and study the impact of this on xenograft tumor growth and metastases. Their major findings are that the addition of MYC dramatically increased subcutaneous tumor growth and also the growth of tumors implanted into the renal capsule. In addition, they only found liver and occasional lung metastases with renal capsule implantation. Molecular studies including scRNAseq showed that tumor lines with MYC or T58A MYC led surprisingly to more neuroendocrine differentiation, and (not surprisingly) that MYC expression was most highly correlated with NEUROD1 expression. Of interest, many of the hESCs with RPM/RPMT58A expressed ASCL1. Of note, even in the renal capsule RPM/RPMT58A models only 6/12 and 4/9 mice developed metastases (mainly liver with one lung metastasis) and a few mice of each type did not even develop a renal sub capsule tumor. The authors start their Discussion by concluding: "In this report, we show that the addition of an efficiently expressed transgene encoding normal or mutant human cMYC can convert weakly tumorigenic human PNEC cells, derived from a human ESC line and depleted of tumor suppressors RB1 and TP53, into highly malignant, metastatic SCLC-like cancers after implantation into the renal capsule of immunodeficient mice."

      Strengths: 

      The in vivo study of a human preclinical model of SCLC demonstrates the important role of c-Myc in the development of a malignant phenotype and metastases. It also demonstrates the role of c-Myc in selecting for expression of the NEUROD1 lineage oncogene.

      Weaknesses: 

      There are no data on results from an orthotopic (pulmonary) implantation on generation of metastases; no comparative study of other myc family members (MYCL, MYCN); no indication of analyses of other common metastatic sites found in SCLC (e.g. brain, adrenal gland, lymph nodes, bone marrow); no studies of response to standard platin-etoposide doublet chemotherapy; no data on the status of NEUROD1 and ASCL1 expression in the individual metastatic lesions they identified. 

      We have acknowledged from the outset that our study has significant limitations, as noted by this reviewer, and we explained in our initial letter of response why we need to present this limited, but still consequential, story at this time. 

      In particular, while we have attempted orthotopic transplantations of RPM tumor cells into NSG mice (by tail vein or intra-pulmonary injection, or intra-tracheal instillation of tumor cells), these methods were not successful in colonizing the lung. Additionally, we have compared the efficacy of platinum/etoposide to that of removing DOX in established RPM subcutaneous tumors, but we chose not to include these data because we lacked a chemotherapy-responsive tumor model, and thus could not say with confidence that the chemotherapeutic agents were active and that the RPM models were truly resistant to standard SCLC chemotherapy. In a discussion about other metastatic sites, we have now included the following text:

      “In animals administered DOX, histological examinations showed that approximately half developed metastases in distant organs, including the liver or lung (Figure 1D). No metastases were observed in the bone, brain, or lymph nodes. For a more detailed assessment, future studies could employ more sensitive imaging methods, such as luciferase imaging.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      Technical points related to Major Weakness #1: 

      For Figure 4: Cells were enriched for EGFP-high cells only, under the hypothesis that cells with lower EGFP may have silenced expression of the integrated vector. Since EGFP is expressed only in the shRB1 construct, selection for high EGFP may inadvertently alter/exclude heterogeneity within the transformed population for the other transgenes (shP53, shMYC/MYC-T58A). Can the authors include data to show the expression of MYC/MYC T58A in EGFP-high vs. -med vs. -low cells? MYC levels may alter the NE differentiation status of tumor cells.

      Please now refer to Supplemental Figure S2.

      Related to the appropriateness of the methods for Figure 4C, the authors state, "We performed differential cluster abundance analysis after accounting for the fraction of cells that were EGFP+". If only EGFP+ cells were accounted for in the analysis for 4C, the majority of RP cells in the "Neuroendocrine differentiated" cluster would not be included in the analysis (according to EGFP expression in Fig S1A-B), and therefore inappropriately reduce NE identity compared to RPM samples that have higher levels of EGFP. 

      There is no consideration or analysis of cell cycling/proliferation until after the conclusion is stated. Yet, increased proliferation of MYC-high vs MYC-low cultures would enhance selection for more tumors (termed "NE-diff") than non-tumors (basal/fibroblast) in 2D cultures. 

      The expression of MYC itself isn't assessed for this analysis but assumed, and whether higher levels of MYC/MYC-T58A may be present in EGFP+ tumor cells that are in the NE-low populations isn't clear. Can MYC-T58A/HA also be included in the reference genome? 

      We did not include an HA tag in our reference transcriptome. For [some] answers to this and other related questions, please refer to Supplemental Figure S2.

      Reviewer #3 (Recommendations For The Authors): 

      (1) The experiments are all technically well done and clearly presented and represent a logical extension exploring the role of c-Myc in the hESC experimental model system. 

      We appreciate this supportive comment!

      (2) It is of great interest that both the initial RP model only forms "benign" tumors and that with the addition of a strong oncogene like c-myc, where expression is known to be associated with a very bad prognosis in SCLC, that while one gets tumor formation there are still occasional mice both for subcutaneous and renal capsule test sites that don't get tumors even with the injection of 500,000 RPM/RPMT58A cells. In addition, of the mice that do form tumors, only ~50% exhibit metastases from the renal sub-capsule site. The authors need to comment on this further in their Discussion. To me, this illustrates both how incredibly resistant/difficult it is to form metastases, thus indicating the need for other pathways to be activated to achieve such spread, and also represents an opportunity for further functional genomic tests using their preclinical model to systematically attack this problem. Obvious candidate genes are those recently identified in genetically engineered mouse models (GEMMs) related to neuronal behavior. In addition, we already know that full-fledged patient-derived SCLC when injected subcutaneously into immune-deprived mice don't exhibit metastases - thus, while the hESC RPM result is not surprising, it indicates to me the power of their model (logs less complicated genetically than a patient SCLC) to sort through a mechanism that would allow metastases to develop from subcutaneous sites. The authors can point these things out in their Discussion section to provide a "roadmap" for future research. 

      Although we remain mindful of the relatively small cohorts we have studied, the thrust of Reviewer #3’s comments is now included in the Discussion. And there is, of course, a lot more to do, and it has taken several years already to get to this point. Additional information about the prolonged gestation of this project and about the difficulties of doing more in the near future was described in our initial response to reviewers/Editor, included near the start of this letter.    

      (3) I will state the obvious that this paper would be much more valuable if they had compared and contrasted at least one of the myc family members (MYCL or MYCN) with the CMYC findings whatever the results would be. Most SCLC patients develop metastases, and most of their tumors don't express high levels of CMYC (and often use MYCL). In any event, as the authors Discuss, this will be an important next stage to test.

      We have acknowledged and explained the limitations of the work in several ways. Further, we were unaware of the relationship between metastases and the expression of MYC and MYCL1 noted by the reviewer; we will look for confirmation of this association in any future studies, although we have not encountered it in current literature.

      (4) Their assays for metastases involved looking for anatomically "gross" lesions. While that is fine, particularly given that the "gross" lesions they show in figures are actually pretty small, we still need to know if they performed straightforward autopsies on mice and looked for other well-known sites of metastases in SCLC patients besides liver and lung - namely lymph nodes, adrenal, bone marrow, and brain. I would guess these would probably not show metastatic growth but with the current report, we don't know if these were looked for or not. Again, while this could be a "negative" result, the paper's value would be increased by these simple data. Let's assume no metastases are seen, then the authors could further strengthen the case for the value of their hESC model in systematically exploring with functional genomics the requirements to achieve metastases to these other sites.

      We have included descriptions of what we found and didn’t find at other potential sites of metastasis in the results section, with the following sentences: 

      “In animals administered DOX, histological examinations showed that approximately half developed metastases in distant organs, including the liver or lung (Figure 1D). No metastases were observed in the bone, brain, or lymph nodes. For a more detailed assessment, future studies could employ more sensitive imaging methods, such as luciferase imaging.”

      (5) Related to this, we have no idea if the mice that developed liver metastases (or the one mouse with lung metastasis) had more than one metastatic site. They will know this and should report it. Again, my guess is that these were isolated metastases in each mouse. Again, they can indicate the value of their model in searching for programs that would increase the number of the various organs. 

      We appreciate the suggestion. We observed that one of the mice developed metastatic tumors in both the liver and lungs. This information has been incorporated into the Results section.

      (6) While renal capsule implantation for testing growth and metastatic behavior is reasonable and based on substantial literature using this site for implantation of patient tumor specimens, what would have increased the value of the paper is knowing the results from orthotopic (lung implantation). Whatever the results were (they occurred or did not occur) they will be important to know. I understand the "future experiments" argument, but in reading the manuscript this jumped out at me as an obvious thing for the authors to try. 

      We conducted orthotopic implantation in several ways, including via intra-tracheal instillation of 0.5 million RP or RPM cells in PBS per mouse. However, none of the subjects (0/5 mice) developed tumor-like growths, and the number of animals used was small. Further, this outcome could be attributed to biological or physical factors. For instance, the conducting airway is coated with secretory cells producing protective mucins and may not have retained the 0.5 million cells. This is one example of a factor that may have hindered effective colonization. Future adjustments, such as increasing the number of cells, embedding them in Matrigel, or damaging the airway to denude secretory cells and trigger regeneration, might alter the outcomes. These ideas might guide future work to strengthen the utility of the models.

      (7) Another obvious piece of data that would have improved the value of this manuscript would be to know whether the RPM tumors responded to platin-etoposide chemotherapy. Such data was not presented in their first RP hESC notch inhibition paper (which we now know generated what the authors call "benign" tumors). While I realize chemotherapy responses represent other types of experiments, as the authors point out one of the main reasons they developed their new human model was for therapy testing. Two papers in and we are all still asking - does their model respond or not respond dramatically to platin-etoposide therapy? Whatever the results are they are a vital next step in considering the use of their model. 

      Please see the comments above regarding our decision not to include data from a clinical trial that lacked appropriate controls.

      (8) The finding of RPM cells that expressed NEUROD1, ASCL1, or both was interesting. From the way the data were presented, I don't have a clear idea which of these lineage oncogenes the metastatic lesions from ~11 different mice expressed. Whatever the result is it would be useful to know - all NEUROD1, some ASCL1, some mixed etc.

      Based on the bulk RNA-sequencing of a few metastatic sites (Figure 4H), what we can demonstrate is that all sites were NEUROD1-positive and expressed low or no detectable ASCL1.

      (9) While several H&E histologic images were presented, even when I enlarged them to 400% I couldn't clearly see most of them. For future reference, I think it would be important to have several high-quality images of the RP, RPM, RPMT58A subcutaneous tumors, sub-renal capsule tumors, and liver and lung metastatic lesions. If there is heterogeneity in the primary tumors or the metastases it would be important to show this. The quality of the images they have in the pdf file is suboptimal. If they have already provided higher-quality images - great. If not, I think in the long run as people come back to this paper, it will help both the field and the authors to have really great images of their tumors and metastases. 

      We have attempted to improve the quality of the embedded images. Digital resolution is a tradeoff with data size: higher-resolution images are always available upon request, but may not be suitable for generation of figures in a manuscript viewed online.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      Summary:

      Here the authors convincingly identify and characterize the SERBP1 interactome and further define its role in the nucleus, where it is associated with complexes involved in splicing, cell division, chromosome structure, and ribosome biogenesis. Many of the SERBP1-associated proteins are RNA-binding proteins and SERBP1 exerts its impact, at least in part, through these players. SERBP1 is mostly disordered but along with its associated proteins displays a preference for G4 binding and can bind to PAR and be PARylated. They present data that strongly suggest that complexes in which SERBP1 participates are assembled through G4 or PAR binding. The authors suggest that because SERBP1 lacks traditional functional domains yet is clearly involved in distinct regulatory complexes, SERBP1 likely acts in the early steps of assembly through the recognition of interacting sites present in RNA, DNA, and proteins.

      Strengths:

      The data is very convincing and demonstrated through multiple approaches.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      Reviewer #2 (Public Review):

      Summary:

      In this study the authors have used pull-down experiments in a cell line overexpressing tagged SERPINE1 mRNA binding protein 1 (SERBP1) followed by mass spectrometry-based proteomics, to establish its interactome. Extensive analyses are performed to connect the data to published resources. The authors attempt to connect SERBP1 to stress granules and Alzheimer's disease-associated tau pathology. Based on the interactome, the authors propose a cross-talk between SERBP1 and PARP1 functions.

      Strengths:

      The main strength of this study lies in the proteomics data analysis, and its effort to connect the data to published studies.

      Weaknesses:

      While the authors propose a feedback regulatory model for SERBP1 and PARP1 functions, strong evidence for PARylation modulating SERBP1 functions is lacking. PARP inhibition decreasing the amount of PARylated proteins associated with SERBP1 and likely all other PARylated proteins is expected. This study is also incomplete in its attempt to establish a connection to Alzheimer's disease related tauopathy. A single AD case is not sufficient, and frozen autopsy tissue shows unexplained punctate staining likely due to poor preservation of cellular structures for immunohistochemistry. There is a lack of essential demographic data, source of the tissue, brain regions shown, and whether there was an IRB protocol for the human brain tissue. The presence of phase-separated transient stress granules in an autopsy brain is unlikely, even if G3BP1 staining is present. Normally, stress granule proteins move to the cytoplasm under cellular stress, whereas SERBP1 becomes nuclear. The co-localization of abundant cytoplasmic G3BP1 and SERBP1 under normal conditions does not indicate an association with stress granules.

      Reviewer #3 (Public Review):

      Summary:

      A survey of SERBP1-associated functions and their impact on the transcriptome upon gene depletion, as well as the identification of chemical inhibitors upon gene over-expression.

      Strengths:

      (1) Provides a valuable resource for the community, supported by statistical analyses.

      (2) Offers a survey of different processes with correlation data, serving as a good starting point for the community to follow up.

      Weaknesses:

      (1) The authors provided numerous correlations on diverse topics, from cell division to RNA splicing and PARP1 association, but did not follow up their findings with experiments, offering little mechanistic insight into the actual role of SERBP1. The model in Figure 5D is entirely speculative and lacks data support in the manuscript.

      Our article includes several pieces of evidence that support SERBP1’s role in splicing, translation, cell division and association with PARP1. We respectfully disagree that the model in Figure 5D is speculative. The goal of our study was to generate initial evidence of SERBP1 involvement in different biological processes based on its interactome. The characterization of molecular mechanisms in all these scenarios requires a substantial amount of work and will be the topic of follow-up manuscripts.

      (2) Following up with experiments to demonstrate that their findings are real (e.g., those related to splicing defects and the PARylation/PAR-binding association) would be beneficial. For example, whether the association between PARP1 and SERBP1 is sensitive to PAR-degrading enzymes is unclear.

      We included experiments showing the interaction between endogenous SERBP1 and PARP1. Additionally, we demonstrated that the interaction of SERBP1 with PARP1 was disrupted when cells were treated with PARP inhibitors.

      (3) They did not clearly articulate how experiments were performed. For instance, the drug screen and even the initial experiment involving the pull-down were poorly described. Many in the community may not be familiar with vectors such as pSBP or pUltra without looking up details.

      We provided additional details about the vectors and expanded the description of experiments in results and figure legends.

      (4) The co-staining of SERBP1 with pTau, PARP1, and G3BP1 in the brain is interesting, but it would be beneficial to follow up with immunoprecipitation in normal and patient samples to confirm the increased physical association.

      Thank you for this suggestion. Instead, we performed a Proximity Ligation Assay (PLA) on human tissue. The data are included in Fig. 7B and C. PLA confirmed the interaction between pTau and SERBP1, as well as between SERBP1 and PARP1, in AD cortices.

      (5) The combination index of 0.7-0.9 for PJ34 + siSERBP1 is weak. Could this be due to the non-specific nature of the drug against other PARPs? Have the authors looked into this possibility?

      The combination index could be considered weak in the case of U251 cells but not in the case of U343 cells. PJ34 has been shown to be mainly a PARP-1 inhibitor. Different PJ34 concentrations and different drugs will be examined in future studies. It is worth mentioning that in a genetic screening, SERBP1 has been shown to increase sensitivity to different PARP inhibitors (PMID: 37160887). This information is included in the manuscript.
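
      A minimal sketch of how a combination index is commonly computed, assuming the Chou-Talalay formulation (this response does not state which method was used). The function name, argument names, and numeric doses are hypothetical and are not taken from the study. Under this formulation, values below 1 are conventionally read as synergy, values near 1 as additivity, and values above 1 as antagonism.

```python
def combination_index(d1, d2, dx1, dx2):
    """Chou-Talalay combination index for two agents at a fixed effect level.

    d1, d2   -- doses of agent 1 and agent 2 that, used together, reach the effect
    dx1, dx2 -- doses of each agent alone that reach the same effect
    """
    if dx1 <= 0 or dx2 <= 0:
        raise ValueError("single-agent doses must be positive")
    # Each term is the fraction of the single-agent dose still needed in combination.
    return d1 / dx1 + d2 / dx2


# Hypothetical doses, chosen only so the result falls in the 0.7-0.9 range discussed above.
ci = combination_index(d1=2.0, d2=1.0, dx1=5.0, dx2=2.5)
print(round(ci, 2))
```

      How a gene knockdown such as siSERBP1 is mapped onto a "dose" is itself an assumption of any such calculation and would need to be defined by the analysis.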

      Reviewer #1 (Recommendations For The Authors):

      This is a really well-done piece of research that is written very well. The data are very convincing and the conclusions are well supported. Some wording in Figures 2B and D is pixelated and hard to read. All the figure legends could benefit from being expanded but this is especially true for Figures 2, 3, 7, and 8. There is a ton of data being presented and a very limited description of what was done and what is being concluded. Some of the content may not be fully comprehended by some readers with limited descriptions.

      We revised all figures to ensure that images are clear and their resolution is high. We expanded all figure legends to provide a better explanation of the experimental design.

      Reviewer #2 (Recommendations For The Authors):

      The "merged" pdf file is the same as the "article".

      Individual files were uploaded this time.

      The abstract should spell out acronyms, such as the name of the protein Serpine1 mRNA-binding protein 1 (SERBP1).

      This was not included since the abstract has a word limit.

      "SERBP1 (Serpine1 mRNA-binding protein 1) is a unique member of this group of RBPs". In what way is it unique?

      The text was modified to better explain SERBP1’s singularities.

      "RBPs containing IDRs and RGG motifs are particularly relevant in the nervous system. Their misfolding contributes to the formation of pathological protein aggregates in Alzheimer's disease (AD), Frontotemporal Lobar Dementia (FTLD), Amyotrophic Lateral Sclerosis (ALS), and Parkinson's disease (PD)" -> while TDP-43 and FUS in ALS/FTD may fit this description, it is not true for tau and amyloid-beta (AD) and alpha-synuclein (PD).

      "SERBP1 is a unique RBP containing IDRs and RGG motifs yet lacks other readily recognizable, canonical or structured RNA binding motifs. Moreover, SERBP1 has been observed by our study and others as a common Tau interactor in Alzheimer’s Disease (AD) brains. RBPs containing IDRs (e.g. TDP-43, FUS, hnRNPs, TIA1) have been shown to self-aggregate and co-aggregate with pathogenic amyloids (Tau, Aβ-amyloid and α-Synuclein) in AD, Frontotemporal Lobar Dementia (FTLD), Amyotrophic Lateral Sclerosis (ALS), and Parkinson's disease (PD), and this suggests that, like other IDR-containing RBPs, SERBP1 contributes to RNA dysmetabolism in neurodegenerative diseases."

      While the authors propose a feedback regulatory model for SERBP1 and PARP1 functions, strong evidence for PARylation modulating SERBP1 functions is lacking. The fact that PARP inhibition decreases the amount of PARylated proteins associated with SERBP1 and likely all other PARylated proteins is expected and cannot count as evidence.

      We included data showing that treatment with PJ34 (PARP inhibitor) decreases SERBP1 interaction with PARP1 and G3BP1. We are currently conducting a more extensive analysis to identify SERBP1 PAR binding domain and the impact of PARP inhibition on its interactions and functions. These experiments will be included in a new manuscript.

      A single AD case is not sufficient.

      We apologize for the lack of clarity. The study included 6 age-matched control cases and 6 AD cases. Table 1 summarizes the demographics of all cases and the experimental application assigned to each case. Moreover, we included a paragraph regarding human tissue harvesting.

      Most western blot data are not quantified from multiple replicates, as required.

      Quantifications are now provided.

      FTLD - frontotemporal lobar degeneration (not dementia).

      This was corrected.

      Frozen autopsy tissue is problematic due to poor preservation. The staining presented here shows unexplained punctate staining likely due to poor preservation of cellular structures for immunohistochemistry.

      We included a paragraph regarding human tissue harvesting. We have successfully used frozen tissues in our previous studies, observing well-preserved neuronal and tissue structure (PMIDs: 32855391, 31532069 and 30367664).

      The presence of phase-separated stress granules in tissue is controversial since these are transient structures.

      Normally, stress granule proteins move to the cytoplasm under cellular stress, whereas SERBP1 becomes nuclear. The co-localization of abundant (and partially overexposed) cytoplasmic G3BP1 and SERBP1 under normal conditions is not evidence for association with stress granules. Does induction of stress granule formation lead to colocalization in stress granules? The H2O2 experiment suggests otherwise.

      RBPs implicated in the stress response move to stress granules when cells are exposed to stress. SERBP1 has been shown to shuttle to stress granules and the nucleus under stress conditions (PMID: 24205981). Our results are in agreement.

      Using co-IF, we observed some overlap between G3BP1 and SERBP1 in AD tissues. As shown in Fig. S6A and B, 50% of stress granules overlap with SERBP1 signal. In contrast, it is hard to assess their relationship in age-matched control brains, where stress granules form and accumulate at a lower rate than in AD. SERBP1 is not very abundant in normal brains. It is known that aggregation of RNA-binding proteins and/or dysfunctional LLPS dysregulates stress granule formation and accumulation in AD and other proteinopathies (PMIDs 30853299, 27256390 and 31911437). However, it is too early to determine the role of SERBP1 and its contribution to stress granule formation and accumulation. We will examine this topic in future studies.

      There is a lack of essential demographics data (age, clinical diagnosis, path diagnosis, co-pathologies, Braak stage, etc.), source of the tissue (what brain bank?), brain regions shown, and whether there was informed consent for the collection and use of human brain tissue.

      We included the requested information in the Materials and Methods section.

      Reviewer #3 (Recommendations For The Authors):

      The authors need to better explain their experimental rationale and approach in the main text, not just in the supplementary materials.

      We have extensively revised the text to provide a better description of experiments in the results section and figure legends.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In cells undergoing Flavivirus infection, cellular translation is impaired but the viruses themselves escape this inhibition and are efficiently translated. In this study, the authors use very elegant and direct approaches to identify the regions in the 5' and 3' UTRs that are important for this phenomenon and then use them to retrieve two cellular proteins that associate with them and mediate translational shutoff evasion (DDX3 and PABP1). A number of experimental approaches are used with a series of well-controlled experiments that fully support the authors' conclusions.

      Strengths:

      The work identifies the regions in the 5' and 3' UTRs of the viral genome that mediate the escape of JEV from cellular translational shutoff, evaluates the infectivity of mutant viruses with or without these structures, and explores their pathogenicity in mice. The authors then identify the cellular proteins that bind to these regions (DDX3 and PABP1) and determine their role in translation blockade escape, in addition to examining and assessing the conservation of the stem-loop identified in JEV in other Flaviviridae.

      In almost all of their systematic analyses, translational effects are put in parallel with the replication kinetics of the different mutant viruses. The experimental thread followed in this study is rigorous and direct, and all experiments are truly well-controlled, fully supporting the authors' conclusions.

      We greatly appreciate the reviewer's recognition of this study. We elucidated the role of the UTRs in the escape of JEV from translation blockade from the perspective of UTR RNA structure and its interaction with host proteins (DDX3 and PABP1), and we hope that this study will gain wider recognition.

      Reviewer #2 (Public review):

      Summary:

      The authors use a combination of techniques including viral genetics, in vitro reporters, and purified proteins and RNA to interrogate how the Japanese encephalitis virus maintains translation of its RNA to produce viral proteins after the host cell has shut down general translation as a means to block viral replication. They report a role for the RNA helicase DDX3 in promoting virus translation in a cap-independent manner through binding a dumbbell RNA structure in the 3' untranslated region previously reported to drive Japanese encephalitis virus cap-independent translation and a stem-loop at the viral RNA 5' end.

      Strengths:

      The authors clearly show that the Japanese encephalitis virus does not possess an IRES activity to initiate translation using a range of mono- and bi-cistronic mRNAs. Surprisingly, using a replicon system, the translation of a capped or uncapped viral RNA is reported to have the same translation efficiency when transfected into cells. The authors have applied a broad range of techniques to support their hypotheses.

      We are grateful for the reviewer’s recognition of the thoroughness and multi-faceted nature of our study.

      Weaknesses:

      (1) The authors' original experiments in Figure 1 where the virus is recovered following transfection of in vitro transcribed viral RNA with alternative 5' ends such as capped or uncapped ignore that after a single replication cycle of that transfected RNA, the subsequent viral RNA will be capped by the viral capping proteins making the RNA in all conditions the same.

      Thank you for your suggestion. We share the reviewer's viewpoint. After the first round of translation of the uncapped viral RNA, the subsequent viral RNA will inevitably be capped by the viral capping proteins. However, the transfected cells contain no viral capping proteins at the initial stage of transfection, so the recovery of virus from uncapped RNA directly demonstrates that JEV possesses a cap-independent translation initiation mechanism.

      (2) The authors report that deletion of the dumbbell and the large 3' stem-loop RNA reduce replication of a Japanese encephalitis virus replicon. These structures have been reported for other flaviviruses to be important respectively for the accumulation of short flaviviral RNAs that can regulate replication and stability of the viral RNA that lacks a polyA tail. The authors don't show any assessment of RNA stability or degradation state.

      Thank you for your suggestion. We agree that a rigorous supplementary experiment for the assessment of RNA stability or degradation state is desirable. To address this, the relative amounts of viral RNA with the deletion of DB2 or sHP-SL will be determined by real-time RT-PCR analysis in transfected cells at multiple time points, which will allow us to test whether the deletion of the dumbbell and the large 3' stem-loop RNA reduce the RNA stability of JEV.

      (3) The authors propose a model for DDX3 to drive 5'-3' end interaction of the Japanese encephalitis virus viral genome but no direct evidence for this is presented.

      Thank you for your suggestion. In this study, we did not have direct evidence to suggest that DDX3 can drive the 5'-3' end interaction of the Japanese encephalitis virus viral genome, which is indeed a limitation of our research. In the revision, we will more explicitly discuss the interrelationship between DDX3 and 5'-3' UTR, as well as incorporate a discussion of these points into the main text, acknowledging the limitations of our current models.

      (4) The authors' final model in Figure 10 proposes a switch from a cap-dependent translation system in early infection to cap-independent DDX3-driven translation system late in infection. The replicon data that measures translation directly however shows identical traces for capped and uncapped RNAs in all untreated conditions so that which mechanism is used at different stages of the infection is not clear.

      Thank you for your suggestion. The replicon transfection system was used to evaluate the key viral element for cap-independent translation. We only monitored reporter gene expression from 2 hpt to 12 hpt, which cannot fully recapitulate the different stages of JEV infection. In the experimental results Figure 1 and Figure 1-figure supplement 1, we demonstrated that JEV significantly induced the host translational shutoff at 36 hpi, while the expression level of viral protein gradually increased as infection went on, suggesting that JEV translation could evade the shutoff of cap-dependent translation initiation at the late stage of infection. As shown in the growth curves in Figure 5Q, JEV replicated to similar virus titers in WT and DDX3-KO cells from 12 hpi to 36 hpi, but higher virus yields were observed in WT cells from 48 hpi, suggesting that DDX3 is important for JEV infection at the late stage. DDX3 was demonstrated to be critical for JEV cap-independent translation. Based on these data, we proposed that the DDX3-dependent cap-independent translation is employed by JEV to maintain efficient infection at the late stage when the cap-dependent translation initiation was suppressed.

      Reviewer #3 (Public review):

      Summary:

      This work is a valuable study that aims to decipher the molecular mechanisms underlying the translation process in Japanese encephalitis virus (JEV), a relevant member of the genus Flavivirus. The authors provide evidence that cap-independent translation, which has already been demonstrated for other flaviviruses, could also occur in JEV. This process depends on the genomic 3' UTR, as previously demonstrated in other flaviviruses. Further, the authors find that cellular proteins such as DDX3 or PABP1 could contribute to JEV translation in a cap-independent way. Both DDX3 and PABP1 had previously been described to have a role in cellular protein synthesis and also in the translation step of other flaviviruses distinct from JEV; therefore, this work would expand the cap-independent translation in flaviviruses as a general mechanism to bypass the translation repression exerted by the host cell during viral infection. Further, the findings can be relevant for the development of specific drugs that could interfere with flaviviral translation in the future. Nevertheless, the conclusions are not fully supported by the provided results.

      Strengths:

      The results provide a good starting point to investigate the molecular mechanism underlying the translation in flaviviruses, which even today is an area of knowledge with many limitations.

      Thank you to the reviewer for providing positive feedback. The research on the molecular mechanism underlying cap-independent translation is still a limited field in the flaviviruses, and its mechanism has not been well elucidated at present. We only hope that this study could reveal a novel mechanism of translation initiation for flaviviruses.

      Weaknesses:

      The main limit of the work is related to the fact that the role of the 3' UTR structural elements and DDX3 is not only circumscribed to translation, but also to replication and encapsidation. In fact, some of the provided results suggest this idea. Particularly, it is intriguing why the virus titer can be completely abrogated while the viral protein levels are only partially affected by the knockdown of DDX3. This points to the fact that many of the drawn conclusions could be overestimated or, at least, all the observed effect cannot be attributed only to the DDX3 effect on translation. Finally, it is noteworthy that the use of uncapped transcripts could be misleading, since this is not the natural molecular context of the viral genome.

      Thank you for your suggestion. We agree with the reviewer's comment that the role of the 3' UTR structural elements and DDX3 may not be limited to translation. However, contrary to the reviewer's description, DDX3 knockdown did not completely abrogate JEV infection. As indicated in Figure 5E-5F, the recombinant virus was successfully rescued at 36 hpt and 48 hpt using the uncapped viral genomic RNA, although the viral titer rescued with the uncapped genomic RNA at 24 hpt was below the limit of detection. We have confirmed that the DB2 and sHP-SL elements in 3' UTR play a decisive role in the replication of viral RNA in our research (Figure 2G and Figure 2-figure supplement 4C), and we will further analyze the role of DDX3 in viral RNA replication and encapsidation, thereby clarifying the multiple functions of DDX3 in the JEV life cycle. Meanwhile, we will incorporate a discussion of these points into the main text, acknowledging the limitations of our current research.

      To eliminate the misleading effects of using uncapped transcripts, we will use a natural molecular background of the viral genome with cap methylation deficiency. The methyltransferase (MTase) of the flavivirus NS5 protein catalyzes N-7 and 2’-O methylations in the formation of the 5’-end cap of the genome, and the E218 amino acid of the NS5 protein MTase domain is one of the active sites of flavivirus methyltransferase (PLoS Pathogens. 2012. PMID:22496660; Journal of Virology. 2007. PMID: 1866096). We will construct a mutant virus carrying the E218A mutation to abolish 2'-O methylation activity and significantly reduce N-7 methylation activity and then analyze the roles of UTR structure and DDX3 in recombinant viruses with the type-I cap structure functional deficiency.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors revealed the cellular heterogeneity of companion cells (CCs) and demonstrated that the florigen gene FT is highly expressed in a specific subpopulation of these CCs in Arabidopsis. Through a thorough characterization of this subpopulation, they further identified NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT. Overall, these findings are intriguing and valuable, contributing significantly to our understanding of florigen and the photoperiodic flowering pathway. However, there is still room for improvement in the quality of the data and the depth of the analysis. I have several comments that may be beneficial for the authors.

      Strengths:

      The usage of snRNA-seq to characterize the FT-expressing companion cells (CCs) is very interesting and important. Two findings are novel: 1) Expression of FT in CCs is not uniform. Only a subcluster of CCs exhibits high expression level of FT. 2) Based on consensus binding motifs enriched in this subcluster, they further identify NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT.

      We are pleased to hear that reviewer 1 noted the novelty and importance of our work. As reviewer 1 mentioned, we are also excited about the identification of a subcluster of companion cells with very high FT expression. We believe that this work is an initial step to describe the molecular characteristics of these FT-expressing cells. We are also excited to share our new findings on NIGT1s as potential FT regulators. We think that this finding will attract a broader audience, as the molecular factors that coordinate plant nutritional status with flowering time remain largely unknown, even though the phenomenon itself is well known.

      Weaknesses:

      (1) Title: "A florigen-expressing subpopulation of companion cells". It is a bit misleading. The conclusion here is that only a subset of companion cells exhibit high expression of FT, but this does not imply that other companion cells do not express it at all.

      We agree with this comment, as we also did not intend to say that FT is not produced in other companion cells than the subpopulation we identified. We will revise the title to more accurately reflect the point.

      (2) Data quality: Authors opted for fluorescence-activated nuclei sorting (FANS) instead of traditional cell sorting method. What is the rationale behind this decision? Readers may wonder, especially given that RNA abundance in single nuclei is generally lower than that in single cells. This concern also applies to snRNA-seq data. Specifically, the number of genes captured was quite low, with a median of only 149 genes per nucleus. Additionally, the total number of nuclei analyzed was limited (1,173 for the pFT:NTF and 3,650 for the pSUC2:NTF). These factors suggest that the quality of the snRNA-seq data presented in this study is quite low. In this context, it becomes challenging for the reviewer to accurately assess whether this will impact the subsequent conclusions of the paper. Would it be possible to repeat this experiment and get more nuclei?

      We appreciate this comment; we noticed that we did not clearly explain the rationale for using single-nucleus RNA sequencing (snRNA-seq) instead of single-cell RNA-seq (scRNA-seq). As reviewer 1 mentioned, RNA abundance in scRNA-seq is higher than in snRNA-seq. To conduct scRNA-seq using plant cells, protoplasting is a necessary step. However, in our study, protoplasting has many drawbacks for isolating our target cells from the phloem. It is technically challenging to efficiently isolate protoplasts of phloem companion cells, which are deeply embedded in plant tissues. It usually requires a minimum of several hours of enzymatic incubation to protoplast companion cells, and the efficiency of protoplasting these cells is still low. For our analysis, preserving the time-of-day information is also crucial. Therefore, we used a faster isolation method. In the revision, we will explain that we chose snRNA-seq because of these technical limitations.

      Here, reviewer 1 raised a concern about the quality of our snRNA-seq data, referring to the relatively low read counts per nucleus. Although we believe that shallow reads do not necessarily indicate low quality and are confident in the accuracy of our snRNA-seq data, as supported by the detailed follow-up experiments (e.g., imaging analysis in Fig. 4B), we agree that it is important to address this point in the revision and alleviate readers’ concerns regarding the data quality.

      (3) Another disappointment is that the authors did not utilize reporter genes to identify the specific locations of the FT-high expressing cells (cluster 7 cells) within the CC population in vivo. Are there any discernible patterns that can be observed?

      As we previously showed only limited spatial images of overlap between FT-expressing cells and other cluster 7 gene-expressing cells in Fig. 4B, this comment is understandable. To respond to it, we will include whole leaf images of FT- and cluster 7 gene-expressing cells to assess the spatial overlaps between FT and cluster 7 genes within a leaf.

      (4) The final disappointment is that the authors only compared FT expression between the nigtQ mutants and the wild type. Does this imply that the mutant does not have a flowering time defect particularly under high nitrogen conditions?

      To answer this question, we will include the flowering time measurement data of the nigtQ mutants grown on the soil with sufficient nitrogen sources.

      Reviewer #2 (Public review):

      This manuscript submitted by Takagi et al. details the molecular characterization of the FT-expressing cell at a single-cell level. The authors examined what genes are expressed specifically in FT-expressing cells and other phloem companion cells by exploiting bulk nuclei and single-nuclei RNA-seq and transgenic analysis. The authors found the unique expression profile of FT-expressing cells at a single-cell level and identified new transcriptional repressors of FT such as NIGT1.2 and NIGT1.4.

      Although previous researchers have known that FT is expressed in phloem companion cells, they have tended to neglect the molecular characterization of the FT-expressing phloem companion cells. To understand how FT, which is expressed in tiny amounts in phloem companion cells that make up a very small portion of the leaf, can be a key molecule in the regulation of the critical developmental step of floral transition, it is important to understand the molecular features of FT-expressing cells in detail. In this regard, this manuscript provides insight into the understanding of detailed molecular characteristics of the FT-expressing cell. This endeavor will contribute to the research field of flowering time.

      We are grateful that reviewer 2 recognizes the importance of transcriptome profiling of FT-expressing cells at the single-cell level.

      Here are my comments on how to improve this manuscript.

      (1) The most novel finding of this manuscript is the identification of NIGT1.2 as the upstream regulator of FT-expressing cluster 7 gene expression. The flowering phenotypes of the nigtQ mutant and the transgenic plants in which NIGT1.2 was expressed under the SUC2 gene promoter support that NIGT1.2 functions as a floral repressor upstream of the FT gene. Nevertheless, the expression patterns of NIGT1.2 genes do not appear to have much overlap with those of NIGT1.2-downstream genes in the cluster 7 (Figs S14 and F3). An explanation for this should be provided in the discussion section.

      We agree with reviewer 2 that the spatial expression patterns of NIGT1.2 and cluster 7 genes do not overlap much, and that some discussion should be provided in the manuscript. Although we do not have a concrete answer for this phenomenon, NIGT1.2 may suppress FT gene expression in non-cluster 7 cells to prevent the misexpression of FT. Another possible explanation is that NIGT1.2 negatively affects the formation of cluster 7 cells. If so, cells with high NIGT1.2 gene expression would rarely become cluster 7 cells. We will discuss this further in the discussion section of our revised manuscript.

      (2) To investigate gene expression in the nuclei of specific cell populations, the authors generated transgenic plants expressing a fusion gene encoding a Nuclear Targeting Fusion protein (NTF) under the control of various cell type-specific promoters. Since the public audience would not know about NTF without reading reference 16, some explanation of NTF is necessary in the manuscript. Please provide a schematic of constructs the authors used to make the transformants.

      As reviewer 2 pointed out, we lacked a clear explanation of why we used NTF in this study. NTF is a fusion protein that consists of a nuclear envelope targeting domain, GFP, and a biotin acceptor peptide. It was originally designed for the INTACT (isolation of nuclei tagged in specific cell types) method, which enables us to isolate bulk nuclei from specific tissues. Although our original intention was to profile the bulk transcriptome of mRNAs in the nuclei of FT-expressing cells using INTACT, we also utilized our NTF transgenic lines for snRNA-seq analysis. To explain what NTF is to readers, we will include a schematic diagram of NTF.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We have carefully addressed all the reviewers' suggestions, and detailed responses are provided at the end of this letter. In summary:

      • We conducted two additional replicates of the study to obtain more robust and reliable data.

      • The Introduction has been revised for greater clarity and conciseness.

      • The Results section was shortened and reorganized to highlight the key findings more effectively.

      • The Discussion was modified according to the reviewers' suggestions, with a focus on reorganization and conciseness.

      We hope you find this revised version of the manuscript satisfactory.

      Reviewer #1 (Public Review):

      Summary:

      This study examines the role of host blood meal source, temperature, and photoperiod on the reproductive traits of Cx. quinquefasciatus, an important vector of numerous pathogens of medical importance. The host use pattern of Cx. quinquefasciatus is interesting in that it feeds on birds during spring and shifts to feeding on mammals towards fall. Various hypotheses have been proposed to explain the seasonal shift in host use in this species but have provided limited evidence. This study examines whether the shifting of host classes from birds to mammals towards autumn offers any reproductive advantages to Cx. quinquefasciatus in terms of enhanced fecundity, fertility, and hatchability of the offspring. The authors found no evidence of this, suggesting that alternate mechanisms may drive the seasonal shift in host use in Cx. quinquefasciatus.

      Strengths:

      Host blood meal source, temperature, and photoperiod were all examined together.

      Weaknesses:

      The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      Comments on the revision: 

      Overall, I am not quite convinced about the possible shift in host use in the Argentinian populations of Cx. quinquefasciatus. The evidence from the papers that the authors cite is not strong enough to derive this conclusion. Therefore, I think that the introduction and discussion parts where they talk about host shift in Cx. quinquefasciatus should be removed completely as it misleads the readers. I suggest limiting the manuscript to talking only about the effects of blood meal source and seasonality on the reproductive outcomes of Cx. quinquefasciatus

      As mentioned in the previous revision, we agree with the reviewer's observation about the lack of evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from Argentina. We address this topic in the discussion.

      Additionally, we also added a paragraph in the discussion section to include the limitations of our study and conclusions. One of them is the fact that our results are based on controlled conditions experiments. Future studies are needed to elucidate if the same trend is found in the field.

      Reviewer #1 (Recommendations for the authors): 

      Abstract

      Line 73: shift in feeding behavior

      Accepted as suggested. 

      Discussion

      Line 258: addressed that

      Accepted as suggested.

      Line 263: blood is nutritionally richer

      Accepted as suggested.

      Reviewer #2 (Public Review): 

      Summary:

      Conceptually, this study is interesting and is the first attempt to account for the potentially interactive effects of seasonality and blood source on mosquito fitness, which the authors frame as a possible explanation for previously observed host-switching of Culex quinquefasciatus from birds to mammals in the fall. The authors hypothesize that if changes in fitness by blood source change between seasons, higher fitness on birds in the summer and on mammals in the autumn could drive observed host switching. To test this, the authors fed individuals from a colony of Cx. quinquefasciatus on chickens (bird model) and mice (mammal model) and subjected each of these two groups to two different environmental conditions reflecting the high and low temperatures and photoperiod experienced in summer and autumn in Córdoba, Argentina (aka seasonality). They measured fecundity, fertility, and hatchability over two gonotrophic cycles. The authors then used a generalized linear model to evaluate the impact of host species, seasonality, and gonotrophic cycle on fecundity, fertility, and hatchability. The authors were trying to test their hypothesis by determining whether there was an interactive effect of season and host species on mosquito fitness. This is an interesting hypothesis; if it had been supported, it would provide support for a new mechanism driving host switching. While the authors did report an interactive impact of seasonality and host species, the directionality of the effect was the opposite from that hypothesized. The authors have done a very good job of addressing many of the reviewer concerns, with several exceptions that continue to cause concern about the conclusions of the study.

      Strengths:

      (1) Using a combination of laboratory feedings and incubators to simulate seasonal environmental conditions is a good, controlled way to assess the potentially interactive impact of host species and seasonality on the fitness of Culex quinquefasciatus in the lab.

      (2) The driving hypothesis is an interesting and creative way to think about a potential driver of host switching observed in the field. 

      (3) The manuscript has become a lot clearer and easier to read with the revisions - thank you to the authors for working hard to make many of the suggested changes. 

      Weaknesses:

      (1) The authors have decided not to follow the suggestion of conducting experimental replicates of the study. This is understandable given the significant investment of resources and time necessary, however, it leaves the study lacking support. Experimental replication is an important feature of a strong study and helps to provide confidence that the observed patterns are real and replicable. Without replication, I continue to lack confidence in the conclusions of the study. 

      We included replicates as suggested.  

      (2) The authors have included some additional discussion about the counterintuitive nature of their results, but the paragraph discussing this in the discussion was confusing. I believe that this should be revised. This is a key point of the paper and needs to be clear to the reader.

      Revised as suggested. 

      (3) There should be more discussion of the host switching observed in the two studies conducted in Argentina referenced by the authors. Since host switching is the foundation for the hypothesis tested in this paper, it is important to fully explain what is currently known in Argentina. 

      Accepted as suggested.

      (4) In some cases, the explanations of referenced papers are not entirely accurate. For example, when referencing Erram et al 2022, I think the authors misrepresented the paper's discussion regarding pre-diuresis- Erram et al. are suggesting that pre-diuresis might be the mechanism by which C. furens compensates for the lower nutritional value of avian blood, leading to no significant difference between avian/mammal blood on fecundity/fertility (rather than leading to higher fecundity on birds, as stated in this manuscript). The study performed by Erram et al. also didn't prove this phenomenon, they just suggest it as a possible mechanism to explain their results, so that should be made clear when referencing the paper. 

      Changed as suggested.

      (5) In some cases, the conclusions continue to be too strongly worded for the evidence available. For example, lines 322-324: I don't think the data is sufficient to conclude that a different physiological state is induced, nor that they are required to feed on a blood source that results in higher fitness. 

      The wording was modified as suggested to tie our discussion more closely to the results.

      (6) There is limited mention of the caveat that this experiment was performed with simulated seasonality that does not perfectly replicate seasonality in the field. I think this caveat should be discussed in the discussion (e.g. that humidity is held constant).

      This topic is now included in the discussion as suggested. 

      Reviewer #2 (Recommendations for the authors): 

      59-60: These terms should end with -phagic instead of -philic. These papers study blood feeding patterns, not preference. I understand that the Janssen paper calls it "mammalophilic" in its title, but this was an incorrect use of the term in that paper. There are some review papers that explain the difference in this terminology if it's helpful.

      Accepted as suggested. 

      73: edit to "in" feeding behavior 

      Accepted as suggested.

      77-78: Given that the premise of your study is based on the phenomenon of host switching, I suggest that you expand your discussion of these two papers. What did they observe? Which hosts did they switch from / to and how dramatic was the shift?

      Accepted as suggested. 

      79: replace acknowledged with experienced 

      Accepted as suggested.

      79-80: the way that this is written is misleading. It suggests that Spinsanti showed that seasonal variation in SLEV could be attributed to a host shift, which isn't true. This citation should come before the comma and then you should use more cautious language in the second half. E.g which MIGHT be possible to attribute to .... 

      Accepted as suggested.

      80-82: this is not convincing. Even if the Robin isn't in Argentina, Argentina does have migrating birds, so couldn't this be the case for other species of birds? Do any of the birds observed in previous blood meal analyses in Argentina migrate? If so, couldn't this hypothesis indeed play a role? 

      A paragraph about this topic was added to the discussion as suggested.

      90: hypotheses for what? The fall peak in cases? Or host switching? 

      Changed to be clearer.

      98: where was this mentioned before? I think "as mentioned before" can be removed. 

      Accepted as suggested.

      101: edit to "whether an interaction effect exists" 

      Accepted as suggested.

      104: edit to "We hypothesize that..." 

      Accepted as suggested.

      106: reported host USE changes, not host PREFERENCE changes, right? 

      All the terminology was changed to host pattern rather than preference to avoid confusion.

      200: Briefly reading Carsey and Harden, it looks like the methodology was developed for social science. Is there anything you can cite to show this applied to other types of data? If not, I think this requires more explanation in your MS. 

      This was removed as replicates were included.

      237-239: I think it is best not to make a definitive statement about greater/higher if it isn't statistically significant; I suggest modifying the sentences to state that the differences you are listing were not significantly different up front rather than at the end, otherwise if people aren't reading carefully, they may get the wrong impression. 

      Accepted as suggested.

      245: you only use the term MS-I once before and I forgot what it meant since it wasn't repeated, so I had to search back through with command-F. I suggest writing this out rather than using the acronym. 

      Accepted as suggested.

      249: edit to: "an interaction exists between the effect of..." 

      Accepted as suggested.

      253-254: greater compared to what? 

      Changed for clarity.

      258-260: edit for grammar

      Accepted as suggested.

      260-262: edit for grammar; e.g. "However, this assumption lacks solid evidence; there is a scarcity of studies regarding nutritional quality of avian blood and its impact on mosquito fitness." 

      Accepted as suggested.

      263: edit: blood is nutritionally... 

      Accepted as suggested.

      264-267: This doesn't sound like an accurate interpretation of what the paper suggests regarding pre-diuresis in their discussion - they are suggesting that pre-diuresis might be the mechanism by which C. furens compensates for the lower nutritional value of avian blood, leading to no significant difference between avian/mammal blood on fecundity/fertility. They also don't show this, they just suggest it as a possible mechanism to explain their results. 

      This topic was removed given the restructuring of discussion.

      253-269: You should tie this paragraph back to your results to explicitly compare/contrast your findings with the previous literature. 

      Accepted as suggested.

      270-282: This paragraph would be a good place to explain the caveat of working in the laboratory - for example, humidity was the same across the two seasons which I'm guessing isn't the case in the field in Argentina. You can discuss what aspects of laboratory season simulation do not accurately replicate field conditions and how this can impact your findings. You said in your response to the reviewers that you weren't interested in measuring other variables (which is fair, and not expected!), but the beauty of the discussion section is to be able to think about how your experimental design might impact your results - one possibility is that your season simulation may not have produced the results produced by true seasonal shifts. 

      Accepted as suggested.

      279-281: You say your experiment was conducted within the optimal range, which would suggest that both summer and autumn were within that range, but then you only talk about summer as optimal in the following sentence. 

      Changed for clarity.

      281-282: You should clarify this sentence - state what the interaction has an effect on. 

      Accepted as suggested.

      283-291: I appreciate that your discussion now acknowledges the small sample size and the questions that remain unanswered due to the results being opposite to that of the hypothesis, but this paragraph lacks some details and in places doesn't make sense. 

      I think you need to emphasize which groups had small sample size and which conclusions that might impact. I also think you need to explain why the sample size was substantially smaller for some groups (e.g. did they refuse to feed on the mouse in the autumn?). I appreciate that sample sizes are hard to keep high across many groups and two gonotrophic periods, but unfortunately, that is why fitness experiments are so hard to do and by their nature, take a long time. I understand that other papers have even lower sample size, but I was not asked to review those papers and would have had the same critique of them. I don't believe that creating simulated data via a Monte Carlo approach can make up for generating real data. As I understand it from your explanation, you are parametrizing the Monte Carlo simulations with your original data, which was small to begin with for autumn mouse. Using this simulation doesn't seem like a satisfactory replacement for an experimental replicate in my opinion. I maintain that at least a second replicate is necessary to see whether the patterns that you have observed hold. 

      We performed a power analysis and added more replicates to address the issue of sample size. More on this critique has been added to the discussion. The simulation approach was removed entirely.

      Regarding the directionality of the interaction effect, I think this warrants more discussion. Lines 287-291 don't make sense to me. You suggest that feeding on birds in the autumn may confer a reproductive advantage when conditions are more challenging. But then why wouldn't they preferentially feed on birds in the autumn, rather than mammals? I suggest rewriting this paragraph to make it clearer. 

      Accepted as suggested.

      297: earlier mentioned treatments? Do you mean compared to the first gonotrophic cycle? This isn't clear. 

      Changed for clarity.

      302-303: Did you clarify whether you are allowed to reference unpublished data in eLife? 

      This was removed to follow the guidelines of eLife.

      316-317: "it becomes apparent" sounds awkward, I suggest rewording and also explaining how this conclusion was made. 

      Accepted as suggested.

      322-324: I think that this statement is too strongly worded. I don't think your data is sufficient to conclude that a different physiological state is induced, nor that they are required to feed on a blood source that results in higher fitness. Please modify this and make your conclusions more cautious and closely linked to what you actually demonstrated. 

      Accepted as suggested.

      325: change will perform to would have 

      Accepted as suggested.

      326: add to the sentence: "and vice versa in the summer" 

      Accepted as suggested.

      330: possible explanations, not explaining scenarios. 

      Accepted as suggested.

      517: I think you should repeat the abbreviation definitions in the caption to make it easier for readers, otherwise they have to flip back and forth which can be difficult depending on formatting.

      Accepted as suggested. 

      In general, I think that your captions need more information. I think the best captions explain the figure relatively thoroughly such that the reader can look at the figure and caption and understand without reading the paper in depth. (e.g. the statistical test used).

      Data availability: The eLife author instructions do say that data must be made available, so there should be a statement on data availability in your MS. I also suggest you make the code available.

      Accepted as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The aim of this paper is to describe a novel method for genetic labelling of animals or cell populations, using a system of DNA/RNA barcodes.

      Strengths:

      • The authors' attempt at providing a straightforward method for multiplexing Drosophila samples prior to scRNA-seq is commendable. The perspective of being able to load multiple samples on a 10X Chromium without antibody labelling is appealing.

      • The authors are generally honest about potential issues in their method, and areas that would benefit from future improvement.

      • The article reads well. Graphs and figures are clear and easy to understand.

      We thank the reviewer for these positive comments.

      Weaknesses:

      • The usefulness of TaG-EM for phototaxis, egg laying or fecundity experiments is questionable. The behaviours presented here are all easily quantifiable, either manually or using automated image-based quantification, even when they include a relatively large number of groups and replicates. Despite their claims (e.g., L311-313), the authors do not present any real evidence about the cost- or time-effectiveness of their method in comparison to existing quantification methods.

      While the behaviors that were quantified in the original manuscript were indeed relatively easy to quantify through other methods, they nonetheless demonstrated that sequencing-based TaG-EM measurements faithfully recapitulated manual behavioral measurements. In response to the reviewer’s comment, we have added additional experiments that demonstrate the utility of TaG-EM-based behavioral quantification in the context of a more labor-intensive phenotypic assay (measuring gut motility via food transit times in Drosophila larvae, Figure 4, Supplemental Figure 7). We found that food transit times in the presence and absence of caffeine are subtly different and that, as with larger effect size behaviors, TaG-EM data recapitulates the results of the manual assay. This experiment demonstrates both that TaG-EM can be used to streamline labor-intensive behavioral assays (we have included an estimate of the savings in hands-on labor for this assay by using a multiplexed sequencing approach, Supplemental Figure 8) and that TaG-EM can quantify small differences between experimental groups. We also note in the discussion that an additional benefit of TaG-EM-based behavioral assays is that the observer is blinded as to the experimental conditions as they are intermingled in a single multiplexed assay. We have added the following text to the paper describing these experiments.

      Results:

      “Quantifying food transit time in the larval gut using TaG-EM

      Gut motility defects underlie a number of functional gastrointestinal disorders in humans (Keller et al., 2018). To study gut motility in Drosophila, we have developed an assay based on the time it takes a food bolus to transit the larval gut (Figure 4A), similar to approaches that have been employed for studying the role of the microbiome in human gut motility (Asnicar et al., 2021). Third instar larvae were starved for 90 minutes and then fed food containing a blue dye. After 60 minutes, larvae in which a blue bolus of food was visible were transferred to plates containing non-dyed food, and food transit (indicated by loss of the blue food bolus) was scored every 30 minutes for five hours (Supplemental Figure 7). 

      Because this assay is highly labor-intensive and requires hands-on effort for the entire five-hour observation period, there is a limit on how many conditions or replicates can be scored in one session (~8 plates maximum). Thus, we decided to test whether food transit could be quantified in a more streamlined and scalable fashion by using TaG-EM (Figure 4B). Using the manual assay, we observed that while caffeine-containing food is aversive to larvae, the presence of caffeine reduces transit time through the gut (Figure 4C, Supplemental Figure 7). This is consistent with previous observations in adult flies that bitter compounds (including caffeine) activate enteric neurons via serotonin-mediated signaling and promote gut motility (Yao and Scott, 2022). We tested whether TaG-EM could be used to measure the effect of caffeine on food transit time in larvae. As with prior behavioral tests, the TaG-EM data recapitulated the results seen in the manual assay (Figure 4D). Conducting the transit assay via TaG-EM enables several labor-saving steps. First, rather than counting the number of larvae with and without a food bolus at each time point, one simply needs to transfer non-bolus-containing larvae to a collection tube. Second, because the TaG-EM lines are genetically barcoded, all the conditions can be tested at once on a single plate, removing the need to separately count each replicate of each experimental condition. This reduces the hands-on time for the assay to just a few minutes per hour. A summary of the anticipated cost and labor savings for the TaG-EM-based food transit assay is shown in Supplemental Figure 8.”

      Discussion:

      “While the utility of TaG-EM barcode-based quantification will vary based on the number of conditions being analyzed and the ease of quantifying the behavior or phenotype by other means, we demonstrate that TaG-EM can be employed to cost-effectively streamline labor-intensive assays and to quantify phenotypes with small effect sizes (Figure 4, Supplemental Figure 8). An additional benefit of multiplexed TaG-EM behavioral measurements is that the experimental conditions are effectively blinded as the multiplexed conditions are intermingled in a single assay.”

      Methods:

      “Larval gut motility experiments

      Preparing Yeast Food Plates

      Yeast agar plates were prepared by making a solution containing 20% Red Star Active Dry Yeast 32oz (Red Star Yeast) and 2.4% Agar Powder/Flakes (Fisher) and a separate solution containing 20% Glucose (Sigma-Aldrich). Both mixtures were autoclaved with a 45-minute liquid cycle and then transferred to a water bath at 55ºC. After cooling to 55ºC, the solutions were combined and mixed, and approximately 5 mL of the combined solution was transferred into 100 x 15 mm petri dishes (VWR) in a PCR hood or contamination-free area. For blue-dyed yeast food plates, 0.4% Blue Food Color (McCormick) was added to the yeast solution. For the caffeine assays, 300 µL of a solution of 100 mM 99% pure caffeine (Sigma-Aldrich) was pipetted onto the blue-dyed yeast plate and allowed to absorb into the food during the 90-minute starvation period.

      Manual Gut Motility Assay

      Third instar Drosophila larvae were transferred to empty conical tubes that had been misted with water to prevent the larvae from drying out. After a 90-minute starvation period the larvae were moved from the conical tube to a blue-dyed yeast plate with or without caffeine and allowed to feed for 60 minutes. Following the feeding period, the larvae were transferred to an undyed yeast plate. Larvae were scored for the presence or absence of a food bolus every 30 minutes over a 5-hour period. Up to 8 experimental replicates/conditions were scored simultaneously.

      TaG-EM Gut Motility Assay

      Third instar larvae were starved and fed blue dye-containing food with or without caffeine as described above. An equal number of larvae from each experimental condition/replicate were transferred to an undyed yeast plate. During the 5-hour observation period, larvae were examined every 30 minutes and larvae lacking a food bolus were transferred to a microcentrifuge tube labeled for the timepoint. Any larvae that died during the experiment were placed in a separate microcentrifuge tube and any larvae that failed to pass the food bolus were transferred to a microcentrifuge tube at the end of the experiment. DNA was extracted from the larvae in each tube and TaG-EM barcode libraries were prepared and sequenced as described above.”

      • Behavioural assays presented in this article have clear outcomes, with large effect sizes, and therefore do not really challenge the efficiency of TaG-EM. By showing a T-maze in Fig 1B, the authors suggest that their method could be used to quantify more complex behaviours. Not exploring this possibility in this manuscript seems like a missed opportunity.

      See the response to the previous point.

      • Experiments in Figs S3 and S6 suggest that some tags have a detrimental effect on certain behaviours or on GFP expression. Whereas the authors rightly acknowledge these issues, they do not investigate their causes. Unfortunately, this calls into question the overall suitability of TaG-EM, as other barcodes may also affect certain aspects of the animal's physiology or behaviour. Revising barcode design will be crucial to make sure that sequences with potential regulatory function are excluded.

      We have determined that the barcode (BC#8) that had no detectable Gal4-induced gene expression in Figure S6 (now Supplemental Figure 9) has a deletion in the GFP coding region that ablates GFP function. Interestingly, the expressed TaG-EM barcode transcript is still detectable in single cell sequencing experiments, but obviously this line cannot be used for cell enrichment (at least based solely on GFP expression from the TaG-EM construct). While it is unclear how this line came to have a lesion in the GFP gene, we have subsequently generated >150 additional TaG-EM stocks and we have tested the GFP expression of these newly established stocks by crossing them to Mhc-Gal4. All of the additional stocks had GFP expression in the expected pattern, indicating that the BC#8 construct is an outlier with respect to inducibility of GFP. We have added the following text to the results section to address this point:

      “No GFP expression was visible for TaG-EM barcode number 8, which upon molecular characterization had an 853 bp deletion within the GFP coding region (data not shown). We generated and tested GFP expression of an additional 156 TaG-EM barcode lines (Alegria et al., 2024), by crossing them to Mhc-Gal4 and observing expression in the adult thorax. All 156 additional TaG-EM lines had robust GFP expression (data not shown).”

      It is certainly the case that future improvements to the construct design may be necessary or desirable and that back-crossing could likely be used to alleviate line-to-line differences for specific phenotypes. We also address this point in the discussion with the following text:

      “We excluded this poor performing barcode line from the fecundity tests, however, backcrossing is often used to bring reagents into a consistent genetic background for behavioral experiments and could also potentially be used to address behavior-specific issues with specific TaG-EM lines. In addition, other strategies such as averaging across multiple barcode lines or permutation of barcode assignment across replicates could also mitigate such deficiencies.”

      • For their single-cell experiments, the authors have used the 10X Genomics method, which relies on sequencing just a short segment of each transcript (usually 50-250bp - unknown for this study as read length information was not provided) to enable its identification, with the matching paired-end read providing cell barcode and UMI information (Macosko et al., 2015). With average fragment length after tagmentation usually ranging from 300-700bp, a large number of GFP reads will likely not include the 14bp TaG-EM barcode. 

      The 10x Genomics 3’ workflows that were used for sequencing TaG-EM samples read the cell barcode and UMI in read one and the expressed RNA sequence in read two. We sequenced the samples shown in Figure 5 in the initial manuscript using a run configuration that generated 150 bp for read two. The TaG-EM barcodes are located just upstream of the poly-adenylation sites (based on the sequencing data, we observe two different poly-A sites and the TaG-EM barcode is located 35 and 60 bp upstream of these sites). Based on the location of the TaG-EM barcodes, 150 bp reads are sufficient to see the barcode in any GFP-associated read (when using the 3’ gene expression workflow). In addition to detecting the expression of the TaG-EM barcodes in the 10x Genomics gene expression library, it is possible to make a separate library that enriches the barcode sequence (similar to hashtag or CITE-Seq feature barcode libraries). We have added experimental data where we successfully performed an enrichment of the TaG-EM barcodes and sequenced this as a separate hashtag library (Supplemental Figure 18). We have added text to the results describing this work and also included detailed information in the methods for performing TaG-EM barcode enrichment during 10x library prep.

      Results:

      “In antibody-conjugated oligo cell hashing approaches, sparsity of barcode representation is overcome by spiking in an additional primer at the cDNA amplification step and amplifying the hashtag oligo by PCR. We employed a similar approach to attempt to enrich for TaG-EM barcodes in an additional library sequenced separately from the 10x Genomics gene expression library. Our initial attempts at barcode enrichment using spike-in and enrichment primers corresponding to the TaG-EM PCR handle were unsuccessful (Supplemental Figure 18). However, we subsequently optimized the TaG-EM barcode enrichment by 1) using a longer spike-in primer that more closely matches the annealing temperature used during the 10x Genomics cDNA creation step, and 2) using a nested PCR approach to amplify the cell-barcode and unique molecular identifier (UMI)-labeled TaG-EM barcodes (Supplemental Figure 18). Using the enriched library, TaG-EM barcodes were detected in nearly 100% of the cells at high sequencing depths (Supplemental Figure 19). However, although we used a polymerase that has been engineered to have high processivity and that has been shown to reduce the formation of chimeric reads in other contexts (Gohl et al., 2016), it is possible that PCR chimeras could lead to unreliable detection events for some cells. Indeed, many cells had a mixture of barcodes detected with low counts and single or low numbers of associated UMIs. To assess the reliability of detection, we analyzed the correlation between barcodes detected in the gene expression library and the enriched TaG-EM barcode library as a function of the purity of TaG-EM barcode detection for each cell (the percentage of the most abundant detected TaG-EM barcode, Supplemental Figure 19). For TaG-EM barcode detections where the most abundant barcode was a high percentage of the total barcode reads detected (~75%-99.99%), there was a high correlation between the barcode detected in the gene expression library and the enriched TaG-EM barcode library. Below this threshold, the correlation was substantially reduced.

      In the enriched library, we identified 26.8% of cells with a TaG-EM barcode reliably detected, a very modest improvement over the gene expression library alone (23.96%), indicating that at least for this experiment, the main constraint is sufficient expression of the TaG-EM barcode and not detection. To identify TaG-EM barcodes in the combined data set, we counted a positive detection as any barcode either identified in the gene expression library or any barcode identified in the enriched library with a purity of >75%. In the case of conflicting barcode calls, we assigned the barcode that was detected directly in the gene expression library. This increased the total fraction of cells where a barcode was identified to approximately 37% (Figure 6B).”
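      The combined calling rule quoted above can be illustrated with a minimal sketch. This is not the code from the paper's GitHub repository; the function names, the per-cell dictionary of barcode counts, and the choice to measure purity on raw counts are assumptions made for illustration only. The 75% purity cutoff and the precedence of the gene expression library call are taken from the text.

```python
from typing import Dict, Optional

# Purity cutoff from the quoted Results text (purity of >75%).
PURITY_THRESHOLD = 0.75


def enriched_call(barcode_counts: Dict[str, int],
                  threshold: float = PURITY_THRESHOLD) -> Optional[str]:
    """Return the most abundant barcode for one cell if its share of all
    barcode counts in the enriched library exceeds the threshold.

    `barcode_counts` is a hypothetical per-cell mapping of TaG-EM barcode
    name to count (reads or UMIs; which one is an assumption here).
    """
    total = sum(barcode_counts.values())
    if total == 0:
        return None
    top_barcode, top_count = max(barcode_counts.items(), key=lambda kv: kv[1])
    purity = top_count / total
    return top_barcode if purity > threshold else None


def combined_call(gex_call: Optional[str],
                  enriched_counts: Dict[str, int]) -> Optional[str]:
    """Combine both libraries for one cell.

    A barcode detected directly in the gene expression library always wins,
    including when the enriched library disagrees; otherwise fall back to a
    sufficiently pure enriched-library call.
    """
    if gex_call is not None:
        return gex_call
    return enriched_call(enriched_counts)
```

      Under these assumptions, a cell with no gene expression call and enriched counts of 80 for one barcode and 20 for another would be assigned the first barcode (purity 0.80), while a 60/40 split would be left unassigned.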

      Methods:

      “The resulting pool was prepared for sequencing following the 10x Genomics Single Cell 3’ protocol (version CG000315 Rev C). At step 2.2 of the protocol, cDNA amplification, 1 µl of TaG-EM spike-in primer (10 µM) was added to the reaction to amplify cDNA with the TaG-EM barcode. Gene expression cDNA and TaG-EM cDNA were separated using a double-sided SPRIselect (Beckman Coulter) bead clean up following 10x Genomics Single Cell 3’ Feature Barcode protocol, step 2.3 (version CG000317 Rev E). The gene expression cDNA was made into a library following the CG000315 Rev C protocol starting at section 3. Custom nested primers were used for enrichment of TaG-EM barcodes after cDNA creation using PCR. The following primers were tested (see Supplemental Figure 18):

      UMGC_IL_TaGEM_SpikeIn_v1:

      GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCTTCCAACAACCGGAAGT*G*A

      UMGC_IL_TaGEM_SpikeIn_v2:

      GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCAGCTTATAACTTCCAACAACCGGAAGT*G*A

      UMGC_IL_TaGEM_SpikeIn_v3:

      TGTGCTCTTCCGATCTGCAGCTTATAACTTCCAACAACCGGAAGT*G*A

      D701_TaGEM:

      CAAGCAGAAGACGGCATACGAGATCGAGTAATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCAGC*T*T

      SI PCR Primer:

      AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGC*T*C

      UMGC_IL_DoubleNest:

      GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCAGCTTATAACTTCCAACAACCGG*A*A

      P5: AATGATACGGCGACCACCGA

      D701:

      GATCGGAAGAGCACACGTCTGAACTCCAGTCACATTACTCGATCTCGTATGCCGTCTTCTGCTTG

      D702:

      GATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCGGAGAATCTCGTATGCCGTCTTCTGCTTG

      After multiple optimization trials, the following steps yielded ~96% on-target reads for the TaG-EM library (Supplemental Figure 18; note that for the enriched barcode data shown in Figure 6 and Supplemental Figure 19, a similar amplification protocol was used, except that TaG-EM barcodes were amplified from the gene expression library cDNA and not the SPRI-selected barcode pool). TaG-EM cDNA was amplified with the following PCR reaction: 5 µl purified TaG-EM cDNA, 50 µl 2x KAPA HiFi ReadyMix (Roche), 2.5 µl UMGC_IL_DoubleNest primer (10 µM), 2.5 µl SI_PCR primer (10 µM), and 40 µl nuclease-free water. The reaction was amplified using the following cycling conditions: 98ºC for 2 minutes, followed by 15 cycles of 98ºC for 20 seconds, 63ºC for 30 seconds, 72ºC for 20 seconds, followed by 72ºC for 5 minutes. After the first PCR, the amplified cDNA was purified with a 1.2x SPRIselect (Beckman Coulter) bead cleanup with 80% ethanol washes and eluted into 40 µL of nuclease-free water. A second round of PCR was run with the following reaction: 5 µl purified TaG-EM cDNA, 50 µl 2x KAPA HiFi ReadyMix (Roche), 2.5 µl D702 primer (10 µM), 2.5 µl P5 primer (10 µM), and 40 µl nuclease-free water. The reaction was amplified using the following cycling conditions: 98ºC for 2 minutes, followed by 10 cycles of 98ºC for 20 seconds, 63ºC for 30 seconds, 72ºC for 20 seconds, followed by 72ºC for 5 minutes. After the second PCR, the amplified cDNA was purified with a 1.2x SPRIselect (Beckman Coulter) bead cleanup with 80% ethanol washes and eluted into 40 µL of nuclease-free water. The resulting 3’ gene expression library and TaG-EM enrichment library were sequenced together following Scenario 1 of the BioLegend “Total-Seq-A Antibodies and Cell Hashing with 10x Single Cell 3’ Reagents Kit v3 or v3.1” protocol. Additional sequencing of the enriched TaG-EM library was also done following Scenario 2 from the same protocol.”
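      As a quick arithmetic check on the nested PCR recipe quoted above, the following sketch totals the reaction volume, derives the final primer concentration, and sums the programmed hold times. The helper names and the dictionary layout are hypothetical; the volumes, stock concentrations, and cycling steps come from the quoted Methods text, and ramp times between temperatures are ignored.

```python
# Component volumes (in microliters) for either PCR round, as quoted in the
# Methods text. Both rounds use the same volumes with different primer pairs.
REACTION_UL = {
    "purified TaG-EM cDNA": 5.0,
    "2x KAPA HiFi ReadyMix": 50.0,
    "primer 1 (10 uM stock)": 2.5,
    "primer 2 (10 uM stock)": 2.5,
    "nuclease-free water": 40.0,
}


def total_volume_ul(reaction):
    """Sum of all component volumes in microliters."""
    return sum(reaction.values())


def final_concentration(stock_conc, volume_ul, total_ul):
    """Dilution formula C_final = C_stock * V_added / V_total."""
    return stock_conc * volume_ul / total_ul


def programmed_time_s(initial_s, cycles, cycle_steps_s, final_s):
    """Total programmed hold time in seconds (temperature ramps excluded)."""
    return initial_s + cycles * sum(cycle_steps_s) + final_s


total = total_volume_ul(REACTION_UL)               # 100 ul reaction
primer_um = final_concentration(10.0, 2.5, total)  # each primer, in uM
mix_x = final_concentration(2.0, 50.0, total)      # ReadyMix strength, in x

# 98C 2 min; N cycles of 20 s + 30 s + 20 s; 72C 5 min.
round1_s = programmed_time_s(120, 15, (20, 30, 20), 300)
round2_s = programmed_time_s(120, 10, (20, 30, 20), 300)

print(total, primer_um, mix_x, round1_s, round2_s)
```

      With these numbers each round is a 100 µl reaction containing each primer at 0.25 µM and the ReadyMix at 1x, and the programmed hold times come to 1470 seconds for the 15-cycle round and 1120 seconds for the 10-cycle round.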

      When a given cell barcode is not associated with any TaG-EM barcode, then demultiplexing is impossible. This is a major problem, which is particularly visible in Figs 5 and S13. In 5F, BC4 is only detected in a couple of dozen cells, even though the Jon99Ciii marker of enterocytes is present in a much larger population (Fig 5C). Therefore, in this particular case, TaG-EM fails to detect most of the GFP-expressing cells. 

      Figure 5 in the original manuscript represented data from an experiment in which there were eight different TaG-EM barcoded samples present, including four replicates of the pan-midgut driver (each of which included enterocyte populations). One would not expect the BC4 enterocyte driver expression to be observed in all of the Jon99Ciii cells, since the majority of the GFP+ cells shown in the UMAP plot were likely derived from and are labeled by the pan-midgut driver-associated barcodes. Thus, the design and presentation of this particular experiment (in particular, the presence of eight distinct samples in the data set) is making the detection of the TaG-EM barcodes look sparser than it actually is. We have added a panel in both Figure 6B and Supplemental Figure 17B that shows the overall detection of barcodes in the enriched barcode library and gene expression library or the gene expression library only, respectively, for this experiment.

      However, the reviewer’s overall point regarding barcode detection is still valid in that if we consider all eight barcodes, we only see TaG-EM barcode labeling associated with about a quarter of all the cells in this gene expression library, or about 37% of cells when we include the enriched TaG-EM barcode library. While improving barcode detection will improve the yield and is necessary for some applications (such as robust detection of multiplets), we would argue that even at the current level of success this approach has significant utility. First, if one’s goal is to unambiguously label a cell cluster and trace it to a defined cell population in vivo, sparse labeling may be sufficient. Second, demultiplexing is still possible (as we demonstrate) but involves a trade-off in yield (not every cell is recovered and there is some extra sequencing cost as some sequenced cells cannot be assigned to a barcode).

      Similarly, in S13, most cells should express one of the four barcodes, however many of them (maybe up to half - this should be quantified) do not. Therefore, the claim (L277-278) that "the pan-midgut driver were broadly distributed across the cell clusters" is misleading. Moreover, the hypothesis that "low expressing driver lines may result in particularly sparse labelling" (L331-333) is at least partially wrong, as Fig S13 shows that the same Gal4 driver can lead to very different levels of barcode coverage.

      As described above, since this experiment included eight different TaG-EM barcodes expressed by five different drivers, the expectation is that only about half of the cells in Figure S13 (now Figure S20) should express a TaG-EM barcode. It is not clear why BC2 is underrepresented in terms of the number of cells labeled and BC7 is overrepresented. We agree with the reviewer that this should be described more accurately in the paper and that it does impact our interpretation related to driver strength and barcode detection. We have revised this sentence in the discussion and also added additional text in the results describing the within-driver variability seen in this experiment.

      Results text:

      “As expected, the barcodes expressed by the pan-midgut driver were broadly distributed across the cell clusters (Supplemental Figure 20). However, the number of cells recovered varied significantly among the four pan-midgut driver associated barcodes.”

      Discussion text:

      “It is likely that the strength of the Gal4 driver contributes to the labeling density. However, we also observed variable recovery of TaG-EM barcodes that were all driven by the same pan-midgut Gal4 driver (Supplemental Figure 20).”

      • Comparisons between TaG-EM and other, simpler methods for labelling individual cell populations are missing. For example, how would TaG-EM compare with expression of different fluorescent reporters, or a strategy based on the brainbow/flybow principle?

      The advantage of TaG-EM is that an arbitrarily large number of DNA barcodes can be used (contingent upon the availability of transgenic lines – we described 20 barcoded lines in our initial manuscript and we have now extended this collection to over 170 lines), while the number of distinguishable FPs is much lower. Brainbow/Flybow uses combinatorial expression of different FPs, but because this combinatorial expression is stochastic, tracing a single cell transcriptome to a defined cell population in vivo based on the FP signature of a Brainbow animal would likely not be possible (and would almost certainly be impossible at scale).

      • FACS data is missing throughout the paper. The authors should include data from their comparative flow cytometry experiment of TaG-EM cells with or without additional hexameric GFP, as well as FSC/SSC and fluorescence scatter plots for the FACS steps that they performed prior to scRNA-seq, at least in supplementary figures.

      We have added Supplemental Figures with the FACS data for all of the single cell sequencing data presented in the manuscript (Supplemental Figures 12 and 14).

      • The authors should show the whole data described in L229, including the cluster that they chose to delete. At least, they should provide more information about how many cells were removed. In any case, the fact that their data still contains a large number of debris and dead cells despite sorting out PI negative cells with FACS and filtering low abundance barcodes with Cellranger is concerning.

      This description was referring to the unprocessed Cellranger output (not filtered for low abundance barcodes). Prior to filtering for cell barcodes with high mitochondria or rRNA (or other processing in Seurat/Scanpy), we saw two clusters, one with low UMI counts and enrichment of mitochondrial genes (see Cellranger report below). 

      Author response image 1.

      These cell barcodes were removed by downstream quality filtering and the remaining cells showed expression of expected intestinal stem cell and enteroblast marker genes.

      Overall, although a method for genetic tagging cell populations prior to multiplexing in single-cell experiments would be extremely useful, the method presented here is inadequate. However, despite all the weaknesses listed above, the idea of barcodes expressed specifically in cells of interest deserves more consideration. If the authors manage to improve their design to resolve the major issues and demonstrate the benefits of their method more clearly, then TaG-EM could become an interesting option for certain applications.

      We thank the reviewer for this comment and hope that the above responses and additional experiments and data that we have added have helped to alleviate the noted weaknesses.

      Reviewer #2 (Public Review):

      In this manuscript, Mendana et al developed a multiplexing method - Targeted Genetically-Encoded Multiplexing or TaG-EM - by inserting a DNA barcode upstream of the polyadenylation site in a Gal4-inducible UAS-GFP construct. This multiplexing method can be used for population-scale behavioral measurements or can potentially be used in single-cell sequencing experiments to pool flies from different populations. The authors created 20 distinctly barcoded fly lines. First, TaG-EM was used to measure phototaxis and oviposition behaviors. Then, TaG-EM was applied to the fly gut cell types to demonstrate its applications in single-cell RNA-seq for cell type annotation and cell origin retrieving.

      This TaG-EM system can be useful for multiplexed behavioral studies from next-generation sequencing (NGS) of pooled samples and for transcriptomic studies. I don't have major concerns for the first application, but I think the scRNA-seq part has several major issues and needs to be further optimized.

      Major concerns:

      (1) It seems the barcode detection rate is low according to Fig S9 and Fig 5F, J and N. Could the authors evaluate the detection rate? If the detection rate is too low, it can cause problems when it is used to decode cell types.

      See responses to Reviewer #1 on this topic above.  

      (2) Unsuccessful amplification of TaG-EM barcodes: The authors attempted to amplify the TaG-EM barcodes in parallel to the gene expression library preparation but encountered difficulties, as the resulting sequencing reads were predominantly off-target. This unsuccessful amplification raises concerns about the reliability and feasibility of this amplification approach, which could affect the detection and analysis of the TaG-EM barcodes in future experiments.

      As noted above, we have now established a successful amplification protocol for the TaG-EM barcodes. These data are shown in Figure 6 and Supplemental Figures 18-19, and we have included detailed information in the methods for performing TaG-EM barcode enrichment during 10x library prep. We have also included code in the paper’s GitHub repository for assigning TaG-EM barcodes from the enriched library to the associated 10x Genomics cell barcodes.

      (3) For Fig 5, the single-cell clusters are not annotated. It is not clear which cell types correspond to which clusters. So, it is difficult to evaluate the accuracy of the assignment of barcodes.

      We have added annotation information for the cell clusters based on expression of cell-type-specific marker genes (Figure 6A, Supplemental Figures 16-17).

      (4) The scRNA-seq UMAP in Fig 5 is a bit strange to me. The fly gut epithelium contains only a few major cell types, including ISC, EB, EC, and EE. However, the authors showed 38 clusters in Fig 5B. It is true that some cell types, like EE (Guo et al., 2019, Cell Reports), have sub-populations, but I don't expect they will form this many subtypes. There are many peripheral small clusters that are not shown in other gut scRNA-seq studies (Hung et al., 2020; Li et al., 2022 Fly Cell Atlas; Lu et al., 2023 Aging Fly Cell Atlas). I suggest the authors try different data-processing methods to validate their clustering result.

      For all of the single cell experiments, after doublet and ambient RNA removal (as suggested below), we have reclustered the datasets and evaluated different resolutions using Clustree. As the Reviewer points out, there are different EE subtypes, as well as regionalized expression differences in EC and other cell populations, so more than four clusters are expected (an analysis of the adult midgut identified 22 distinct cell types). With this revised analysis our results more closely match the cell populations observed in other studies (though it should be noted that the referenced studies largely focus on the adult and not the larval stage).  

      (5) Different gut drivers, PMC-, PC-, EB-, EC-, and EE-GAL4, were used. The authors should carefully characterize these GAL4 expression in larval guts and validate sequencing data. For example, does the ratio of each cell type in Fig 5B reflect the in vivo cell type ratio? The authors used cell-type markers mostly based on the knowledge from adult guts, but there are significant morphological and cell ratio differences between larval and adult guts (e.g., Mathur...Ohlstein, 2010 Science).

      We have characterized in detail, in larval guts, the PC driver highlighted in Supplemental Figure 13 and the EC and EE drivers highlighted in Figure 6G-N, and have added these data to the paper (Supplemental Figure 21). The EB driver was not characterized histologically as EB-specific antibodies are not currently available. The PMG-Gal4 line exhibits strong expression throughout the larval gut (Figure 5B), and barcodes are recovered from essentially all of the larval gut cell clusters using this driver (Supplemental Figure 20). We don’t necessarily expect the ratios of cells observed in the scRNA-Seq data to reflect the ratios typically observed in the gut, as we performed pooled flow sorting on a multiplexed set of eight genotypes; driver expression levels, flow sorting, and possibly other processing steps could all influence the relative abundance of different cell types. However, detailed characterization of these driver lines did reveal spatial expression patterns that help explain aspects of the scRNA-Seq data. We have also added the following text to the paper to further describe the characterization of the drivers:

      Results:

      “Detailed characterization of the EC-Gal4 line indicated that although this line labeled a high percentage of enterocytes, expression was restricted to an area at the anterior and middle of the midgut, with gaps between these regions and at the posterior (Supplemental Figure 21). This could explain the absence of subsets of enterocytes, such as those labeled by betaTry, which exhibits regional expression in R2 of the adult midgut (Buchon et al., 2013).”

      “Detailed characterization of the EE-Gal4 driver line indicated that ~80-85% of Prospero-positive enteroendocrine cells are labeled in the anterior and middle of the larval midgut, with a lower percentage (~65%) of Prospero-positive cells labeled in the posterior midgut (Supplemental Figure 21). As with the enterocyte labeling, and consistent with the Gal4 driver expression pattern, the EE-Gal4 expressed TaG-EM barcode 9 did not label all classes of enteroendocrine cells and other clusters of presumptive enteroendocrine cells expressing other neuropeptides such as Orcokinin, AstA, and AstC, or neuropeptide receptors such as CCHa2 (not shown) were also observed.”

      Methods:

      “Dissection and immunostaining

      Midguts from third instar larvae of driver lines crossed to UAS-GFP.nls or UAS-mCherry were dissected in 1xPBS and fixed with 4% paraformaldehyde (PFA) overnight at 4ºC. Fixed samples were washed with 0.1% PBTx (1xPBS + 0.1% Triton X-100) three times for 10 minutes each and blocked in PBTxGS (0.1% PBTx + 3% Normal Goat Serum) for 2–4 hours at RT. After blocking, midguts were incubated in primary antibody solution overnight at 4ºC. The next day samples were washed with 0.1% PBTx three times for 20 minutes each and were incubated in secondary antibody solution for 2–3 hours at RT (protected from light) followed by three washes with 0.1% PBTx for 20 minutes each. One µg/ml DAPI solution prepared in 0.1% PBTx was added to the sample and incubated for 10 minutes followed by washing with 0.1% PBTx three times for 10 minutes each. Finally, samples were mounted on a slide glass with 70% glycerol and imaged using a Nikon AX R confocal microscope. Confocal images were processed using Fiji software. 

      The primary antibodies used were rabbit anti-GFP (A6455, 1:1000, Invitrogen), mouse anti-mCherry (3A11, 1:20, DSHB), mouse anti-Prospero (MR1A, 1:50, DSHB) and mouse anti-Pdm1 (Nub 2D4, 1:30, DSHB). The secondary antibodies used were goat anti-mouse and goat anti-rabbit IgG conjugated to Alexa 647 and Alexa 488 (1:200) (Invitrogen), respectively. Five larval gut specimens per Gal4 line were dissected and examined.”

      (6) Doublets are removed based on the co-expression of two barcodes in Fig 5A. However, there are also other possible doublets, for example, from the same barcode cells or when one cell doesn't have detectable barcode. Did the authors try other computational approaches to remove doublets, like DoubletFinder (McGinnis et al., 2019) and Scrublet (Wolock et al., 2019)?

      We have included DoubletFinder-based doublet removal in our data analysis pipeline. This is now described in the methods (see below).

      (7) Did the authors remove ambient RNA which is a common issue for scRNA-seq experiments?

      We have also used DecontX to remove ambient RNA. This is now described in the methods:

      “Datasets were first mapped and analyzed using the Cell Ranger analysis pipeline (10x Genomics). A custom Drosophila genome reference was made by combining the BDGP.28 reference genome assembly and Ensembl gene annotations. Custom gene definitions for each of the TaG-EM barcodes were added to the fasta genome file and .gtf gene annotation file. A Cell Ranger reference package was generated with the Cell Ranger mkref command. Subsequent single-cell data analysis was performed using the R package Seurat (Satija et al., 2015). Cells expressing fewer than 200 genes and genes expressed in fewer than three cells were filtered from the expression matrix. Next, percent mitochondrial reads, percent ribosomal reads, cell counts, and cell features were graphed to determine optimal filtering parameters. DecontX (Yang et al., 2020) was used to identify empty droplets, to evaluate ambient RNA contamination, and to remove empty cells and cells with high ambient RNA expression. DoubletFinder (McGinnis et al., 2019) was used to identify droplet multiplets and remove cells classified as multiplets. Clustree (Zappia and Oshlack, 2018) was used to visualize different clustering resolutions and to determine the optimal clustering resolution for downstream analysis. Finally, SingleR (Aran et al., 2019) was used for automated cell annotation with a gut single-cell reference from the Fly Cell Atlas (Li et al., 2022). The dataset was manually annotated using the expression patterns of marker genes known to be associated with cell types of interest. To correlate TaG-EM barcodes with cell IDs in the enriched TaG-EM barcode library, a custom Python script was used (TaGEM_barcode_Cell_barcode_correlation.py), which is available via GitHub: https://github.com/darylgohl/TaG-EM.”
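
      The initial filtering step quoted above (dropping cells with fewer than 200 detected genes and genes detected in fewer than three cells) can be illustrated with a small, dependency-free Python sketch. This is not the Seurat code used in the analysis; the function name, the genes-by-cells list layout, and the choice to filter cells before genes are assumptions made for illustration.

```python
def filter_counts(counts, min_genes_per_cell=200, min_cells_per_gene=3):
    """Filter a genes x cells count matrix given as a list of rows.

    Cells with fewer than min_genes_per_cell detected genes are dropped first;
    genes detected in fewer than min_cells_per_gene of the remaining cells are
    then dropped (the order is an assumption of this sketch).
    Returns (filtered matrix, kept gene indices, kept cell indices).
    """
    n_genes = len(counts)
    n_cells = len(counts[0]) if counts else 0

    # A gene counts as "detected" in a cell if it has at least one count.
    keep_cells = [
        j for j in range(n_cells)
        if sum(1 for i in range(n_genes) if counts[i][j] > 0) >= min_genes_per_cell
    ]
    keep_genes = [
        i for i in range(n_genes)
        if sum(1 for j in keep_cells if counts[i][j] > 0) >= min_cells_per_gene
    ]
    filtered = [[counts[i][j] for j in keep_cells] for i in keep_genes]
    return filtered, keep_genes, keep_cells


# Invented toy matrix (3 genes x 4 cells) with small thresholds so the effect is visible.
toy_counts = [
    [5, 0, 1, 2],
    [0, 0, 3, 1],
    [1, 0, 0, 0],
]
filtered, genes_kept, cells_kept = filter_counts(
    toy_counts, min_genes_per_cell=2, min_cells_per_gene=2
)
print(genes_kept, cells_kept, filtered)
```

      In practice the thresholds would be tuned after inspecting the per-cell quality metrics, as the quoted methods text describes.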

      (8) Why does TaG-EM barcode #4, driven by EC-GAL4, not label other classes of enterocyte cells such as betaTry+ ECs (Figures 5D-E)? Similarly, why does TaG-EM barcode #9, driven by EE-GAL4, not label all EEs? Again, it is difficult to evaluate this part without proper data processing and accurate cell type annotation.

      As noted in the response to a comment by Reviewer #1 above, part of this apparent sparsity of labeling is due to the way that this experiment was designed and visualized. We have added a new Figure panel in both Figure 6B and Supplemental Figure 17B that shows the overall detection of barcodes in the enriched barcode library and gene expression library or the gene expression library only, respectively, to better illustrate the efficacy of barcode detection. See also the response to point 5 above. Both the lack of labelling of betaTry+ ECs and subsets of EEs is consistent with the expression patterns of the EC-Gal4 and EE-Gal4 drivers.

      (9) For Figure 2, when the authors tested different combinations of groups with various numbers of barcodes. They found remarkable consistency for the even groups. Once the numbers start to increase to 64, barcode abundance becomes highly variable (range of 12-18% for both male and female). I think this would be problematic because the differences seen in two groups for example may be due to the barcode selection rather than an actual biologically meaningful difference.

      While there is some barcode-to-barcode variability for different amplification conditions, the magnitude of this variation is relatively consistent across the conditions tested. We looked at the coefficient of variation for the evenly pooled barcodes or for the staggered barcodes pooled at different relative levels. While the absolute magnitude of the variation is higher for the highly abundant barcodes in the staggered conditions, the CVs for these conditions (0.186 for female flies and 0.163 for male flies) were only slightly above the mean CV (0.125) for all conditions (see Supplemental Figure 3).

      We have added this analysis as Supplemental Figure 3 and added the following text to the paper:

      “The coefficients of variation were largely consistent for groups of TaG-EM barcodes pooled evenly or at different levels within the staggered pools (Supplemental Figure 3).”
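
      For reference, the coefficient of variation is the standard deviation of a group of barcode abundances divided by their mean. The short Python sketch below shows the calculation on invented numbers; the use of the sample (n-1) standard deviation is an assumption, and the values are not data from the paper.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Invented read fractions for four barcodes pooled at the same nominal level.
even_pool = [0.24, 0.26, 0.27, 0.23]
print(round(coefficient_of_variation(even_pool), 3))
```

      Because the CV is normalized by the mean, it allows variability to be compared between barcodes pooled at very different absolute abundances.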

      (10) Barcode #14 cannot be reliably detected in oviposition experiment. This suggests that the BC 14 fly line might have additional mutations in the attp2 chromosome arm that affects this behavior. Perhaps other barcode lines also have unknown mutations and would cause issues for other untested behaviors. One possible solution is to backcross all 20 lines with the same genetic background wild-type flies for >7 generations to make all these lines to have the same (or very similar) genetic background. This strategy is common for aging and behavior assays.

      See response to Reviewer #1 above on this topic.

      Reviewer #3 (Public Review):

      The work addresses challenges in linking anatomical information to transcriptomic data in single-cell sequencing. It proposes a method called Targeted Genetically-Encoded Multiplexing (TaG-EM), which uses genetic barcoding in Drosophila to label specific cell populations in vivo. By inserting a DNA barcode near the polyadenylation site in a UAS-GFP construct, cells of interest can be identified during single-cell sequencing. TaG-EM enables various applications, including cell type identification, multiplet droplet detection, and barcoding experimental parameters. The study demonstrates that TaG-EM barcodes can be decoded using next-generation sequencing for large-scale behavioral measurements. Overall, the results are solid in supporting the claims and will be useful for a broader fly community. I have only a few comments below:

      We thank the reviewer for these positive comments.

      Specific comments:

      (1) The authors mentioned that the results of structure pool tests in Fig. 2 showed a high level of quantitative accuracy in detecting the TaG-EM barcode abundance. Although the data were generally consistent with the input values in most cases, there were some obvious exceptions such as barcode 1 (under-represented) and barcodes 15, 20 (overrepresented). It would be great if the authors could comment on these and provide a guideline for choosing the appropriate barcode lines when implementing this TaG-EM method.

      See the response to point 9 from Reviewer 2. Although there seem to be some systematic differences in barcode amplification, the coefficient of variation was relatively consistent across all of the barcode combinations and relative input levels that we examined. Our recommendation (described in the text) is to average across 3-4 independent barcodes (which yielded R² values of >0.99 with expected abundance in the structured pool tests).  
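
      The effect of averaging across independent barcodes can be illustrated with a short Python sketch. The numbers below are invented, and R² is computed here as the squared Pearson correlation between expected and observed abundance, which is an assumption about the metric rather than a description of the analysis in the paper; the function names are hypothetical.

```python
from statistics import mean


def r_squared(expected, observed):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(expected), mean(observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(expected, observed))
    sxx = sum((x - mx) ** 2 for x in expected)
    syy = sum((y - my) ** 2 for y in observed)
    return sxy * sxy / (sxx * syy)


def average_replicate_barcodes(observed_by_condition):
    """Average the observed abundance of all barcodes assigned to each condition."""
    return {cond: mean(vals) for cond, vals in observed_by_condition.items()}


# Invented example: three conditions, each tagged with three independent barcodes.
expected = {"low": 1.0, "mid": 2.0, "high": 4.0}
observed = {
    "low": [0.8, 1.2, 1.0],
    "mid": [2.3, 1.7, 2.0],
    "high": [4.4, 3.6, 4.0],
}
averaged = average_replicate_barcodes(observed)
conditions = list(expected)
print(r_squared([expected[c] for c in conditions], [averaged[c] for c in conditions]))
```

      Barcode-specific amplification biases act like noise around the true abundance, so the mean of several independent barcodes tracks the expected value more closely than any single barcode.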

      (2) In Supplemental Figure 6, the authors showed GFP antibody staining data with 20 different TaG-EM barcode lines. The variability in GFP antibody staining results among these different TaG-EM barcode lines concerns the use of these TaG-EM barcode lines for sequencing followed by FACS sorting of native GFP. I expected the native GFP expression would be weaker and much more variable than the GFP antibody staining results shown in Supplemental Figure 6. If this is the case, variation of tissue-specific expression of TaG-EM barcode lines will likely be a confounding factor.

      Aside from barcode 8, which had a mutation in the GFP coding sequence, we did not see significant variability in expression levels in the wing disc. Subtle differences seen in this figure most likely result from differences in larval staging. Similarly consistent native (unstained) GFP expression of the TaG-EM constructs was seen in crosses with Mhc-Gal4 (described above). 

      (3) As the authors mentioned in the manuscript, multiple barcodes for one experimental condition would be a better experimental design. Could the authors suggest a recommended number of barcodes for each experimental condition? 3? 4? Or more? 

      See response to Reviewer #3, point number 1 above.

      (3b) Also, it would be great if the authors could provide a short discussion on the cost of such TaG-EM method. For example, for the phototaxis assay, if it is much more expensive to perform TaG-EM as compared to manually scoring the preference index by videotaping, what would be the practical considerations or benefits of doing TaG-EM over manual scoring?

      While this will vary depending on the assay and the scale at which one is conducting experiments, we have added an analysis of labor savings for the larval gut motility assay (Supplemental Figure 8). We have also added the following text to the Discussion describing some of the trade-offs to consider in assessing the potential benefit of incorporating TaG-EM into behavioral measurements:

      “While the utility of TaG-EM barcode-based quantification will vary based on the number of conditions being analyzed and the ease of quantifying the behavior or phenotype by other means, we demonstrate that TaG-EM can be employed to cost-effectively streamline labor-intensive assays and to quantify phenotypes with small effect sizes (Figure 4, Supplemental Figure 8).”

      Recommendations for the authors:  

      While recognising the potential of the TaG-EM methodology, we had a few major concerns that the authors might want to consider addressing:

      As stated above, we are grateful to the reviewers and editor for their thoughtful comments. We have addressed many of the points below in our responses above, so we will briefly respond to these points and where relevant direct the reader to comments above.

      (1) We were concerned about the efficacy of TaG-EM in assessing more complex behaviours than oviposition and phototaxis. We note that Barcode #14 cannot be reliably detected in oviposition experiment. This suggests that the BC 14 fly line might have additional mutations in the attp2 chromosome arm that affects this behavior. Perhaps other barcode lines also have unknown mutations and would cause issues for other untested behaviors. One possible solution is to back-cross all 20 lines with the same genetic background wild-type flies for >7 generations to make all these lines to have the same (or very similar) genetic background. This strategy is common for aging and behavior assays.

      See response to Reviewer #1 and Reviewer #2, item 10, above.

      (2) We were unable to assess the drop-out rates of the TaG-EM barcode from the sequencing. The barcode detection rate is low (Fig S9 and Fig 5F, J and N). This would be a considerable drawback (relating to both experimental design and cost), if a large proportion of the cells could not be assigned an identity.

      See comments above addressing this point.

      (3) The effectiveness of TaG-EM scRNA-seq on the larvae gut is not very effective - the cells are not well annotated, the barcodes seem not to have labelled expected cell types (ECs and EEs), and there is no validation of the Gal4 drivers in vivo.

      See previous comments. We have addressed specific comments above on data processing and annotation, included a visualization of the overall effectiveness of labeling, added a protocol and data on enriched TaG-EM barcode libraries, and have added detailed characterization of the Gal4 drivers in the larval gut (Figure 6, Supplemental Figures 17-21).

      (4) A formal assessment of the cost-effectiveness would be an important consideration in broad uptake of the methodology.

      While this is difficult to do in a comprehensive manner given the breadth of potential applications, we have included estimates of labor savings for one of the behavioral assays that we tested (Supplemental Figure 8). We have also included a discussion of some of the factors that would make TaG-EM useful or cost-effective to apply for behavioral assays (see response to Reviewer #3, comment 3b, above). We have also added the following text to the discussion to address the cost considerations in applying TaG-EM for scRNA-Seq:

      “For single cell RNA-Seq experiments, the cost savings of multiplexing is roughly the cost of a run divided by the number of independent lines multiplexed, plus labor savings by also being able to multiplex upstream flow cytometry, minus loss of unbarcoded cells. Our experiments indicated that for the specific drivers we tested TaG-EM barcodes are detected in around one quarter of the cells if relying on endogenous expression in the gene expression library, though this fraction was higher (~37%) if sequencing an enriched TaG-EM barcode library in parallel (Figure 6, Supplemental Figures 18-19).”
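
      As a rough illustration of this trade-off, the Python sketch below computes the per-line share of a run and the cost per cell that can be assigned a barcode. The run cost and cell number are invented placeholders, labor savings are not modeled, and the function names are hypothetical; only the detection fractions of roughly one quarter and ~37% come from the text above.

```python
def multiplexed_cost_per_line(run_cost, n_lines):
    """Share of a single run's cost attributable to each multiplexed line."""
    return run_cost / n_lines


def cost_per_assigned_cell(run_cost, cells_captured, detection_fraction):
    """Run cost divided by the number of cells in which a barcode is detected."""
    return run_cost / (cells_captured * detection_fraction)


# Placeholder inputs (assumptions, not figures from the paper).
run_cost = 2000.0        # cost of one scRNA-Seq run, arbitrary currency units
cells_captured = 10000   # cells recovered in the run
n_lines = 8              # genotypes multiplexed in the run

print(multiplexed_cost_per_line(run_cost, n_lines))
for fraction in (0.25, 0.37):  # gene expression library only vs. with enriched library
    print(fraction, cost_per_assigned_cell(run_cost, cells_captured, fraction))
```

      Under these assumptions, raising the fraction of cells with a detected barcode lowers the effective cost per usable cell, which is one argument for sequencing the enriched barcode library in parallel.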

      (5) Similarly, a formal assessment of the effect of the insertion on the variability in GFP expression and the behaviour needs to be documented.

      See responses to Reviewer #1, Reviewer #2, item 9, and Reviewer #3, item 2 above.

      Reviewer #1 (Recommendations For The Authors):

      (in no particular order of importance)

      • L84-85: the authors should either expand, or remove this statement. Indeed, lack of replicates is only true if one ignores that each cell in an atlas is indeed a replicate. Therefore, depending on the approach or question, this statement is inaccurate.

      This sentence was meant to refer to experiments where different experimental conditions are being compared and not to more descriptive studies such as cell atlases. We have revised this sentence to clarify.

      “Outside of descriptive studies, these costs are also a barrier to including replicates to assess biological variability; consequently, a lack of biological replicates derived from independent samples is a common shortcoming of single-cell sequencing experiments.”

      • L103-104: this sentence is unclear.

      We have revised this sentence as follows:

      “Genetically barcoded fly lines can also be used to enable highly multiplexed behavioral assays which can be read out using high throughput sequencing.”

      • In Fig S1 it is unclear why there are more than 20 different sequences in panel B where the text and panel A only mention the generation of 20 distinct constructs. This should be better explained.

      The following text was added to the Figure legend to explain this discrepancy:

      “Because the TaG-EM barcode constructs were injected as a pool of 29 purified plasmids, some of the transgenic lines had inserts of the same construct. In total 20 unique lines were recovered from this round of injection.”

      • It would be interesting to compare the efficiency of TaG-EM driven doublet removal (Fig 5A) with standard doublet-removing software (e.g., DoubletFinder, McGinnis et al., 2019).

      We have done this comparison, which is now shown in Supplemental Figure 15.

      • I would encourage the authors to check whether barcode representation in Fig S13  can be correlated to average library size, as one would expect libraries with shorter reads to be more likely to include the 14-bp barcode and therefore more accurately recapitulate TaG-EM barcode expression.

      These are not independent sequencing libraries, but rather data from barcodes that were multiplexed in a single flow sort, 10x droplet capture, and sequencing library. Thus, there must be some other variable that explains the differential recovery of these barcodes.

      • Fig 4A should appear earlier in the paper.

      We have moved Figure 4A from the previous manuscript (a schematic showing the detailed design of the TaG-EM construct) to Figure 1A in the revised version.

      Reviewer #2 (Recommendations For The Authors):

      Minor:

      (1) There is a typo for Fig S13 figure legends: BC1, BC1, BC3... should be BC1, BC2, BC3.

      Fixed.

      Reviewer #3 (Recommendations For The Authors):

      Comments to authors:

      (1) It would be great if the authors could provide an additional explanation on how these 29 barcode sequences were determined.

      Response: This information is in the Methods section. For the original cloned plasmids:

      “Expected construct size was verified by diagnostic digest with EcoRI and ApaLI. DNA concentration was determined using a Quant-iT PicoGreen dsDNA assay (Thermo Fisher Scientific) and the randomer barcode for each of the constructs was determined by Sanger sequencing using the following primers:

      SV40_post_R: GCCAGATCGATCCAGACATGA

      SV40_5F: CTCCCCCTGAACCTGAAACA”

      For transgenic flies, after DNA extraction and PCR enrichment (details also in the Methods section):

      “The barcode sequence for each of the independent transgenic lines was determined by Sanger sequencing using the SV40_5F and SV40_PostR primers.”

      (2) Why did the authors choose myr-GFP as the backbone instead of nls-GFP if the downstream application is to perform sequencing?

      We initially chose myr::GFP as we planned to conduct single cell and not single nucleus sequencing and myr::GFP has the advantage of labeling cell membranes, which could facilitate the characterization or confirmation of cell type-specific expression, particularly in the nervous system. However, we have considered making a version of the TaG-EM construct with a nuclear targeted GFP (thereby enabling “NucEM”). In the Discussion, we mention this possibility as well as the possibility of using a second nuclear-GFP construct in conjunction with TaG-EM lines if nuclear enrichment is desired:

      “In addition, while the original TaG-EM lines were made using a membrane-localized myr::GFP construct, variants that express GFP in other cell compartments such as the cytoplasm or nucleus could be constructed to enable increased expression levels or purification of nuclei. Nuclear labeling could also be achieved by co-expressing a nuclear GFP construct with existing TaG-EM lines in analogy to the use of hexameric GFP described above.”

      Minor comments:

      (1) Line 193, Supplemental Figure 4 should be Supplemental Figure 5

      Fixed.

      (2) Scale bars should be added in Figure 4, Supplemental Figures 6, 7, and 8A.

      We have added scale bars to these figures and also included scale bars in additional Supplemental Figures detailing characterization of the gut driver lines.

      (3) Were Figure 4C and Supplemental Figure 7 data stained with a GFP antibody?

      No, this is endogenous GFP signal. This is now noted in the Figure legends.

      (4) Line 220, specify the three barcode lines (lines #7, 8, 9) in the text. 

      Added this information.

      Same for Lines 251-254. Line 258, which 8 barcode Gal4 line combinations?

      (5) Line 994, typo: (BC1, BC1, BC3, and BC7)-> (BC1, BC2, BC3, and BC7)

      Fixed.

      (6) Figure 5 F, J and N, add EC-Gal4, EB-Gal4, and EE-Gal4 above each panel to improve readability.

      We have added labels of the cell type being targeted (leftmost panels), the barcode, and the marker gene name to Figure 6 C-N.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      BMP signaling is, arguably, best known for its role in the dorsoventral patterning, but not in nematodes, where it regulates body size. In their paper, Vora et al. analyze ChIP-Seq and RNA-Seq data to identify direct transcriptional targets of SMA-3 (Smad) and SMA-9 (Schnurri) and understand the respective roles of SMA-3 and SMA-9 in the nematode model Caenorhabditis elegans. The authors use publicly available SMA-3 and SMA-9 ChIP-Seq data, own RNA-Seq data from SMA-3 and SMA-9 mutants, and bioinformatic analyses to identify the genes directly controlled by these two transcription factors (TFs) and find approximately 350 such targets for each. They show that all SMA-3-controlled targets are positively controlled by SMA-3 binding, while SMA-9-controlled targets can be either up or downregulated by SMA-9. 129 direct targets were shared by SMA-3 and SMA-9, and, curiously, the expression of 15 of them was activated by SMA-3 but repressed by SMA-9. Since genes responsible for cuticle collagen production were eminent among the SMA-3 targets, the authors focused on trying to understand the body size defect known to be elicited by the modulation of BMP signaling. Vora et al. provide compelling evidence that this defect is likely to be due to problems with the BMP signaling-dependent collagen secretion necessary for cuticle formation.

      We thank the reviewer for this supportive summary. We would like to clarify the status of the publicly available ChIP-seq data. We generated the GFP tagged SMA-3 and SMA‑9 strains and submitted them to be entered into the queue for ChIP-seq processing by the modENCODE (later modERN) consortium. Thus, the publicly available SMA-3 and SMA-9 ChIP-seq datasets used here were derived from our efforts.  Due to the nature of the consortium’s funding, the data were required to be released publicly upon completion. Nevertheless, our current manuscript provides the first comprehensive analysis of these datasets. We have updated the text to clarify this point.

      Strengths:

      Vora et al. provide a valuable analysis of ChIP-Seq and RNA-Seq datasets, which will be very useful for the community. They also shed light on the mechanism of the BMP-dependent body size control by identifying SMA-3 target genes regulating cuticle collagen synthesis and by showing that downregulation of these genes affects body size in C. elegans.

      Weaknesses:

      (1) Although the analysis of the SMA-3 and SMA-9 ChIP-Seq and RNA-Seq data is extremely useful, the goal "to untangle the roles of Smad and Schnurri transcription factors in the developing C. elegans larva", has not been reached. While the role of SMA-3 as a transcriptional activator appears to be quite straightforward, the function of SMA-9 in the BMP signaling remains obscure. The authors write that in SMA-9 mutants, body size is affected, but they do not show any data on the mechanism of this effect.

      We thank the reviewer for directing our attention to the lack of clarity about SMA-9’s function. We have revised the text to highlight what this study and others demonstrate about SMA-9’s role in body size. Simply stated, SMA-9 is needed together with SMA-3 to promote the expression of genes involved in one-carbon metabolism, collagens, and chaperones, all of which are required for body size. SMA-3 has additional, SMA-9-independent transcriptional targets, including chaperones and ER secretion factors, that also contribute to body size. Finally, SMA-9 regulates additional targets independent of SMA-3 that likely have a minimal role in body size. We have adjusted Figure 5 with new graphs of the original data to make these points more clear.

      (2) The authors clearly show that both TFs can bind independently of each other, however, by using distances between SMA-3 and SMA-9 ChIP peaks, they claim that when the peaks are close these two TFs act as complexes. In the absence of proof that SMA-3 and SMA-9 physically interact (e.g. that they co-immunoprecipitate - as they do in Drosophila), this is an unfounded claim, which should either be experimentally substantiated or toned down.

      We acknowledge that we have not demonstrated a physical interaction between SMA-3 and SMA-9 through a co-immunoprecipitation, and we have indicated in the text that a formal biochemical demonstration would be required to make this point. Moreover, we toned down the text by stating that our results suggest that SMA-3 and SMA-9 frequently bind either as subunits in a complex or in close vicinity to each other along the DNA. As the reviewer has indicated, a physical interaction between Smads and Schnurris has been amply demonstrated in other systems. A limitation in these previous studies is that only a small number of target genes were analyzed. Our goal in this study was to determine how widespread this interaction is on a genomic scale. Our analyses demonstrate for the first time that a Schnurri transcription factor has significant numbers of both Smad-dependent and Smad-independent target genes. We have revised the text to clarify this point.
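
      To make concrete what a peak-distance comparison involves, the following Python sketch computes, for each peak in one set, the distance to the nearest peak in a second set on the same chromosome. It is an illustration only and does not reproduce the analysis in the manuscript; representing each peak by a single summit coordinate, the function name, and the toy coordinates are all assumptions.

```python
from bisect import bisect_left
from collections import defaultdict


def nearest_peak_distances(peaks_a, peaks_b):
    """For each (chrom, position) in peaks_a, return the distance to the nearest
    peak in peaks_b on the same chromosome, or None if that chromosome has none."""
    by_chrom = defaultdict(list)
    for chrom, pos in peaks_b:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    distances = []
    for chrom, pos in peaks_a:
        positions = by_chrom.get(chrom, [])
        if not positions:
            distances.append(None)
            continue
        # The nearest peak is either just left or just right of the insertion point.
        i = bisect_left(positions, pos)
        candidates = positions[max(i - 1, 0):i + 1]
        distances.append(min(abs(pos - p) for p in candidates))
    return distances


# Invented toy summits, not real SMA-3 or SMA-9 peak coordinates.
sma3_summits = [("I", 100), ("I", 1000), ("II", 50)]
sma9_summits = [("I", 120), ("I", 5000)]
print(nearest_peak_distances(sma3_summits, sma9_summits))
```

      Short distances between summits are compatible with binding as a complex or with adjacent independent binding, which is why proximity alone does not establish a physical interaction.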

      (3) The second part of the paper (the collagen story) is very loosely connected to the first part. dpy-11 encodes an enzyme important for cuticle development, and it is a differentially expressed direct target of SMA-3. dpy-11 can be bound by SMA-9, but it is not affected by this binding according to RNA-Seq. Thus, technically, this part of the paper does not require any information about SMA-9. However, this can likely be improved by addressing the function of the 15 genes, with the opposing mode of regulation by SMA-3 and SMA-9.

      We appreciate this suggestion and have clarified in the text how SMA-9 contributes to collagen organization and body size regulation.

      (4) The Discussion does not add much to the paper - it simply repeats the results in a more streamlined fashion.

      We thank the reviewer for this suggestion. We have added more context to the Discussion.

      Reviewer #2 (Public Review):

      In the present study, Vora et al. elucidated the transcription factors downstream of the BMP pathway components Smad and Schnurri in C. elegans and their effects on body size. Using a combination of a broad range of techniques, they compiled a comprehensive list of genome-wide downstream targets of the Smads SMA-3 and SMA-9. They found that both proteins have an overlapping spectrum of transcriptional target sites they control, but also unique ones. Thereby, they also identified genes involved in one-carbon metabolism or the endoplasmic reticulum (ER) secretory pathway. In an elaborate effort, the authors set out to characterize the effects of numerous of these targets on the regulation of body size in vivo as the BMP pathway is involved in this process. Using the reporter ROL-6::wrmScarlet, they further revealed that not only collagen production, as previously shown, but also collagen secretion into the cuticle is controlled by SMA-3 and SMA-9. The data presented by Vora et al. provide in-depth insight into the means by which the BMP pathway regulates body size, thus offering a whole new set of downstream mechanisms that are potentially interesting to a broad field of researchers.

      The paper is mostly well-researched, and the conclusions are comprehensive and supported by the data presented. However, certain aspects need clarification and potentially extended data.

      (1) The BMP pathway is active during development and growth. Thus, it is logical that the data shown in the study by Vora et al. is based on L2 worms. However, it raises the question of if and how the pattern of transcriptional targets of SMA-3 and SMA-9 changes with age or in the male tail, where the BMP pathway also has been shown to play a role. Is there any data to shed light on this matter or are there any speculations or hypotheses?

      We agree that these are intriguing questions, and we are interested in the roles of transcriptional targets at other developmental stages and in other physiological functions, but these analyses are beyond the scope of the current study.

      (2) As it was shown that SMA-3 and SMA-9 potentially act in a complex to regulate the transcription of several genes, it would be interesting to know whether the two interact with each other or if the cooperation is more indirect.

      A physical interaction between Smads and Schnurri has been amply demonstrated in other systems. Our goal in this study was not to validate this physical interaction, but to analyze functional interactions on a genome-wide scale.

      (3) It would help the understanding of the data even more if the authors could specifically state if there were collagens among the genes regulated by SMA-3 and SMA-9 and which.

      We thank the reviewer for this suggestion. col-94 and col-153 were identified as direct targets of both SMA-3 and SMA-9. We noted this in the Discussion.

      (4) The data on the role of SMA-3 and SMA-9 in the regulation of the secretion of collagens from the hypodermis is highly intriguing. The authors use ROL-6 as a reporter for the secretion of collagens. Is ROL-6 a target of SMA-9 or SMA-3? Even if this is not the case, the data would gain even more strength if a comparable quantification of the cuticular levels of ROL-6 were shown in Figure 6, and potentially a ratio of cuticular versus hypodermal levels. By that, the levels of secretion versus production can be better appreciated.

      We previously showed that rol-6 mRNA levels are reduced in dbl-1 mutants at L2. However, RNA-seq analysis did not detect a change in rol-6 large enough to reach statistical significance, so it does not qualify as a transcriptional target, and total protein levels are also not significantly reduced in mutants. We added this information to the text.

      (5) It is known that the BMP pathway controls several processes besides body size. The discussion would benefit from a broader overview of how the identified genes could contribute to body size. The focus of the study is on collagen production and secretion, but it would be interesting to have some insights into whether and how other identified proteins could play a role or whether they are likely to not be involved here (such as the ones normally associated with lipid metabolism, etc.).

      We have added more information to the Discussion.

      Reviewer #1 (Recommendations For The Authors):

      Figure 1 - Figure 3: The authors might want to think about condensing this into two figures.

      To avoid confusion with the different workflows, we prefer to keep these as three separate figures.

      Figure 1a-b: Measurement unit missing on X.

      We added the unit “bps” to these graphs.

      Line 244-246: The authors should stress in the Results that they analyzed publicly available ChIP-Seq data, which was not generated by them, - not just by providing a reference to Kudron et al., 2018. As far as I understood, ChIP was performed with an anti-GFP antibody. Please mention this, and specify the information about the vendor and the catalog number in the Methods.

      We would like to clarify the status of the publicly available ChIP-seq data. We generated the GFP-tagged SMA-3 and SMA-9 strains and submitted them to the queue for ChIP-seq processing by the modENCODE (later modERN) consortium. Thus, the publicly available SMA-3 and SMA-9 ChIP-seq datasets used here were derived from our efforts. Due to the nature of the consortium’s funding, the data were required to be released publicly upon completion. Nevertheless, our current manuscript provides the first comprehensive analysis of these datasets. We have clarified these issues in the text. We have also added information regarding the anti-GFP antibody to the Methods.

      Line 267-270: The authors should either provide experimental evidence that SMA-3 and SMA-9 form complexes or write something like "significant overlap between SMA-3 and SMA-9 peaks may indicate complex formation between these two transcription factors as shown in Drosophila" - but in the absence of proof, this must be a point for the Discussion, not for the Results. Moreover, similar behavior of fat-6 (overlapping ChIP peaks) and nhr-114 (non-overlapping ChIP peaks) in SMA-3 and SMA-9 mutants may be interpreted as a circumstantial argument against SMA-3/SMA-9 complex formation (see Lines 342-348). Importantly, since ChIP-Seq data are available for a wide array of C. elegans TFs, it would be very useful to have an estimate of whether SMA-3/SMA-9 peak overlap is significantly higher than the peak overlap between SMA-3 and several other TFs expressed at the same L2 stage.

      We have clarified our goals regarding SMA-3 and SMA-9 interactions and softened our conclusions by indicating in the text that a formal biochemical demonstration would be required to establish a physical interaction. Moreover, we toned down the text by stating that our results suggest that SMA-3 and SMA-9 frequently bind either as subunits in a complex or in close vicinity to each other along the DNA. We have added an analysis of HOT sites to address overlap of binding with other transcription factors. We disagree with the interpretation that transcription factors with non-overlapping sites cannot act together to regulate gene expression; however, nhr-114 also has an overlapping SMA-3 and SMA-9 site, so this point becomes less relevant. We have clarified the categorization of nhr-114 in the text.
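
      The reviewer's question of whether the SMA-3/SMA-9 peak overlap exceeds what is seen with other transcription factors can be framed, as one simple possibility, as a hypergeometric enrichment test over genomic bins. The sketch below is not the analysis performed in the manuscript (which reports a HOT-site analysis); the function name `hypergeom_sf` and all example counts are hypothetical and chosen only for illustration.

```python
from math import comb

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total number of genomic bins considered (hypothetical background)
    K: bins occupied by peaks of factor A
    n: bins occupied by peaks of factor B
    k: bins occupied by both factors (observed overlap)
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("counts must satisfy 0 <= K, n <= N and k >= 0")
    upper = min(K, n)
    total = comb(N, n)
    # Sum the exact integer numerators, then divide once.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical illustration only: 10,000 bins, two factors, 400 shared bins.
p_value = hypergeom_sf(k=400, N=10_000, K=1_500, n=1_200)
print(f"P(overlap >= 400) = {p_value:.3g}")
```

      Running the same test for SMA-3 against several unrelated factors profiled at L2 would give a rough baseline for how much overlap arises from shared open chromatin alone, although the choice of background bins strongly affects the result.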

      Lines 272-292: The authors do not comment on the seemingly quite small overlap between the RNA-Seq and the ChIP-Seq dataset, but I think they should. They have 3205 SMA-3 ChIP peaks and 1867 SMA-3 DEGs, but the amount of directly regulated targets is 367. It is important that the authors provide information on the number of genes to which their peaks have been assigned. Clearly, this will not be one gene per peak, but if it were, this would mean that just 11.5% of bound targets are really affected by the binding. The same number would be 4.7% for the SMA-9 peaks.

      We have added a discussion of the discrepancy between binding sites and DEGs. The high number of additional sites classified as non-functional could represent the detection of weak affinity targets that do not have an actual biological purpose. Alternatively, these sites could have an additional role in DBL-1 signaling besides transcriptional regulation of nearby genes, or they could be regulating the expression of target genes at a far enough distance to not be detected by our BETA analysis as per the constraints chosen for the analysis. The difference between total binding sites and those associated with changes in gene expression underscores the importance of combining RNA-seq with ChIP-seq to identify the most biologically relevant targets. And as the reviewer indicated, more than one gene can be assigned to a single neighboring peak.
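
      The reviewer's 11.5% figure for SMA-3 can be reproduced directly from the counts quoted in the comment. The short sketch below uses only those counts (3205 peaks, 367 direct targets) and the reviewer's simplifying one-gene-per-peak assumption; the function name is hypothetical, and the SMA-9 counts are not given in this exchange, so that figure is not recomputed.

```python
# Counts quoted in the reviewer's comment for SMA-3.
sma3_chip_peaks = 3205      # SMA-3 ChIP-seq peaks
sma3_direct_targets = 367   # bound genes that are also differentially expressed

def percent_bound_and_regulated(direct_targets: int, peaks: int) -> float:
    """Percentage of peaks tied to a DEG, assuming one gene per peak."""
    if peaks <= 0:
        raise ValueError("peaks must be positive")
    return 100.0 * direct_targets / peaks

sma3_percent = percent_bound_and_regulated(sma3_direct_targets, sma3_chip_peaks)
print(f"SMA-3: {sma3_percent:.1f}% of peaks")  # prints "SMA-3: 11.5% of peaks"
```

      Because more than one gene can be assigned to a single peak, the true fraction of bound genes that respond would differ from this estimate.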

      Lines 294-323: I feel like there is a terminology problem, which makes reading very difficult. The authors use "direct targets" as bound genes with significant expression change, but then run into a problem when the gene is bound by SMA-9 and SMA-3, but significant expression change is only associated with one of the two factors. I am not sure this is consistent with the idea of the SMA3/SMA9 complex. Also, different modalities of the SMA3 and SMA9 effect in 15 cases can be explained by co-factors. Reading would be also simplified if the order of the panels in Figure 3 were different. Currently, the authors start their explanation by referring to the shared SMA-3/SMA-9 targets (Figures 3c-d), and only later come to Figure 3b. In general, the authors should start with a clear explanation of what is on the figure (currently starting on Line 313), otherwise, it is unclear why, if the authors only discuss common targets, it is not just 114+15=129 targets, but more.

      We have re-ordered the columns in Figure 3 to match the order discussed in the text. We also incorporated more precise language about regulation by SMA-3 and/or SMA-9 in the text.

      Lines 325-355: The chapter has a rather unfortunate name "Mechanisms of integration of SMA-3 and SMA-9 function", although the authors do not provide any mechanism. Using 3 target genes, they show that if the regulatory modality of SMA-3 and SMA-9 is the same (2 examples), there is no difference in the expression of the targets, but if the modalities are opposing (1 example), SMA-9 repressive action is epistatic to the SMA-3 activating action. Can this be generalized? The authors should test all their 15 targets with opposite regulations. Moreover, it seems obvious to ask whether the intermediate phenotype of the double-mutants can be attributed to the action of these 15 genes activated by SMA-3 and repressed by SMA-9. I would suggest testing this by RNAi. I would also suggest renaming the chapter to something better reflecting its content.

      We have removed the word “mechanism” from the title of this section. We also performed additional RT-PCR experiments on another 5 targets with opposing directions of regulation. The results from these genes are consistent with the result from C54E4.5, demonstrating that the epistasis of sma-9 is generalizable.

      Figure 4b: Why was a two-way ANOVA performed here? With the small number of measurements, I would consider using a non-parametric test.

      These data are parametric and the distribution of the data is normal, so we chose to use a parametric test (ANOVA).

      Lines 354-355. The authors offer two suggestions for the mechanism of the epistatic action of SMA-9 on SMA-3 in the case of C54E4.5, but this is something for the Discussion. If they want to keep it in the Results they should address this experimentally by performing SMA-3 ChIP-seq in the SMA-9 mutants and SMA-9 ChIP-Seq in the SMA-3 mutants.

      We moved these models to the discussion as suggested.

      Lines 365-367: "We expect that clusters of genes involved in fatty acid metabolism and innate immunity mediate the physiological functions of BMP signaling in fat storage and pathogen resistance, respectively." - This is pretty confusing since the Authors claim in the previous sentence that regulation of immunity by SMA-9 is TGF-beta independent.

      Co-regulation of immunity by BMP signaling and SMA-9 is already known. The novel insight is that SMA-9 may have an additional independent role in immunity. We have clarified the language to address this confusion.

      Lines 377, and 380: Please explain in non-C. elegans-specific terminology, what rrf-3 and LON-2 are (e.g. write "glypican LON-2" instead of just "LON-2") and add relevant references.

      We added information on the proteins encoded by these genes.

      Lines 382-384: I am not sure what the Authors mean here by "more limiting".

      We substituted the phrase “might have a more prominent requirement in mediating the exaggerated growth defect of a lon-2 mutant”.

      Lines 388-392: I found this very confusing. What were these 36 genes? Were these direct targets of SMA-3, SMA-9, or both? Top 36 targets? 36 targets for which mutants are available?

      The new Figure 5 clarifies whether target genes are SMA-3-exclusive, SMA-9-exclusive, or co-regulated. The text was also updated for clarity.

      Line 397: This is the first time the authors mention dpy-11 but they do not say what it is until later, and they do not say whether it is a target of SMA3/SMA9. Checking Figure 3, I found that it is among the 238 genes bound by both but upregulated only by SMA3. The authors need to explicitly state this - from this point on, they have a section for which SMA-9 appears to be irrelevant.

      We added the molecular function of dpy-11 at its first mention. Furthermore, we included the hypothesis that SMA-3 may regulate collagen secretion independently of SMA-9. Our subsequent results with sma-9 mutants disprove this hypothesis.

      Line 402: Is ROL-6 a SMA-3/SMA-9 target or just a marker gene?

      We previously showed that rol-6 mRNA levels are reduced in dbl-1 mutants at L2. However, RNA-seq analysis did not detect a change in rol-6 large enough to reach statistical significance, so it does not qualify as a transcriptional target, and total protein levels are also not significantly reduced in mutants. We added this information to the text.

      Line 421: I am not sure what "more skeletonized" means.

      Replaced with “thinner and skeletonized”

      Figure 2b and 2d legends: "Non-target genes nevertheless showing differential expression are indicated with green squares." (l. 581-582 and again l. 588-589) I think should be "Non-direct target genes...".

      Changed to “non-direct target genes”

      Figure 7 legend: Please indicate the scale bar size in the legend.

      Indicated the scale bar size in the legend.

      Figure 7: The ER marker is referred to as "ssGFP::KDEL" (in the image and Line 700), however in the text it is called "KDEL::oxGFP" (Line 419). Please use consistent naming.

      We fixed the inconsistent naming.

      All the experiment suggestions made are optional and can, in principle, be ignored if the authors tone down their claims (for example, the SMA-3/SMA-9 complex formation).

      Reviewer #2 (Recommendations For The Authors):

      (1) As a control: Have the authors found the known regulated genes among the differentially regulated ones?

      Previously known target genes such as fat-6 and zip-10 were identified here. We have added this information in the text.

      (2) How many repetitions were performed in Figure 4b? I am wondering as the deviation for C54E4.5 is quite large and that makes me worry that the significant differences stated are not robust.

      There were two biologically independent collections from which three cDNA syntheses were analyzed using two technical replicates per point.

      (3) Lines 333-336: Can you really make this claim that the antagonistic effects seen in the regulation of body size can be correlated with some targets being regulated in the opposite direction? I would assume that the situation is far more complex as SMADs also regulate other processes.

      We agree with the reviewer that multiple models could explain this antagonism, and we have added distinct alternatives in the text.

      (4) Lines 367-369: Add the respective reference please.

      We have added the relevant references.

    1. Author response:

      We are both honored and humbled by the high praise our work received from all three reviewers. Below, we address the common comments made by the reviewers:

      (1) Value and Impact of the Resource: We are grateful for the recognition of our dataset as a valuable and high-quality resource. Our primary goal was to generate a comprehensive dataset on protein abundance and phosphorylation dynamics during Xenopus oocyte maturation. We are pleased that this work has been seen as a solid foundation for future studies in Xenopus research and beyond, with broader implications for oocyte and cell cycle biology.

      (2) Focus on Functional Validation and Contextualization with Prior Studies: The manuscript was submitted as a Tools and Resources article, a format that emphasizes the creation and presentation of datasets, tools, and methodological advances to facilitate future discoveries. In alignment with this format, we ensured that the information is accessible and deployable for the broader scientific community. While we did not include functional validation of specific pathways, the dataset provides a robust framework for generating numerous testable hypotheses. We plan to pursue some of these follow-up studies in our labs and encourage the community to explore these further.

      (3) Contextualization with Prior Studies: We appreciate the recognition of our efforts to integrate our findings with the existing body of literature. In conclusion, we would like to thank the reviewers for their evaluation and thoughtful suggestions. We look forward to seeing how this dataset contributes to future discoveries in the field.

    1. Author response:

      eLife Assessment:

      In this important study, the authors combine innovative experimental approaches, including direct compressibility measurements and traction force analyses, with theoretical modeling to propose that wild-type cells exert compressive forces on softer HRasV12-transformed cells, influencing competition outcomes. The data generally provide solid evidence that transformed epithelial cells exhibit higher compressibility than wild-type cells, a property linked to their compaction during mechanical cell competition. However, the study would benefit from further characterization of how compression affects the behavior of HRasV12 cells and clearer causal links between compressibility and competition outcomes.

      We thank the reviewers and the editor for their thoughtful and encouraging feedback on our study and for appreciating the innovation in our experimental and theoretical approaches. We acknowledge the importance of further clarifying the mechanistic links between the compressibility of HRas<sup>V12</sup>-transformed cells, their compaction, and the outcomes of mechanical cell competition. In the revised manuscript, we will include additional experiments and analyses to assess how compression influences the cellular behavior and fate of HRas<sup>V12</sup>-transformed cells during competition. In addition, to strengthen the connection between collective compressibility and competition outcomes, we will integrate quantitative analyses of cell dynamics and additional modeling to explicitly correlate the mechanical properties with the spatial and temporal aspects of cell elimination. These additions will address the reviewer’s concerns comprehensively, further enriching the mechanistic understanding presented in the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this article, Gupta and colleagues explore the parameters that could promote the elimination of active Ras cells when surrounded by WT cells. The elimination of active Ras cells by surrounding WT cells was previously described extensively and associated with a process named cell competition, a context-dependent elimination of cells. Several mechanisms have been associated with competition, including, more recently, elimination processes based on mechanical stress. This was explored theoretically and experimentally and was associated with differential growth and sensitivity to pressure and/or differences in homeostatic density/pressure. This was extensively validated for the case of Scribble mutant cells, which are eliminated by WT MDCK cells due to their higher homeostatic density. However, there has been so far very little systematic characterisation of the mechanical parameters and properties of these different cell types and how this could contribute to mechanical competition.

      Here, the authors used the context of active Ras cells in MDCK cells (with some observations in vivo in mouse gut, which are a bit more anecdotal) to explore the parameters causal to Ras cell elimination. Using, for the first time, traction force microscopy and stress microscopy combined with Bayesian inference, they first show that clusters of active Ras cells experience higher pressure compared to WT. Interestingly, this occurs in the absence of differences in growth rate, and while Ras cells seem to have lower homeostatic density, in contradiction with the previous models associated with mechanical cell competition. Using a self-propelled Voronoi model, they explored more systematically the conditions that will promote the compression of transformed cells, showing globally that higher area compressibility and/or lower junctional tension are associated with higher compressibility. Then, using an original and novel experimental method to measure bulk compressibility of cell populations, they confirmed that active Ras cells are globally twice as compressible as WT cells. This compressibility correlates with a disruption of adherens junctions. Accordingly, the higher pressure near transformed Ras cells can be completely rescued by increasing cell-cell adhesion through E-cad overexpression, which also reduces the compressibility of the transformed cells. Altogether, these results go along the lines of a previous theoretical work (Gradeci et al. eLife 2021), which suggested that reduced stiffness/higher compressibility was essential to promote loser cell elimination. Here, the authors provide for the first time a very convincing experimental measurement and validation of this prediction. Moreover, their modelling approach goes far beyond what was performed before in terms of exploration of conditions promoting compressibility, and their experimental data point at alternative mechanisms that may contribute to mechanical competition.

      Strengths:

      - Original methodologies to perform systematic characterisation of mechanical properties of Ras cells during cell competition, which include a novel method to measure bulk compressibility.

      - A very extensive theoretical exploration of the parameters promoting cell compaction in the context of competition.

      We thank the reviewer for their detailed and thoughtful assessment of our study and for recognizing the originality of our methodologies, including the novel bulk compressibility measurement technique and the extensive theoretical exploration of parameters influencing mechanical competition. We are pleased that the reviewer finds our experimental validation and modeling approach convincing and acknowledges the relevance of our findings in advancing the understanding of mechanical cell competition. We will carefully address all the points raised to further clarify and strengthen the manuscript.

      Weaknesses:

      - Most of the theoretical focus is centred on the bulk compressibility, but so far this does not really explain the final fate of the transformed cells. Classic cell competition scenarios (including the one involving active Ras cells) lead to the elimination of one cell population either by cell extrusion/cell death or global delamination. This aspect is not explored at all in this article, experimentally or theoretically, and as such it is difficult to connect all the observables with the final outcome of cell competition. For instance, higher compressibility may not lead to loser status if the cells can withstand high density without extruding compared to the WT cells (and could even completely invert the final outcome of the competition). Down the line, and as suggested in most of the previous models/experiments, the relationship between pressure/density and extrusion/death will be the key factor that determines the final outcome of competition. However, there is no characterisation of cell death/cell extrusion in the article so far.

      We thank the reviewer for highlighting this important point. We agree that understanding the relationship between pressure, density, and the final outcomes of cell competition, such as extrusion and cell death, is crucial to connecting the mechanical properties to competition outcomes. While extrusion and cell death have been extensively characterized in previous works (e.g., https://www.nature.com/articles/s41467-021-27896-z; https://www.nature.com/articles/ncb1853), we nevertheless recognize the need to address this aspect more explicitly in our study. To this end, we have performed experiments to characterize cell extrusion and cell death under varying conditions of pressure and density. We will incorporate these data into the revised manuscript. These additions will provide a more comprehensive understanding of how mechanical imbalance drives cell competition and determines the final fate of transformed cells.

      - While the compressibility measurements are very original and interesting, this bulk measurement could be explained by very different cellular processes, from modulation of cell shape to cell extrusion and tissue multilayering (which, by the way, was already observed for active Ras cells; see for instance https://pubmed.ncbi.nlm.nih.gov/34644109/). This could substantially change the interpretation of this measurement and the extent to which it can explain the compression observed in mixed culture. This compressibility measurement could be much more informative if coupled with an estimate of the change in cell aspect ratio and a rough evaluation of the contribution of cell shape changes versus alternative mechanisms.

      We thank the reviewer for raising this important concern. In our model system and within the experimental timescale of our gel compression microscopy (GCM) experiments, we do not observe tissue multilayering or cell extrusion, as these measurements are performed on homogeneous populations (pure wild-type or pure transformed cell monolayers). However, to address the reviewer’s suggestion, we will include measurements of cell aspect ratio, as well as images ruling out multilayering/extrusion, in the revised manuscript. These results will provide additional insights into the plausible contributions of cell shape changes. Furthermore, our newer results indicate that the compressibility differences arise from variations in intracellular organization (changes in nuclear and cytoskeletal organization) between wild-type and transformed cells. While a detailed molecular characterization of these underlying mechanisms is beyond the scope of the current manuscript, we acknowledge its importance and plan to explore it in a future study. These revisions will clarify and strengthen the interpretation of our findings.

      - So far, there is no clear explanation of why transformed Ras cells get more compacted in mixed culture compared to pure Ras culture. Previously, the compaction of mutant Scribble cells could be explained by the higher homeostatic density of WT cells, which impose their preferred higher density on the Scribble mutants (see Wagstaff et al. 2016 or Gradeci et al. 2021); however, that is not the case for the Ras cells (which even have slightly higher density at confluency). If I understood properly, the Voronoi model assumes some directional movement of WT cells toward transformed cells, which will actively compact the Ras cells through self-propelled forces (see supplementary methods), but this is never clearly discussed/described in the results section, while potentially being one essential ingredient for observing compaction of transformed cells. In fact, this was already described experimentally in the case of Scribble competition and associated with chemoattractant secretion from the mutant cells promoting directed migration of the WT (https://pubmed.ncbi.nlm.nih.gov/33357449/). It would be essential to show what happens in the absence of directional propelled movement in the model and to validate experimentally whether there is indeed directional movement of the WT toward the transformed cells. Without this, the current data do not really explain the competition process.

      We introduced directional movement of wild-type cells towards neighbouring transformed cells (and a form of active force to be exerted by them), motivated by the tissue compressibility measurements from the Gel Compression Microscopy experiments (Fig. 4E-L). This allowed us to devise an equivalent method of measuring the material response to isotropic compression within the SPV model framework. While the role of directional propelled movement is an area of ongoing investigation and has not been explored extensively within the current study, we emphasize that even without directional propulsion in the model, our results demonstrate compressive stress or elevated pressure, and increased compaction within the transformed population under suitable conditions reported in this work (when k<1), exhibiting a greater tissue-level compressibility in the transformed cells compared to WT cells (Figs. 4C-D), thereby laying the ground for competition. To clarify these concerns, we will provide additional results as well as detailed discussions on the effect of cell movements in compression.
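
      For readers unfamiliar with the self-propelled Voronoi (SPV) framework referred to above, the block below writes out the standard form of the model as it commonly appears in the general literature. It is a sketch of the textbook formulation, not the manuscript's exact parameterization: the symbols $K_A$, $K_P$, $A_0$, $P_0$, $\mu$, $v_0$ and $\hat{\mathbf{n}}_i$ are the conventional ones and are assumptions here, and the parameter k mentioned in the response is defined in the manuscript and is not reproduced.

```latex
% Standard SPV energy for N cells with areas A_i and perimeters P_i
E = \sum_{i=1}^{N} \left[ K_A \,(A_i - A_0)^2 + K_P \,(P_i - P_0)^2 \right],
\qquad
% Overdamped motion of cell centre r_i with self-propulsion speed v_0
\frac{d\mathbf{r}_i}{dt} = \mu\,\mathbf{F}_i + v_0\,\hat{\mathbf{n}}_i,
\qquad
\mathbf{F}_i = -\nabla_{\mathbf{r}_i} E .
```

      In this form, $K_A$ penalizes deviations of cell area from its preferred value, so a population with a smaller area modulus resists compression less. Setting $v_0 = 0$ removes self-propulsion, which is the limit the response refers to when it states that compaction persists without directional propulsion.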

      - Some of the data lack information on statistics, especially the stress microscopy and traction force data, where we do not really know how representative the stress patterns are (how many experiments? are they averages of several movies? integrated over which temporal window?).

      We thank the reviewer for highlighting the need for additional details regarding the statistical representation of our stress microscopy and traction force data. We will address these concerns in the revised manuscript by providing clear descriptions of the number of experiments, the averaging methodology, and the temporal windows used for analysis. Currently, Figs. 2A and 2C represent data from single time points, as the traction and stress landscapes evolve dynamically as transformed cells begin extruding (as shown in Supplementary movie 1). In contrast, Fig. 2H represents data collected from several samples across three independent experiments, all measured at the 3-hour time point following doxycycline induction. This specific time point is critical because it captures the emergence of compressive stresses before extrusion begins, simplifying the analysis and ensuring consistency. We will ensure these details are clearly articulated in the revised text and figure legends.

      Reviewer #2 (Public review):

      The work by Gupta et al. addresses the role of tissue compressibility as a driver of cell competition. The authors use a planar epithelial monolayer system to study cell competition between wild type and transformed epithelial cells expressing HRasV12. They combine imaging and traction force measurements, from which the authors propose that wild type cells generate compressive forces on transformed epithelial cells. The authors further present a novel setup to directly measure the compressibility of adherent epithelial tissues. These measurements suggest a higher compressibility of transformed epithelial cells, which is causally linked to a reduction in cell-cell adhesion in transformed cells. The authors support their conclusions by theoretical modelling using a self-propelled Voronoi model, which supports the idea that differences in tissue compressibility can lead to compression of the softer tissue type.

      The experimental framework to measure tissue compressibility of adherent epithelial monolayers establishes a novel tool; however, additional controls for this measurement appear to be required. Moreover, the experimental support of this study is mostly based on single representative images and would greatly benefit from additional data and their quantitative analysis to support the authors' conclusions. Specific comments are listed in the following:

      Major points:

      It is not evident in Fig. 2A that traction forces increase along the interface between wild type and transformed populations, and stresses in Fig. 2C also seem to be similar at the interface and in the surrounding cell layer. Only representative examples are provided, and a quantification of σ<sub>m</sub> needs to be provided.

      In Figures 1-3, only panels 2G and 2H provide a quantitative analysis, but it is not clear how many regions of interest and clusters of transformed cells were quantified.

      We thank the reviewer for their detailed comments and for highlighting the importance of additional quantitative analyses to support our conclusions. We appreciate their recognition of our novel experimental framework to measure tissue compressibility and the overall approach of our study. Regarding Fig. 2A and Fig. 2C, we acknowledge the need for further clarity. While the traction forces and stress patterns may not appear uniformly distinct at the interface in the representative images, these differences are more evident at specific time points before extrusion begins. Please note that the traction and stress landscapes evolve dynamically as transformed cells begin extruding (as shown in Supplementary movie 1). We will include a quantification of σ<sub>m</sub> and additional data from multiple experiments to substantiate the observations and address this concern in the revised manuscript. Currently, the data in Fig. 2G and Fig. 2H represent several regions of interest and transformed cell clusters collected from three independent experiments, all analyzed at the 3-hour time point after doxycycline induction. This time point was chosen because it captures the compressive stress emergence without interference from extrusion processes, simplifying the analysis. We will expand these sections with detailed descriptions of the sample sizes and statistical analyses to ensure greater transparency and reproducibility. These revisions will provide a stronger quantitative foundation for our findings and address the reviewer's concerns.

      Several statements do not appear to be sufficiently justified and supported by data. For example, the statement on p. 3, line 38 seems to lack supporting data: 'This comparison revealed that the thickness of HRasV12-expressing cells was reduced by more than 1.7-fold when they were surrounded by wild type cells. These observations pointed towards a selective, competition-dependent compaction of HRasV12-expressing transformed cells but not control cells, in the intestinal villi of mice.' Similarly, the statement about a cell area change of 2.7-fold (p. 3, line 47) lacks support by measurements.

      We thank the reviewer for pointing out the need for more supportive data to justify several statements in the manuscript. Specifically, the observation regarding the reduction in the thickness of HRas<sup>V12</sup>-expressing cells by more than 1.7-fold when surrounded by wild-type cells, and the statement about a 2.7-fold change in cell area, will be supported by detailed measurements. In the revised manuscript, we will include quantitative analyses with additional figures that clearly document these changes. These figures will provide representative images, statistical summaries, and detailed descriptions of the measurements to substantiate these claims. We appreciate the reviewer highlighting these areas and will ensure that all statements are robustly backed by data.

      What is the rationale for setting K<sub>p</sub> = 1 in the model assumptions if clear differences in junctional membranes of transformed versus wild type cells occur, including dynamic ruffling? This assumption does not seem to be in line with biological observations.

      While the specific role of K<sub>p</sub> in the differences observed in the junctional membranes of transformed versus WT cells, including dynamical ruffling, is not directly studied in this work, our findings indicate that the lower junctional tension (weaker and less stable cellular junctions) in mutant cells is influenced primarily by competition in the dimensionless cell shape index within the model. This also suggests a larger preferred cell perimeter (P<sub>0</sub>) for mutant cells, corresponding to their softer, unjammed state. Huang et al. (https://doi.org/10.1039/d3sm00327b) have previously argued that a high P<sub>0</sub> may, in some cases, result from elevated cortical tension along cell edges, or reflect weak membrane elasticity, implying a smaller K<sub>p</sub>. While this connection could be an intriguing avenue for future exploration, we emphasize that K<sub>p</sub> is not expected to alter any of the key findings or conclusions reported in this work. We will include any required analysis and corresponding discussions in the revised manuscript.
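
      For readers unfamiliar with the quantities in this exchange, the sketch below shows how K<sub>p</sub>, P<sub>0</sub>, and the dimensionless shape index relate in the energy function commonly used for vertex and Self-Propelled Voronoi models, E = K_A (A - A0)^2 + K_P (P - P0)^2 per cell, with shape index p0 = P0 / sqrt(A0). That the authors' model uses exactly this form is an assumption; the function names and default parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def regular_polygon(n_sides, radius=1.0):
    """Vertices of a regular polygon (hypothetical stand-in for a cell outline)."""
    angles = 2 * np.pi * np.arange(n_sides) / n_sides
    return np.column_stack([radius * np.cos(angles), radius * np.sin(angles)])

def polygon_area_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a simple polygon."""
    x, y = vertices[:, 0], vertices[:, 1]
    x_next, y_next = np.roll(x, -1), np.roll(y, -1)
    area = 0.5 * abs(np.sum(x * y_next - x_next * y))
    perimeter = np.sum(np.hypot(x_next - x, y_next - y))
    return area, perimeter

def shape_index(area, perimeter):
    """Dimensionless shape index: perimeter divided by the square root of area."""
    return perimeter / np.sqrt(area)

def cell_energy(area, perimeter, A0=1.0, P0=3.8, K_A=1.0, K_P=1.0):
    """Assumed per-cell energy; K_P weights deviations from the preferred perimeter P0."""
    return K_A * (area - A0) ** 2 + K_P * (perimeter - P0) ** 2

hex_area, hex_perim = polygon_area_perimeter(regular_polygon(6))
pent_area, pent_perim = polygon_area_perimeter(regular_polygon(5))
print(shape_index(hex_area, hex_perim))    # about 3.722 for a regular hexagon
print(shape_index(pent_area, pent_perim))  # about 3.812 for a regular pentagon
```

      In this form, a larger P<sub>0</sub> at fixed A<sub>0</sub> raises the target shape index, whereas K<sub>p</sub> only sets how strongly deviations from P<sub>0</sub> are penalized. This is one way to read the authors' argument that the shape index, rather than K<sub>p</sub>, carries the difference between cell types.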

      The novel approach to measure tissue compressibility is based on pH dependent hydrogels. As the pH responsive hydrogel pillar is placed into a culture medium with different conditions, an important control would be if the insertion of this hydrogel itself would change the pH or conditions of the culture assays and whether this alters tissue compressibility or cell adhesion. The authors could for example insert a hydrogel pillar of a smaller diameter that would not lead to compression or culture cells in a larger ring to assess the influence of the pillar itself.

      We appreciate the reviewer’s insightful comment regarding the potential effects of the pH-responsive hydrogel pillar on the culture conditions and tissue compressibility. In our experiments, the expandable hydrogels are kept separate from the cells until the pH of the hydrogel is elevated to 7.4, ensuring that the hydrogel does not impact the culture environment. However, we acknowledge the concern and will include additional controls in the revised manuscript. Specifically, we will insert a hydrogel pillar with a smaller diameter that would not induce compression on culture cells in a larger ring to assess any potential influence of the hydrogel pillar itself. This will help to further validate our experimental setup.

      The authors focus on the study of cell compaction of the transformed cells, but how does this ultimately lead to a competitive benefit of wild type cells? Is a higher rate of extrusion observed and associated with the compaction of transformed cells or is their cell death rate increased? While transformed cells seem to maintain a proliferative advantage it is not clear which consequences of tissue compression ultimately drive cell competition between wild type and transformed cells.

      We thank the reviewer for highlighting this important point. We agree that understanding how tissue compression leads to a competitive advantage for wild type cells is crucial. While our current study focuses on the mechanical properties of transformed cells leading to the compaction and subsequent extrusion of the transformed cells, we recognize the need to explicitly connect these properties to the final outcomes of cell competition, such as extrusion or cell death. Although extrusion and cell death have been extensively characterized in previous studies (e.g., https://www.nature.com/articles/s41467-021-27896-z; https://www.nature.com/articles/ncb1853), we have indeed performed additional experiments to investigate the relationship between pressure, density, and these processes in our system. In the revised manuscript, we will include these new data, which will help to clarify how mechanical stress, driven by tissue compression, contributes to the competition between wild type and transformed cells and influences their eventual fate.

      The argumentation that softer tissues would be more easily compressed is plausible. However, which mechanism do the authors suggest is generating the actual compressive stress to drive the compaction of transformed cells? They exclude a proliferative advantage of wild type cells, which other mechanisms will generate the compressive forces by wild type cells?

      We thank the reviewer for raising this important question. As the reviewer rightly points out, we do not observe a proliferative advantage for the wild-type cells in our model system. Instead, the compressive forces exerted by the wild-type cells arise from their intrinsic mechanical properties, such as their lower compressibility compared to the transformed cells. This difference in compressibility results in wild-type cells generating compressive stress at the interface with the transformed cells. Regarding the mechanism underlying the increased compressibility of the transformed cells, our newer findings indicate that the differences in compressibility arise from variations in the intracellular organization, specifically changes in nuclear and cytoskeletal organization between wild-type and transformed cells. While a detailed molecular characterization of these mechanisms is beyond the scope of the current manuscript, we acknowledge its significance and plan to investigate it in future work. We will, nevertheless, include a detailed discussion on the mechanism underlying the differential compressibility of wild-type and transformed cells in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The revised manuscript contains new results and additional text. Major revisions:

      (1) Additional simulations and analyses of networks with different biophysical parameters and with identical time constants for E and I neurons (Methods, Supplementary Fig. 5).

      (2) Additional simulations and analyses of networks with modifications of connectivity parameters to further analyze effects of E/I assemblies on manifold geometry (Supplementary Fig. 6).

      (3) Analysis of synaptic current components (Figure 3 D-F; to analyze mechanism of modest amplification in Tuned networks). 

      (4) More detailed explanation of pattern completion analysis (Results).

      (5) Analysis of classification performance of Scaled networks (Supplementary Fig.8).

      (6) Additional analysis (Figure 5D-F) and discussion (particularly section “Computational functions of networks with E/I assemblies”) of functional benefits of continuous representations in networks with E-I assemblies. 

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Meissner-Bernard et al present a biologically constrained model of telencephalic area of adult zebrafish, a homologous area to the piriform cortex, and argue for the role of precisely balanced memory networks in olfactory processing. 

      This is interesting as it can add to recent evidence on the presence of functional subnetworks in multiple sensory cortices. It is also important in deviating from traditional accounts of memory systems as attractor networks. Evidence for attractor networks has been found in some systems, like in the head direction circuits in the flies. However, the presence of attractor dynamics in other modalities, like sensory systems, and their role in computation has been more contentious. This work contributes to this active line of research in experimental and computational neuroscience by suggesting that, rather than being represented in attractor networks and persistent activity, olfactory memories might be coded by balanced excitation-inhibitory subnetworks. 

      Strengths: 

      The main strength of the work is in: (1) direct link to biological parameters and measurements, (2) good controls and quantification of the results, and (3) comparison across multiple models. 

      (1) The authors have done a good job of gathering the current experimental information to inform a biological-constrained spiking model of the telencephalic area of adult zebrafish. The results are compared to previous experimental measurements to choose the right regimes of operation. 

      (2) Multiple quantification metrics and controls are used to support the main conclusions and to ensure that the key parameters are controlled for - e.g. when comparing across multiple models.

      (3) Four specific models (random, scaled I / attractor, and two variants of specific E-I networks - tuned I and tuned E+I) are compared with different metrics, helping to pinpoint which features emerge in which model.

      Weaknesses: 

      Major problems with the work are: (1) mechanistic explanation of the results in specific E-I networks, (2) parameter exploration, and (3) the functional significance of the specific E-I model. 

      (1) The main problem with the paper is a lack of mechanistic analysis of the models. The models are treated like biological entities and only tested with different assays and metrics to describe their different features (e.g. different geometry of representation in Fig. 4). Given that all the key parameters of the models are known and can be changed (unlike biological networks), it is expected to provide a more analytical account of why specific networks show the reported results. For instance, what is the key mechanism for medium amplification in specific E/I network models (Fig. 3)? How does the specific geometry of representation/manifolds (in Fig. 4) emerge in terms of excitatory-inhibitory interactions, and what are the main mechanisms/parameters? Mechanistic account and analysis of these results are missing in the current version of the paper. 

      We agree that further mechanistic insights would be of interest and addressed this issue at different levels:

      (1) Biophysical parameters: to determine whether network behavior depends on specific choices of biophysical parameters in E and I neurons we equalized biophysical parameters across neuron types. The main observations are unchanged, suggesting that the observed effects depend primarily on network connectivity (see also response to comment [2]).

      (2) Mechanism of modest amplification in E/I assemblies: analyzing the different components of the synaptic currents demonstrates that the modest amplification of activity in Tuned networks results from an “imperfect” balance of recurrent excitation and inhibition within assemblies (see new Figures 3D-F and text p.7). Hence, E/I co-tuning substantially reduces the net amplification in Tuned networks as compared to Scaled networks, thus preventing discrete attractor dynamics and stabilizing network activity, but a modest amplification still occurs, consistent with biological observations.

      (3) Representational geometry: to obtain insights into the network mechanisms underlying effects of E/I assemblies on the geometry of population activity we tested the hypothesis that geometrical changes depend, at least in part, on the modest amplification of activity within E/I assemblies (see Supplementary Figure 6). We changed model parameters to either prevent the modest amplification in Tuned networks (increasing I-to-E connectivity within assemblies) or introduce a modest amplification in subsets of neurons by other mechanisms (concentration-dependent increase in the excitability of pseudo-assembly neurons; Scaled I networks with reduced connectivity within assemblies). Manipulations that introduced a modest, input-dependent amplification in neuronal subsets had geometrical effects similar to those observed in Tuned networks, whereas manipulations that prevented a modest amplification abolished these effects (Supplementary Figure 6). Note however that these manipulations generated different firing rate distributions. These results provide a starting point for more detailed analyses of the relationship between network connectivity and representational geometry (see p.12).

      In summary, our additional analyses indicate that effects of E/I assemblies on representational geometry depend primarily on network connectivity, rather than specific biophysical parameters, and that the resulting modest amplification of activity within assemblies makes an important contribution. Further analyses may reveal more specific relationships between E/I assemblies and representational geometry, but such analyses are beyond the scope of this study.

      (2) The second major issue with the study is a lack of systematic exploration and analysis of the parameter space. Some parameters are biologically constrained, but not all of them. For instance, it is not clear what the justification for the choice of synaptic time scales is (with the E synaptic time constant being larger than the I synaptic time constant: tau_syn_I = 10 ms, tau_syn_E = 30 ms). How would the results change if these, and other unconstrained, parameters were varied? It is important to show how the main results, especially the manifold localisation, would change by doing a systematic exploration of the key parameters and performing some sensitivity analysis. This would also help to see how robust the results are, which parameters are more important and which parameters are less relevant, and to shed light on the key mechanisms.

      We thank the reviewer for raising this point. We chose a relatively slow time constant for excitatory synapses because experimental data indicate that excitatory synaptic currents in Dp and piriform cortex contain a prominent NMDA component. Nevertheless, to assess whether network behavior depends on specific choices of biophysical parameters in E and I neurons, we have performed additional simulations with equal synaptic time constants and equal biophysical parameters for all neurons. Each neuron also received the same number of inputs from each population (see revised Methods). Results were similar to those observed previously (Supplementary Fig.5 and p.9 of main text). We therefore conclude that the main effects observed in Tuned networks cannot be explained by differences in biophysical parameters between E and I neurons but are primarily a consequence of network connectivity.

      (3) It is not clear what the main functional advantage of the specific E-I network model is compared to random networks. In terms of activity, they show that specific E-I networks amplify the input more than random networks (Fig. 3). But when it comes to classification, the effect seems to be very small (Fig. 5c). Description of different geometry of representation and manifold localization in specific networks compared to random networks is good, but it is more of an illustration of different activity patterns than proving a functional benefit for the network. The reader is still left with the question of what major functional benefits (in terms of computational/biological processing) should be expected from these networks, if they are to be a good model for olfactory processing and learning. 

      One possibility for instance might be that the tasks used here are too easy to reveal the main benefits of the specific models - and more complex tasks would be needed to assess the functional enhancement (e.g. more noisy conditions or more combination of odours). It would be good to show this more clearly - or at least discuss it in relation to computation and function. 

      In the previous manuscript, the analysis of potential computational benefits other than pattern classification was limited and the discussion of this issue was condensed into a single itemized paragraph to avoid excessive speculation. Although a thorough analysis of potential computational benefits exceeds the scope of a single paper, we agree with the reviewer that this issue is of interest and therefore added additional analyses and discussion.

      In the initial manuscript we analyzed pattern classification primarily to investigate whether Tuned networks can support this function at all, given that they do not exhibit discrete attractor states. We found this to be the case, which we consider a first important result.

      Furthermore, we found that precise balance of E/I assemblies can protect networks against catastrophic firing rate instabilities when assemblies are added sequentially, as in continual learning. Results from these simulations are now described and discussed in more detail (see Results p.11 and Discussion p.13).

      In the revised manuscript, we now also examine additional potential benefits of Tuned networks and discuss them in more detail (see new Figure 5D-F and text p.11). One hypothesis is that continuous representations provide a distance metric between a given input and relevant (learned) stimuli. To address this hypothesis, we (1) performed regression analysis and (2) trained support vector machines (SVMs) to predict the concentration of a given odor in a mixture based on population activity. In both cases, Tuned E+I networks outperformed Scaled and rand networks in predicting the concentration of learned odors across a wide range of mixtures (Figure 5D-F). E/I assemblies therefore support the quantification of learned odors within mixtures or, more generally, assessments of how strongly a (potentially complex) input is related to relevant odors stored in memory. Such a metric assessment of stimulus quality is not well supported by discrete attractor networks because inputs are mapped onto discrete network states.
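
      The idea of reading out odor concentration from population activity can be illustrated with a minimal regression sketch. Everything here is an assumption made for illustration: the activity is synthetic (a linear mixture of two hypothetical response patterns plus Gaussian noise), not output of the authors' spiking model, and a ridge-regularized linear readout in NumPy stands in for the regression and SVM analyses described in the response.

```python
import numpy as np

rng = np.random.default_rng(2)
n_neurons, n_train, n_test = 100, 300, 100

# Hypothetical firing-rate patterns evoked by the learned odor and by the rest of the mixture
odor_pattern = rng.gamma(2.0, 1.0, size=n_neurons)
background = rng.gamma(2.0, 1.0, size=n_neurons)

def simulate(n_trials):
    """Synthetic population activity for mixtures with a random fraction of the learned odor."""
    conc = rng.uniform(0.0, 1.0, size=n_trials)
    activity = conc[:, None] * odor_pattern + (1.0 - conc[:, None]) * background
    activity += rng.normal(0.0, 0.5, size=activity.shape)  # trial-to-trial noise
    return activity, conc

def add_bias(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression (the bias term is also penalized, for simplicity)."""
    Xb = add_bias(X)
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)

X_train, y_train = simulate(n_train)
X_test, y_test = simulate(n_test)
weights = fit_ridge(X_train, y_train)
prediction = add_bias(X_test) @ weights

# Fraction of variance in held-out concentrations explained by the linear readout
r2 = 1.0 - np.sum((y_test - prediction) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print(f"held-out R^2: {r2:.3f}")
```

      A readout of this kind can only succeed if activity varies gradually with concentration. That is the property the authors attribute to continuous representations and contrast with discrete attractor states.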

      The observation that Tuned networks do not map inputs onto discrete outputs indicates that such networks do not classify inputs as distinct items. Nonetheless, the observed geometrical modifications of continuous representations support the classification of learned inputs or the assessment of metric relationships by hypothetical readout neurons. Geometrical modifications of odor representations may therefore serve as one of multiple steps in multi-layer computations for pattern classification (and/or other computations). In this scenario, the transformation of odor representations in Dp may be seen as related to transformations of representations between different layers in artificial networks, which collectively perform a given task (notwithstanding obvious structural and mechanistic differences between artificial and biological networks). In other words, geometrical transformations of representations in Tuned networks may overrepresent learned (relevant) information at the expense of other information and thereby support further learning processes in other brain areas. An obvious corollary of this scenario is that Dp does not perform odor classification per se based on inputs from the olfactory bulb but reformats representations of odor space based on experience to support computational tasks as part of a larger system. This scenario is now explicitly discussed (p.14).

      Reviewer #2 (Public Review): 

      Summary: 

      The authors conducted a comparative analysis of four networks, varying in the presence of excitatory assemblies and the architecture of inhibitory cell assembly connectivity. They found that co-tuned E-I assemblies provide network stability and a continuous representation of input patterns (on locally constrained manifolds), contrasting with networks with global inhibition that result in attractor networks. 

      Strengths: 

      The findings presented in this paper are very interesting and cutting-edge. The manuscript effectively conveys the message and presents a creative way to represent high-dimensional inputs and network responses. Particularly, the result regarding the projection of input patterns onto local manifolds and continuous representation of input/memory is very intriguing and novel. Both computational and experimental neuroscientists would find value in reading the paper. 

      Weaknesses: 

      that have continuous representations. This could also be shown in Figure 5B, along with the performance of the random and tuned E-I networks. The latter networks have the advantage of providing network stability compared to the Scaled I network, but at the cost of reduced network salience and, therefore, reduced input decodability. The authors may consider designing a decoder to quantify and compare the classification performance of all four networks. 

      We have now quantified classification by networks with discrete attractor dynamics (Scaled) along with other networks. However, because the neuronal covariance matrix for such networks is low rank and not invertible, pattern classification cannot be analyzed by QDA as in Figure 5B. We therefore classified patterns from the odor subspace by template matching, assigning test patterns to one of the four classes based on correlations (see Supplementary Figure 8). As expected, Scaled networks performed well, but they did not outperform Tuned networks. Moreover, the performance of Scaled networks, but not Tuned networks, depended on the order in which odors were presented to the network. This hysteresis effect is a direct consequence of persistent attractor states and decreased the general classification performance of Scaled networks (see Supplementary Figure 8 for details). These results confirm the prediction that networks with discrete attractor states can efficiently classify inputs, but also reveal disadvantages arising from attractor dynamics. Moreover, the results indicate that the classification performance of Tuned networks is also high under the given task conditions, which simulate a biologically realistic scenario.
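
      Template matching by correlation, as described above, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' analysis code: the activity patterns are synthetic, the function name `template_match` is hypothetical, and templates are taken to be class-averaged training patterns.

```python
import numpy as np

def template_match(templates, test_patterns):
    """Assign each test pattern to the class whose template it correlates with most.

    templates:     array of shape (n_classes, n_neurons)
    test_patterns: array of shape (n_trials, n_neurons)
    """
    def normalize(x):
        # Center across neurons and scale to unit length so dot products equal Pearson r
        x = x - x.mean(axis=1, keepdims=True)
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    corr = normalize(test_patterns) @ normalize(templates).T
    return corr.argmax(axis=1), corr

rng = np.random.default_rng(0)
n_classes, n_neurons, n_trials = 4, 200, 20

# Synthetic stand-in for population responses to four odor classes
class_means = rng.gamma(2.0, 1.0, size=(n_classes, n_neurons))
labels = np.repeat(np.arange(n_classes), n_trials)
train = class_means[labels] + rng.normal(0.0, 0.5, size=(labels.size, n_neurons))
test = class_means[labels] + rng.normal(0.0, 0.5, size=(labels.size, n_neurons))

# One template per class: the mean training pattern
templates = np.stack([train[labels == k].mean(axis=0) for k in range(n_classes)])
predicted, corr = template_match(templates, test)
accuracy = (predicted == labels).mean()
print(f"template-matching accuracy: {accuracy:.2f}")
```

      Unlike QDA, this procedure never inverts a covariance matrix, so it remains applicable when the covariance is low rank.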

      We would also like to emphasize that classification may not be the only task, and perhaps not even a main task, of Dp/piriform cortex or other memory networks with E/I assemblies. Conceivably, other computations could include metric assessments of inputs relative to learned inputs or additional learning-related computations. Please see our response to comment (3) of reviewer 1 for a further discussion of this issue. 

      Networks featuring E/I assemblies could potentially represent multistable attractors by exploring the parameter space for their reciprocal connectivity and connectivity with the rest of the network. However, for co-tuned E-I networks, the scope for achieving multistability is relatively constrained compared to networks employing global or lateral inhibition between assemblies. It would be good if the authors mentioned this in the discussion. Also, the fact that reciprocal inhibition increases network stability has been shown before and should be cited in the statements addressing network stability (e.g., some of the citations in the manuscript, including Rost et al. 2018, Lagzi & Fairhall 2022, and Vogels et al. 2011 have shown this).  

      We thank the reviewer for this comment. We now explicitly discuss multistability (see p. 12) and refer to additional references in the statements addressing network stability.

      Providing raster plots of the pDp network for familiar and novel inputs would help with understanding the claims regarding continuous versus discrete representation of inputs, allowing readers to visualize the activity patterns of the four different networks. (similar to Figure 1B). 

      We thank the reviewer for this suggestion. We have added raster plots of responses to both familiar and novel inputs in the revised manuscript (Figure 2D and Supplementary Figure 4A).

      Reviewer #3 (Public Review): 

      Summary: 

      This work investigates the computational consequences of assemblies containing both excitatory and inhibitory neurons (E/I assembly) in a model with parameters constrained by experimental data from the telencephalic area Dp of zebrafish. The authors show how this precise E/I balance shapes the geometry of neuronal dynamics in comparison to unstructured networks and networks with more global inhibitory balance. Specifically, E/I assemblies lead to the activity being locally restricted onto manifolds - a dynamical structure in between high-dimensional representations in unstructured networks and discrete attractors in networks with global inhibitory balance. Furthermore, E/I assemblies lead to smoother representations of mixtures of stimuli while those stimuli can still be reliably classified, and allow for more robust learning of additional stimuli. 

      Strengths: 

      Since experimental studies do suggest that E/I balance is very precise and E/I assemblies exist, it is important to study the consequences of those connectivity structures on network dynamics. The authors convincingly show that E/I assemblies lead to different geometries of stimulus representation compared to unstructured networks and networks with global inhibition. This finding might open the door for future studies for exploring the functional advantage of these locally defined manifolds, and how other network properties allow to shape those manifolds. 

      The authors also make sure that their spiking model is well-constrained by experimental data from the zebrafish pDp. Both spontaneous and odor stimulus triggered spiking activity is within the range of experimental measurements. But the model is also general enough to be potentially applied to findings in other animal models and brain regions. 

      Weaknesses: 

      I find the point about pattern completion a bit confusing. In Fig. 3 the authors argue that only the Scaled I network can lead to pattern completion for morphed inputs since the output correlations are higher than the input correlations. For me, this sounds less like the network can perform pattern completion but it can nonlinearly increase the output correlations. Furthermore, in Suppl. Fig. 3 the authors show that activating half the assembly does lead to pattern completion in the sense that also non-activated assembly cells become highly active and that this pattern completion can be seen for Scaled I, Tuned E+I, and Tuned I networks. These two results seem a bit contradictory to me and require further clarification, and the authors might want to clarify how exactly they define pattern completion. 

      We believe that this comment concerns a semantic misunderstanding and apologize for any lack of clarity. We added a definition of pattern completion in the text: “…the retrieval of the whole memory from noisy or corrupted versions of the learned input.” Pattern completion may be assessed using different procedures. In computational studies, it is often analyzed by delivering input to a subset of the assembly neurons which store a given memory (partial activation). Under these conditions, we find recruitment of the entire assembly in all structured networks, as demonstrated in Supplementary Figure 3. However, these conditions are unlikely to occur during odor presentation because the majority of neurons do not receive any input.

      Another more biologically motivated approach to assess pattern completion is to gradually modify a realistic odor input into a learned input, thereby gradually increasing the overlap between the two inputs. This approach had been used previously in experimental studies (references added to the text p.6). In the presence of assemblies, recurrent connectivity is expected to recruit assembly neurons (and thus retrieve the stored pattern) more efficiently as the learned pattern is approached. This should result in a nonlinear increase in the similarity between the evoked and the learned activity pattern. This signature was prominent in Scaled networks but not in Tuned or rand networks. Obviously, the underlying procedure is different from the partial activation of the assembly described above because input patterns target many neurons (including neurons outside assemblies) and exhibit a biologically realistic distribution of activity. However, this approach has also been referred to as “pattern completion” in the neuroscience literature, which may be the source of semantic confusion here. To clarify the difference between these approaches we have now revised the text and explicitly described each procedure in more detail (see p.6). 
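
      The morphing procedure can be made concrete with a short sketch. The linear interpolation between two synthetic input patterns is an assumption (the exact morphing protocol is not given in this response), and `completion_gain` is a hypothetical helper that compares output and input similarity once output patterns from a network are available.

```python
import numpy as np

rng = np.random.default_rng(3)
n_neurons = 150

# Hypothetical input patterns: a novel odor and a learned odor
novel = rng.gamma(2.0, 1.0, size=n_neurons)
learned = rng.gamma(2.0, 1.0, size=n_neurons)

# Gradually morph the novel input into the learned input
morph_levels = np.linspace(0.0, 1.0, 11)
morphs = np.array([(1.0 - a) * novel + a * learned for a in morph_levels])

def correlation_to(target, patterns):
    """Pearson correlation of each pattern with a target pattern."""
    return np.array([np.corrcoef(p, target)[0, 1] for p in patterns])

input_corr = correlation_to(learned, morphs)

def completion_gain(output_patterns, learned_output, input_corr):
    """Output minus input correlation at each morph level.

    Clearly positive values near the learned pattern would indicate the
    nonlinear increase in similarity described above.
    """
    return correlation_to(learned_output, output_patterns) - input_corr

# Sanity check with an identity "network": no amplification of similarity, so zero gain
print(completion_gain(morphs, learned, input_corr))
```

      With a real network, `output_patterns` would hold the evoked activity for each morph, and the gain would be compared across Scaled, Tuned, and rand networks.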

      The authors argue that Tuned E+I networks have several advantages over Scaled I networks. While I agree with the authors that in some cases adding this localized E/I balance is beneficial, I believe that a more rigorous comparison between Tuned E+I networks and Scaled I networks is needed: quantification of variance (Fig. 4G) and angle distributions (Fig. 4H) should also be shown for the Scaled I network. Similarly in Fig. 5, what is the Mahalanobis distance for Scaled I networks and how well can the Scaled I network be classified compared to the Tuned E+I network? I suspect that the Scaled I network will actually be better at classifying odors compared to the E+I network. The authors might want to speculate about the benefit of having networks with both sources of inhibition (local and global) and hence being able to switch between locally defined manifolds and discrete attractor states. 

      We agree that a more rigorous comparison of Tuned and Scaled networks would be of interest. We have added the variance analysis (Fig. 4G) and angle distributions (Fig. 4H) for both Tuned I and Scaled networks. However, the Mahalanobis distances and Quadratic Discriminant Analysis cannot be applied to Scaled networks because their neuronal covariance matrix is low rank and not invertible. To nevertheless compare these networks, we performed template matching by assigning test patterns to one of the four odor classes based on correlations to template patterns (Supplementary Figure 8; see also response to the first comment of reviewer 2). Interestingly, Scaled networks performed well at classification but did not outperform Tuned networks, and exhibited disadvantages arising from attractor dynamics (Supplementary Figure 8; see also response to the first comment of reviewer 2). Furthermore, additional analyses showed that continuous representational manifolds support metric assessments of inputs relative to learned odors, which cannot be achieved by discrete representations. These results are now shown in Figure 5D-E and discussed explicitly in the text on p.11 (see also response to comment 3 of reviewer 1).
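
      The template-matching comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `template_match`, the array shapes, and the synthetic data are assumptions made for the example. Each test pattern is assigned to the class whose template it correlates with most strongly (Pearson correlation across neurons).

```python
import numpy as np

def template_match(test_patterns, templates):
    """Assign each test pattern to the class of its best-correlated template.

    test_patterns: (n_tests, n_neurons); templates: (n_classes, n_neurons).
    Returns the predicted class indices and the full correlation matrix.
    """
    def zscore(x):
        # z-score each pattern across neurons so that dot product / n = Pearson r
        x = np.asarray(x, dtype=float)
        return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

    zt, zc = zscore(test_patterns), zscore(templates)
    r = zt @ zc.T / zt.shape[1]
    return r.argmax(axis=1), r

# Synthetic example (hypothetical data): four class templates plus noisy test patterns
rng = np.random.default_rng(0)
templates = rng.random((4, 200))
labels_true = np.repeat(np.arange(4), 10)
tests = templates[labels_true] + 0.3 * rng.standard_normal((40, 200))

labels_pred, r = template_match(tests, templates)
accuracy = (labels_pred == labels_true).mean()
```

      Unlike Mahalanobis distance or Quadratic Discriminant Analysis, this procedure does not require inverting a covariance matrix, which is why it remains applicable when that matrix is low rank.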

      We preferred not to add a sentence in the Discussion about benefits of networks having both sources of inhibition, as we find this a bit too speculative.

      At a few points in the manuscript, the authors use statements without actually providing evidence in terms of a Figure. Often the authors themselves acknowledge this, by adding the term "not shown" to the end of the sentence. I believe it will be helpful to the reader to be provided with figures or panels in support of the statements.  

      Thank you for this comment. We have provided additional data figures to support the following statements:

      “d<sub>M</sub> was again increased upon learning, particularly between learned odors and reference classes representing other odors (Supplementary Figure 9)”

      “decreasing amplification in assemblies of Scaled networks changed transformations towards the intermediate behavior, albeit with broader firing rate distributions than in Tuned networks (Supplementary Figure 6 B)”  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Meissner-Bernard et al present a biologically constrained model of telencephalic area of adult zebrafish, a homologous area to the piriform cortex, and argue for the role of precisely balanced memory networks in olfactory processing. 

      This is interesting as it can add to recent evidence on the presence of functional subnetworks in multiple sensory cortices. It is also important in deviating from traditional accounts of memory systems as attractor networks. Evidence for attractor networks has been found in some systems, like in the head direction circuits in the flies. However, the presence of attractor dynamics in other modalities, like sensory systems, and their role in computation has been more contentious. This work contributes to this active line of research in experimental and computational neuroscience by suggesting that, rather than being represented in attractor networks and persistent activity, olfactory memories might be coded by balanced excitation-inhibitory subnetworks. 

      The paper is generally well-written, the figures are informative and of good quality, and multiple approaches and metrics have been used to test and support the main results of the paper. 

      The main strength of the work is in: (1) direct link to biological parameters and measurements, (2) good controls and quantification of the results, and (3) comparison across multiple models. 

      (1) The authors have done a good job of gathering the current experimental information to inform a biological-constrained spiking model of the telencephalic area of adult zebrafish. The results are compared to previous experimental measurements to choose the right regimes of operation. 

      (2) Multiple quantification metrics and controls are used to support the main conclusions and to ensure that the key parameters are controlled for - e.g. when comparing across multiple models.

      (3) Four specific models (random, scaled I / attractor, and two variants of specific E-I networks - tuned I and tuned E+I) are compared with different metrics, helping to pinpoint which features emerge in which model.

      Major problems with the work are: (1) mechanistic explanation of the results in specific E-I networks, (2) parameter exploration, and (3) the functional significance of the specific E-I model. 

      (1) The main problem with the paper is a lack of mechanistic analysis of the models. The models are treated like biological entities and only tested with different assays and metrics to describe their different features (e.g. different geometry of representation in Fig. 4). Given that all the key parameters of the models are known and can be changed (unlike biological networks), it is expected to provide a more analytical account of why specific networks show the reported results. For instance, what is the key mechanism for medium amplification in specific E/I network models (Fig. 3)? How does the specific geometry of representation/manifolds (in Fig. 4) emerge in terms of excitatory-inhibitory interactions, and what are the main mechanisms/parameters? Mechanistic account and analysis of these results are missing in the current version of the paper. 

      Precise balancing of excitation and inhibition in subnetworks would lead to the cancellation of specific dynamical modes responsible for the amplification of responses (hence, deviating from the attractor dynamics with an unstable specific mode). What is the key difference in the specific E/I networks here (tuned I or/and tuned E+I) which make them stand between random and attractor networks? Excitatory and inhibitory neurons have different parameters in the model (Table 1). Time constants of inhibitory and excitatory synapses are also different (P. 13). Are these parameters causing networks to be effectively more excitation dominated (hence deviating from a random spectrum which would be expected from a precisely balanced E/I network, with exactly the same parameters of E and I neurons)? It is necessary to analyse the network models, describe the key mechanism for their amplification, and pinpoint the key differences between E and I neurons which are crucial for this. 

      To address these comments we performed additional simulations and analyses at different levels. Please see our reply to comment (1) of the public review (reviewer 1) for a detailed description. We thank the reviewer for these constructive comments.

      (2) The second major issue with the study is a lack of systematic exploration and analysis of the parameter space. Some parameters are biologically constrained, but not all the parameters. For instance, it is not clear what the justification for the choice of synaptic time scales are (with E synaptic time constants being larger than inhibition: tau_syn_i = 10 ms, tau_syn_E = 30 ms). How would the results change if they are varying these - and other unconstrained - parameters? It is important to show how the main results, especially the manifold localisation, would change by doing a systematic exploration of the key parameters and performing some sensitivity analysis. This would also help to see how robust the results are, which parameters are more important and which parameters are less relevant, and to shed light on the key mechanisms.  

      We thank the reviewer for this comment. We have now carried out additional simulations with equal time constants for all neurons. Please see our reply to the public review for more details (comment 2 of reviewer 1).

      (3) It is not clear what the main functional advantage of the specific E-I network model is compared to random networks. In terms of activity, they show that specific E-I networks amplify the input more than random networks (Fig. 3). But when it comes to classification, the effect seems to be very small (Fig. 5c). Description of different geometry of representation and manifold localization in specific networks compared to random networks is good, but it is more of an illustration of different activity patterns than proving a functional benefit for the network. The reader is still left with the question of what major functional benefits (in terms of computational/biological processing) should be expected from these networks, if they are to be a good model for olfactory processing and learning. 

      One possibility for instance might be that the tasks used here are too easy to reveal the main benefits of the specific models - and more complex tasks would be needed to assess the functional enhancement (e.g. more noisy conditions or more combination of odours). It would be good to show this more clearly - or at least discuss it in relation to computation and function.

      Please see our reply to the public review (comment 3 of reviewer 1).

      Specific comments: 

      Abstract: "resulting in continuous representations that reflected both relatedness of inputs and *an individual's experience*" 

      It didn't become apparent from the text or the model where the role of "individual's experience" component (or "internal representations" - in the next line) was introduced or shown (apart from a couple of lines in the Discussion) 

      We consider the scenario that assemblies are the outcome of an experience-dependent plasticity process. To clarify this, we have now made a small addition to the text: “Biological memory networks are thought to store information by experience-dependent changes in the synaptic connectivity between assemblies of neurons.”

      P. 2: "The resulting state of "precise" synaptic balance stabilizes firing rates because inhomogeneities or fluctuations in excitation are tracked by correlated inhibition" 

      It is not clear what the "inhomogeneities" specifically refers to - they can be temporal, or they can refer to the quenched noise of connectivity, for instance. Please clarify what you mean. 

      The statement has been modified to be more precise: “…“precise” synaptic balance stabilizes firing rates because inhomogeneities in excitation across the population or temporal variations in excitation are tracked by correlated inhibition…”.

      P. 3 (and Methods): When odour stimulus is simulated in the OB, the activity of a fraction of mitral cells is increased (10% to 15 Hz) - but also a fraction of mitral cells is suppressed (5% to 2 Hz). What is the biological motivation or reference for this? It is not provided. Is it needed for the results? Also, it is not explained how the suppressed 5% are chosen (e.g. randomly, without any relation to the increased cells?). 

      We thank the reviewer for this comment. These changes in activity directly reflect experimental observations. We apologize that we forgot to include the references reporting these observations (Friedrich and Laurent, 2001 and 2004); this is now fixed.

      In our simulation, OB neurons do not interact with each other, and the suppressed 5% were indeed randomly selected. We changed the text in Methods accordingly to read: “An additional 75 randomly selected mitral cells were inhibited” 

      P. 4, L. 1-2: "... sparsely connected integrate-and-fire neurons with conductance-based synapses (connection probability {less than or equal to}5%)." 

      Specify the connection probability of specific subtypes (EE, EI, IE, II).  

      We now refer to the Methods section, where this information can be found. 

      “... conductance-based synapses (connection probability ≤5%, Methods)”  

      P. 4, L. 6-7: "Population activity was odor-specific and activity patterns evoked by uncorrelated OB inputs remained uncorrelated in Dp (Figure 1H)" 

      What would happen to correlated OB inputs (e.g. as a result of mixture of two overlapping odours) in this baseline state of the network (before memories being introduced to it)? It would be good to know this, as it sheds light on the initial operating regime of the network in terms of E/I balance and decorrelation of inputs.  

      This information was present in the original manuscript (Figure 3), but we improved the writing to further clarify this issue: “(…) we morphed a novel odor into a learned odor (Figure 3A), or a learned odor into another learned odor (Supplementary Figure 3B), and quantified the similarity between morphed and learned odors by the Pearson correlation of the OB activity patterns (input correlation). We then compared input correlations to the corresponding pattern correlations among E neurons in Dp (output correlation). In rand networks, output correlations increased linearly with input correlations but did not exceed them (Figure 3B and Supplementary Figure 3B)”

      P. 4, L. 12-13: "Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of ~80%, .."   Where is this shown? 

      (There are other occasions too in the paper where references to the supporting figures are missing). 

      We now provide the statistics: “Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of 0.79 ± 0.20”

      P. 4: "In each network, we created 15 assemblies representing uncorrelated odors. As a consequence, ~30% of E neurons were part of an assembly ..." 

      15 x 100 / 4000 = 37.5% - so it's closer to 40% than 30%. Unless there is some overlap? 

      Yes: although odors are uncorrelated and connectivity is random, some neurons (6% of E neurons) belong to more than one assembly.
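
      The reviewer's arithmetic and the overlap can be reconciled with a back-of-the-envelope estimate. The sketch below assumes, purely for illustration, that each of the 15 assemblies is an independent random draw of 100 out of 4000 E neurons (the numbers used in the reviewer's calculation); assembly membership in the actual model depends on odor-evoked input and may deviate from this.

```python
from math import comb

n_assemblies, size, n_e = 15, 100, 4000
p = size / n_e  # probability that a given neuron belongs to a given assembly

# Binomial probabilities that a neuron belongs to exactly 0 or exactly 1 assembly
p0 = (1 - p) ** n_assemblies
p1 = comb(n_assemblies, 1) * p * (1 - p) ** (n_assemblies - 1)

naive = n_assemblies * size / n_e  # counts shared neurons repeatedly
frac_any = 1 - p0                  # neuron is in at least one assembly
frac_multi = 1 - p0 - p1           # neuron is in more than one assembly

print(f"naive: {naive:.3f}, any: {frac_any:.3f}, multi: {frac_multi:.3f}")
```

      Under this assumption, roughly 32% of E neurons fall into at least one assembly and about 5% into more than one, in line with the reported values of ~30% and 6%.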

      P. 4: "When a reached a critical value of ~6, networks became unstable and generated runaway activity (Figure 2B)." 

      Can this transition point be calculated or estimated from the network parameters, and linked to the underlying mechanisms causing it? 

      We thank the reviewer for this interesting question. The instability arises when inhibition fails to efficiently counterbalance the increased recurrent excitation within Dp. The transition point is difficult to estimate, as it can depend on several parameters, including the probability of E to E connections, their strength, assembly size, and others. We have therefore not attempted to estimate it analytically.

      P. 4: "Hence, non-specific scaling of inhibition resulted in a divergence of firing rates that exhausted the dynamic range of individual neurons in the population, implying that homeostatic   global inhibition is insufficient to maintain a stable firing rate distribution." 

      I don't think this is justified based on the results and figures presented here (Fig. 2E) - the interpretation is a bit strong and biased towards the conclusions the authors want to draw. 

      To illustrate more clearly that, in Scaled networks, assembly neurons are highly active (close to maximal realistic firing rates) whereas non-assembly neurons are nearly silent, we have now added Supplementary Fig. 2B. Moreover, we have toned down the text: “Hence, non-specific scaling of inhibition resulted in a large and biologically unrealistic divergence of firing rates (Supplementary Figure 2B) that nearly exhausted the dynamic range of individual neurons in the population, indicating that homeostatic global inhibition is insufficient to maintain a stable firing rate distribution”

      P. 5, third paragraph: Description of Figure 2I, inset is needed, either in the text or caption. 

      The inset is now referred to in the text: ”we projected synaptic conductances of each neuron onto a line representing the E/I ratio expected in a balanced network (“balanced axis”) and onto an orthogonal line (“counter-balanced axis”; Figure 2I inset, Methods).”
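
      The projection described in this passage can be illustrated with a short sketch. It is not the authors' implementation: the function name `project_balance`, the way the balanced-axis slope is chosen (here the ratio of mean conductances), and the synthetic conductances are all assumptions for illustration; the actual definition is given in the Methods of the manuscript.

```python
import numpy as np

def project_balance(g_e, g_i, ratio):
    """Project (g_e, g_i) pairs onto a balanced axis and an orthogonal axis.

    ratio: assumed inhibitory-to-excitatory conductance ratio defining the
    direction of the balanced axis in the (g_e, g_i) plane.
    """
    norm = np.hypot(1.0, ratio)
    u = np.array([1.0, ratio]) / norm    # unit vector along the balanced axis
    v = np.array([-ratio, 1.0]) / norm   # unit vector along the counter-balanced axis
    g = np.column_stack([g_e, g_i])
    return g @ u, g @ v

# Hypothetical conductances in which inhibition co-varies with excitation
rng = np.random.default_rng(1)
g_e = rng.gamma(shape=4.0, scale=1.0, size=1000)
g_i = 2.0 * g_e + 0.5 * rng.standard_normal(1000)

bal, counter = project_balance(g_e, g_i, ratio=g_i.mean() / g_e.mean())
```

      When inhibition tracks excitation across neurons, most of the variance lies along the balanced axis and little along the counter-balanced axis.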

      P. 5, last paragraph: another example of writing about results without showing/referring to the corresponding figures: 

      "In rand networks, firing rates increased after stimulus onset and rapidly returned to a low baseline after stimulus offset. Correlations between activity patterns evoked by the same odor at different time points and in different trials were positive but substantially lower than unity, indicating high variability ..." 

      And the continuation with similar lack of references on P. 6: 

      "Scaled networks responded to learned odors with persistent firing of assembly neurons and high pattern correlations across trials and time, implying attractor dynamics (Hopfield, 1982; Khona and Fiete, 2022), whereas Tuned networks exhibited transient responses and modest pattern correlations similar to rand networks." 

      Please go through the Results and fix the references to the corresponding figures on all instances. 

      We thank the reviewer for pointing out these overlooked figure references, which are now fixed.

      P. 8: "These observations further support the conclusion that E/I assemblies locally constrain neuronal dynamics onto manifolds." 

      As discussed in the general major points, mechanistic explanation in terms of how the interaction of E/I dynamics leads to this is missing. 

      As discussed in the reply to the public review (comment 3 of reviewer 1), we have now provided more mechanistic analyses of our observations.

      P. 9: "Hence, E/I assemblies enhanced the classification of inputs related to learned patterns."   The effect seems to be very small. Also, any explanation for why for low test-target correlation the effect is negative (random doing better than tuned E/I)? 

      The size of the effect (p<sub>learned</sub> – p<sub>novel</sub> = 0.074; difference of means; Figure 5C) may appear small in terms of absolute probability, but it is substantial relative to the maximum possible increase (1 – p<sub>novel</sub> = 0.133; Figure 5C). The fact that for low test-target correlations the effect is negative is a direct consequence of the positive effect for high test-target correlations and the presence of 2 learned odors in the 4-way forced choice task.

      P. 9: "In Scaled I networks, creating two additional memories resulted in a substantial increase   in firing rates, particularly in response to the learned and related odors"   Where is this shown? Please refer to the figure. 

      We thank the reviewer again for pointing this out. We forgot to include a reference to the relevant figure, which has now been added in the revised manuscript (Figure 6C).

      P. 10: "The resulting Tuned networks reproduced additional experimental observations that were not used as constraints including irregular firing patterns, lower output than input correlations, and the absence of persistent activity" 

      It is difficult to present these as "additional experimental observations", as all of them are negative, and can exist in random networks too - hence cannot be used as biological evidence in favour of specific E/I networks when compared to random networks. 

      We agree with the reviewer that these additional experimental observations cannot be used as biological evidence favouring Tuned E+I networks over random networks. We merely wanted to point out that additional observations, which were not used to fit the model, do not argue against the existence of E-I assemblies in biological networks. As assemblies tend to result in persistent activity in other types of networks, we feel that this observation is worth pointing out.

      Methods: 

      P. 13: Describe the parameters of Eq. 2 after the equation. 

      Done.

      P. 13: "The time constants of inhibitory and excitatory synapses were 10 ms and 30 ms, respectively." 

      What is the (biological) justification for the choice of these parameters? 

      How would varying them affect the main results (e.g. local manifolds)? 

      We chose a relatively slow time constant for excitatory synapses because experimental data indicate that excitatory synaptic currents in Dp and piriform cortex contain a prominent NMDA component. We have now also simulated networks with equal time constants for excitatory and inhibitory synapses and equal biophysical parameters for excitatory and inhibitory neurons, which did not affect the main results (see also reply to the public review: comment 2 of reviewer 1).

      P. 14: "Care was also taken to ensure that the variation in the number of output connections was low across neurons"   How exactly?

      More detailed explanations have now been added in the Methods section: “connections of a presynaptic neuron y to postsynaptic neurons x were randomly deleted when their total number exceeded the average number of output connections by ≥5%, or added when they were lower by ≥5%.”
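
      The quoted procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `equalize_out_degree` is hypothetical, and the stopping rule (trim or pad each neuron's outputs just far enough to fall within ±5% of the initial mean) is an assumption, since the quoted text does not specify the target count.

```python
import numpy as np

def equalize_out_degree(conn, tol=0.05, rng=None):
    """Return a copy of conn whose column sums lie within tol of the initial mean.

    conn[x, y] is True if presynaptic neuron y connects to postsynaptic neuron x,
    so column sums are the numbers of output connections.
    """
    rng = np.random.default_rng() if rng is None else rng
    conn = conn.copy()
    mean_out = conn.sum(axis=0).mean()
    lo = int(np.ceil((1 - tol) * mean_out))
    hi = int(np.floor((1 + tol) * mean_out))
    for y in range(conn.shape[1]):
        targets = np.flatnonzero(conn[:, y])
        if targets.size > hi:
            # too many outputs: delete randomly chosen connections
            drop = rng.choice(targets, size=targets.size - hi, replace=False)
            conn[drop, y] = False
        elif targets.size < lo:
            # too few outputs: add connections to randomly chosen free targets
            # (self-connections are not excluded in this sketch)
            free = np.flatnonzero(~conn[:, y])
            add = rng.choice(free, size=lo - targets.size, replace=False)
            conn[add, y] = True
    return conn

# Hypothetical random network with 5% connection probability
rng = np.random.default_rng(2)
conn = rng.random((1000, 1000)) < 0.05
balanced = equalize_out_degree(conn, rng=rng)
out_degree = balanced.sum(axis=0)
```

      The sketch assumes that the mean number of output connections is large enough for the ±5% band to contain at least one integer, as is the case in the example.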

      Reviewer #2 (Recommendations For The Authors): 

      Congratulations on the great and interesting work! The results were nicely presented and the idea of continuous encoding on manifolds is very interesting. To improve the quality of the paper, in addition to the major points raised in the public review, here are some more detailed comments for the paper: 

      (1) Generally, citations have to improve. Spiking networks with excitatory assemblies and different architectures of inhibitory populations have been studied before, and the claim about improved network stability in co-tuned E-I networks has been made in the following papers that need to be correctly cited: 

      • Vogels TP, Sprekeler H, Zenke F, Clopath C, Gerstner W. 2011. Inhibitory Plasticity Balances Excitation and Inhibition in Sensory Pathways and Memory Networks. Science 334:1-7. doi:10.1126/science.1212991 (mentions that emerging precise balance on the synaptic weights can result in the overall network stability) 

      • Lagzi F, Bustos MC, Oswald AM, Doiron B. 2021. Assembly formation is stabilized by Parvalbumin neurons and accelerated by Somatostatin neurons. bioRxiv doi: https://doi.org/10.1101/2021.09.06.459211 (among other things, contrasts stability and competition which arises from multistable networks with global inhibition and reciprocal inhibition)

      • Rost T, Deger M, Nawrot MP. 2018. Winnerless competition in clustered balanced networks: inhibitory assemblies do the trick. Biol Cybern 112:81-98. doi:10.1007/s00422-017-0737-7 (compares different architectures of inhibition and their effects on network dynamics)

      • Lagzi F, Fairhall A. 2022. Tuned inhibitory firing rate and connection weights as emergent network properties. bioRxiv 2022.04.12.488114. doi:10.1101/2022.04.12.488114 (here, see the eigenvalue and UMAP analysis for a network with global inhibition and E/I assemblies) 

      Additionally, there are lots of pioneering work about tracking of excitatory synaptic inputs by inhibitory populations, that are missing in references. Also, experimental work that show existence of cell assemblies in the brain are largely missing. On the other hand, some references that do not fit the focus of the statements have been incorrectly cited. 

      The authors may consider referencing the following more pertinent studies on spiking networks to support the statement regarding attractor dynamics in the first paragraph in the Introduction (the current citations of Hopfield and Kohonen are for rate-based networks): 

      • Wong, K.-F., & Wang, X.-J. (2006). A recurrent network mechanism of time integration in perceptual decisions. Journal of Neuroscience, 26(4), 1314-1328. https://doi.org/10.1523/JNEUROSCI.3733-05.2006 

      • Wang, X.-J. (2008). Decision making in recurrent neuronal circuits. Neuron, 60(2), 215-234. https://doi.org/10.1016/j.neuron.2008.09.034  

      • F. Lagzi, & S. Rotter. (2015). Dynamics of competition between subnetworks of spiking neuronal networks in the balanced state. PloS One. 

      • Goldman-Rakic, P. S. (1995). Cellular basis of working memory. Neuron, 14(3), 477-485. 

      • Rost T, Deger M, Nawrot MP. 2018. Winnerless competition in clustered balanced networks: inhibitory assemblies do the trick. Biol Cybern 112:81-98. doi:10.1007/s00422-017-0737-7. 

      • Amit DJ, Tsodyks M (1991) Quantitative study of attractor neural network retrieving at low spike rates: I. substrate-spikes, rates and neuronal gain. Network 2:259-273. 

      • Mazzucato, L., Fontanini, A., & La Camera, G. (2015). Dynamics of Multistable States during Ongoing and Evoked Cortical Activity. Journal of Neuroscience, 35(21), 8214-8231. 

      We thank the reviewer for the reference suggestions. We have carefully reviewed the reference list and made the following changes, which we hope address the reviewer’s concerns:

      (1) We adjusted references about network stability in co-tuned E-I networks.

      (2) We added the Lagzi & Rotter (2015), Amit & Tsodyks (1991), Mazzucato et al. (2015) and Goldman-Rakic (1995) papers to the Introduction as studies on attractor dynamics in spiking neural networks. We preferred to omit the two X.-J. Wang papers, as they describe attractors in decision making rather than memory processes.

      (3) We added the Ko et al. 2011 paper as experimental evidence for assemblies in the brain. In our view, there are few experimental studies showing the existence of cell assemblies in the brain, which we distinguish from cell ensembles, i.e., groups of coactive neurons.

      (4) We also included Hennequin 2018, Brunel 2000, Lagzi et al. 2021 and Eckmann et al. 2024, which we had not cited in the initial manuscript.

      (5) We removed the Wiechert et al. 2010 reference as it does not support the statement about geometry-preserving transformation by random networks.

      (2) The gist of the paper is about how the architecture of inhibition (reciprocal vs. global in this case) can determine network stability and salient responses (related to multistable attractors and variations) for classification purposes. It would improve the narrative of the paper if this point is raised in the Introduction and Discussion section. Also see a relevant paper that addresses this point here: 

      Lagzi F, Bustos MC, Oswald AM, Doiron B. 2021. Assembly formation is stabilized by Parvalbumin neurons and accelerated by Somatostatin neurons. bioRxiv doi: https://doi.org/10.1101/2021.09.06.459211 

      Classification has long been proposed to be a function of piriform cortex and autoassociative memory networks in general, and we consider it important. However, the computational function of Dp or piriform cortex is still poorly understood, and we do not focus only on odor classification as a possibility. In fact, continuous representational manifolds also support other functions such as the quantification of distance relationships of an input to previously memorized stimuli, or multi-layer network computations (including classification). In the revised manuscript, we have performed additional analyses to explore these notions in more detail, as explained above (response to public reviews, comment 3 of reviewer 1). Furthermore, we have now expanded the discussion of potential computational functions of Tuned networks and explicitly discuss classification but also other potential functions. 

      (3) A plot for the values of the inhibitory conductances in Figure 1 would complete the analysis for that section. 

      In Figure 1, we decided to only show the conductances that we use to fit our model, namely the afferent and total synaptic conductances. As the values of the inhibitory conductances can be derived from panel E, we refrained from plotting them separately for the sake of simplicity. 

      (4) How did the authors calculate correlations between activity patterns as a function of time in Figure 2E, bottom row? Does the color represent correlation coefficient (which should not be time dependent) or is it a correlation function? This should be explained in the Methods section. 

      The color represents the Pearson correlation coefficient between activity patterns within a narrow time window (100 ms). We updated the Figure legend to clarify this: “Mean correlation between activity patterns evoked by a learned odor at different time points during odor presentation. Correlation coefficients were calculated between pairs of activity vectors composed of the mean firing rates of E neurons in 100 ms time bins. Activity vectors were taken from the same or different trials, except for the diagonal, where only patterns from different trials were considered.”
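
      The calculation described in the revised legend can be sketched as follows. This is an illustrative reading of the legend, not the authors' analysis code: the function name `time_resolved_correlation`, the array layout, and the synthetic firing rates are assumptions. Input rates are taken to be mean firing rates of E neurons already binned in 100 ms windows.

```python
import numpy as np

def time_resolved_correlation(rates):
    """Mean Pearson correlation between activity vectors at each pair of time bins.

    rates: (n_trials, n_bins, n_neurons) mean firing rates per time bin.
    Off-diagonal entries average over all trial pairs (same or different trials);
    diagonal entries average over pairs of different trials only.
    """
    n_trials, n_bins, n_neurons = rates.shape
    z = rates - rates.mean(axis=2, keepdims=True)
    z = z / z.std(axis=2, keepdims=True)
    # r[i, s, j, t]: correlation between (trial i, bin s) and (trial j, bin t)
    r = np.einsum("isn,jtn->isjt", z, z) / n_neurons
    out = r.mean(axis=(0, 2))
    different_trials = ~np.eye(n_trials, dtype=bool)
    for t in range(n_bins):
        out[t, t] = r[:, t, :, t][different_trials].mean()
    return out

# Hypothetical data: 5 trials, 20 bins (2 s at 100 ms), 300 neurons
rng = np.random.default_rng(3)
pattern = 10.0 * rng.random(300)
rates = pattern + rng.standard_normal((5, 20, 300))
corr = time_resolved_correlation(rates)
```

      Excluding same-trial pairs on the diagonal avoids trivially correlating a pattern with itself, so diagonal values reflect trial-to-trial reproducibility.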

      (5) Figure 3 needs more clarification (both in the main text and the figure caption). It is not clear what the axes are exactly, and why the network responses for familiar and novel inputs are different. The gray shaded area in panel B needs more explanation as well.  

      We thank the reviewer for the comment. We have improved Figure 3A, the figure caption, as well as the text (see p.6). We hope that the figure is now clearer.

      (6) The "scaled I" network, known for representing input patterns in discrete attractors, should exhibit clear separation between network responses in the 2D PC space in the PCA plots. However, Figure 4D and Figure 6D do not reflect this, as all network responses are overlapped. Can the authors explain the overlap in Figure 4D? 

      In Figure 4D, activity of Scaled networks is distributed between three subregions in state space that are separated by the first 2 PCs. Two of them indeed correspond to attractor states representing the two learned odors while the third represents inputs that are not associated with these attractor states. To clarify this, please see also the density plot in Figure 4E. The few datapoints between these three subregions are likely outliers generated by the sequential change in inputs, as described in Supplementary Figure 8C.

      (7) The reason for writing about the ISN networks is not clear. Co-tuned E-I assemblies do not necessarily have to operate in this regime. Also, the results of the paper do not rely on any of the properties of ISNs, but they are more general. Authors should either show the paradoxical effect associated with ISN (i.e., if increasing input to I neurons decreases their responses) or show ISN properties using stability analysis (See computational research conducted at the Allen Institute, namely Millman et al. 2020, eLife ). Currently, the paper reads as if being in the ISN regime is a necessary requirement, which is not true. Also, the arguments do not connect with the rest of the paper and never show up again. Since we know it is not a requirement, there is no need to have those few sentences in the Results section. Also, the choice of alpha=5.0 is extreme, and therefore, it would help to judge the biological realism if the raster plots for Figs 2-6 are shown.

      We have toned down the part on ISN and reduced it to one sentence for readers who might be interested in knowing whether activity is inhibition-stabilized or not. We have also added the reference to the Tsodyks et al. 1997 paper from which we derive our stability analysis. The text now reads “Hence, pDp<sub>sim</sub> entered a balanced state during odor stimulation (Figure 1D, E) with recurrent input dominating over afferent input, as observed in pDp (Rupprecht and Friedrich, 2018). Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of 0.79 ± 0.20, demonstrating that activity was inhibition-stabilized (Sadeh and Clopath, 2020b, Tsodyks et al., 1997).”  

      We have now also added the raster plots as suggested by the reviewer (see Figure 2D, Supplementary Figure 1 G, Supplementary Figure 4). We thank the reviewer for this comment.

      (8) In the abstract, authors mention "fast pattern classification" and "continual learning," but in the paper, those issues have not been addressed. The study does not include any synaptic plasticity. 

      Concerning “continual learning” we agree that we do not simulate the learning process itself. However, Figure 6 shows results of a simulation where two additional patterns were stored in a network that already contained assemblies representing other odors. We consider this a crude way of exploring the end result of a “continual learning” process. “Fast pattern classification” is mentioned because activity in balanced networks can follow fluctuating inputs with high temporal resolution, while networks with stable attractor states tend to be slow. This is likely to account for the occurrence of hysteresis effects in Scaled but not Tuned networks as shown in Supplementary Fig. 8.

      (9) In the Introduction, the first sentence in the second paragraph reads: "... when neurons receive strong excitatory and inhibitory synaptic input ...". The word strong should be changed to "weak".

      Also, see the pioneering work of Brunel 2000. 

      In classical balanced networks, strong excitatory inputs are counterbalanced by strong inhibitory inputs, leading to a fluctuation-driven regime. We have added Brunel 2000.

      (10) In the second paragraph of the introduction, the authors refer to studies about structural co-tuning (e.g., where "precise" synaptic balance is mentioned, and Vogels et al. 2011 should be cited there) and functional co-tuning (which is, in fact, different than tracking of excitation by inhibition, but the authors refer to that as co-tuning). It makes it easier to understand which studies talk about structural co-tuning and which ones are about functional co-tuning. The paper by Znamenski 2018, which showed both structural and functional tuning in experiments, is missing here. 

      We added the citation to the now published paper by Znamenskiy et al. (2024).

      (11) The third paragraph in the Introduction misses some references that address network dynamics that are shaped by the inhibitory architecture in E/I assemblies in spiking networks, like Rost et al 2018 and Lagzi et al 2021. 

      These references have been added.

      (12) The last sentence of the fourth paragraph in the Introduction implies that functional co-tuning is due to structural co-tuning, which is not necessarily true. While structural co-tuning results in functional co-tuning, functional co-tuning does not require structural co-tuning because it could arise from shared correlated input or heterogeneity in synaptic connections from E to I cells.  

      We generally agree with the reviewer, but we are uncertain which sentence the reviewer refers to.

      We assume the reviewer refers to the last sentence of the second (rather than the fourth paragraph), which explicitly mentions the “…structural basis of E/I co-tuning…”. If so, we consider this sentence still correct because the “structural basis” refers not specifically to E/I assemblies, but also includes any other connectivity that may produce co-tuning, including the connectivity underlying the alternative possibilities mentioned by the reviewer (shared correlated input or heterogeneity of synaptic connections).

      (13) In order to ensure that the comparison between network dynamics is legit, authors should mention up front that for all networks, the average firing rates for the excitatory cells were kept at 1 Hz, and the background input was identical for all E and I cells across different networks.

      We slightly revised the text to make this clearer: “We (…) uniformly scaled I-to-E connection weights by a factor of χ until E population firing rates in response to learned odors matched the corresponding firing rates in rand networks, i.e., 1 Hz”.
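
      The rate-matching step quoted above can be sketched as a one-dimensional search over χ. This is only an illustration under stated assumptions: `mean_e_rate` stands in for a full network simulation, `toy_rate` is a made-up monotonic function, and bisection is one possible search strategy; the procedure actually used for the paper is not described in this response.

```python
import math


def calibrate_chi(mean_e_rate, target_hz=1.0, lo=1.0, hi=4.0, tol=1e-3):
    """Bisect on the I-to-E scaling factor chi until the E rate reaches target_hz.

    Assumes mean_e_rate(chi) decreases monotonically with chi and that
    [lo, hi] brackets the target rate.
    """
    if not (mean_e_rate(lo) >= target_hz >= mean_e_rate(hi)):
        raise ValueError("target rate is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_e_rate(mid) > target_hz:
            lo = mid  # rate still too high: stronger inhibition needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


def toy_rate(chi):
    # Hypothetical stand-in for a simulation; not the pDp model.
    return 3.0 * math.exp(-(chi - 1.0))


chi = calibrate_chi(toy_rate)
print(f"chi = {chi:.3f}, E rate = {toy_rate(chi):.3f} Hz")
```

      In practice each call to `mean_e_rate` would be a noisy, expensive simulation, so the tolerance would need to reflect trial-to-trial variability of the measured rate.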

      (14) In the last paragraph on page 5, my understanding was that an individual odor could target different cells within an assembly in different trials to generate trial-to-trial variability. If this is correct, this needs to be mentioned clearly. 

      This is not correct: an odor consists of 150 activated mitral cells with defined firing rates. As now mentioned in the Methods, “Spikes were then generated from a Poisson distribution, and this process was repeated to create trial-to-trial variability.”
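
      A minimal sketch of this kind of spike generation follows. It assumes a time-binned approximation of a Poisson process with a fixed set of per-cell rates; the function name, the 15 Hz rate, the 1 ms bin, and the 2 s duration are illustrative assumptions, not values taken from the Methods.

```python
import random


def poisson_spike_trains(rates_hz, duration_s, dt=0.001, rng=None):
    """Return one list of spike times (in s) per cell for a single trial."""
    rng = rng or random.Random()
    n_bins = int(round(duration_s / dt))
    trains = []
    for rate in rates_hz:
        p_spike = rate * dt  # good approximation only while rate * dt << 1
        trains.append([k * dt for k in range(n_bins) if rng.random() < p_spike])
    return trains


rng = random.Random(0)
rates = [15.0] * 150  # hypothetical rates for the 150 activated mitral cells
# The odor (the set of rates) is identical on every trial; only the spike
# times are redrawn, which is what produces trial-to-trial variability.
trials = [poisson_spike_trains(rates, duration_s=2.0, rng=rng) for _ in range(3)]
print([len(trial[0]) for trial in trials])  # spike counts of cell 0 per trial
```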

      (15) The last paragraph on page 6 mentions that the four OB activity patterns were uncorrelated, but if they were designed as in Figure 4A, due to the existing overlap between the patterns, they cannot be uncorrelated. 

      This appears to be a misunderstanding. We mention in the text (and show in Figure 4B) that the four odors which “… were assigned to the corners of a square…” are uncorrelated.  The intermediate odors are of course not uncorrelated. We slightly modified the corresponding paragraph (now on page 7) to clarify this: “The subspace consisted of a set of OB activity patterns representing four uncorrelated pure odors and mixtures of these pure odors. Pure odors were assigned to the corners of a square and mixtures were generated by selecting active mitral cells from each of the pure odors with probabilities depending on the relative distances from the corners (Figure 4A, Methods).”

      (16) The notion of "learned" and "novel" odors may be misleading as there was no plasticity in the network to acquire an input representation. It would be beneficial for the authors to clarify that by "learned," they imply the presence of the corresponding E assembly for the odor in the network, with the input solely impacting that assembly. Conversely, for "novel" inputs, the input does not target a predefined assembly. In Figure 2 and Figure 4, it would be especially helpful to have the spiking raster plots of some sample E and I cells.  

      As suggested by the reviewer, we have modified the existing spiking raster plots in Figure 2, such that they include examples of responses to both learned and novel odors. We added spiking raster plots showing responses of I neurons to the same odors in Supplementary Figure 1F, as well as spiking raster plots of E neurons in Supplementary Figure 4A. To clarify the usage of “learned” and “novel”, we have added a sentence in the Results section: “We thus refer to an odor as “learned” when a network contains a corresponding assembly, and as “novel” when no such assembly is present.”.

      (17) In the last paragraph of page 8, can the authors explain where the asymmetry comes from? 

      As mentioned in the text, the asymmetry comes from the difference in the covariance structure of different classes. To clarify, we have rephrased the sentence defining the Mahalanobis distance: 

      “This measure quantifies the distance between the pattern and the class center, taking into account covariation of neuronal activity within the class. In bidirectional comparisons between patterns from different classes, the mean dM may be asymmetric if neural covariance differs between classes.”
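
      The asymmetry can be made concrete with a small sketch. It assumes the standard definition of the Mahalanobis distance, d_M(v, Q) = sqrt((v - mean(Q))^T cov(Q)^(-1) (v - mean(Q))), and uses two made-up classes of two-neuron activity patterns with different spread; all numbers are hypothetical and the two-dimensional restriction is only there to keep the matrix inverse explicit.

```python
import math
import random


def mean_and_cov(points):
    """Sample mean and 2x2 covariance of a list of (x, y) activity patterns."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(v, reference):
    """d_M between pattern v and the class defined by the reference patterns."""
    (mx, my), (sxx, sxy, syy) = mean_and_cov(reference)
    dx, dy = v[0] - mx, v[1] - my
    det = sxx * syy - sxy ** 2
    # (dx, dy) @ inverse(cov) @ (dx, dy)^T, written out for a 2x2 covariance.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


rng = random.Random(1)
# Hypothetical classes: A is tightly clustered, B is broadly distributed.
class_a = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(200)]
class_b = [(rng.gauss(3.0, 2.0), rng.gauss(0.0, 2.0)) for _ in range(200)]

d_a_to_b = sum(mahalanobis(v, class_b) for v in class_a) / len(class_a)
d_b_to_a = sum(mahalanobis(v, class_a) for v in class_b) / len(class_b)
print(f"mean d_M, A patterns vs class B: {d_a_to_b:.2f}")
print(f"mean d_M, B patterns vs class A: {d_b_to_a:.2f}")
```

      Because distances are measured in units of the reference class's own variability, patterns from the broad class lie many "standard deviations" from the tight class, whereas the reverse comparison yields a much smaller mean distance.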

      (18) The first paragraph of page 9: random networks are not expected to perform pattern classification, but just pattern representation. It would have been better if the authors compared Scaled I network with E/I co-tuned network. Regardless of the expected poorer performance of the E/I co-tuned networks, the result would have been interesting. 

      Please see our reply to the public review (reviewer 2).

      (19) Second paragraph on page 9, the authors should provide statistical significance test analysis for the statement "... was significantly higher ...". 

      We have performed a Wilcoxon signed-rank test, and reported the p-value in the revised manuscript (p < 0.01). 
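
      For readers unfamiliar with the test, the following sketch shows how an exact two-sided Wilcoxon signed-rank p-value can be computed for a small paired sample. The paired values are invented for illustration and are not data from the manuscript; the sketch assumes no zero or tied differences. In practice a library routine such as `scipy.stats.wilcoxon` would normally be used, and this response does not state which software was used.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided p-value for paired samples (no zero or tied differences)."""
    diffs = [a - b for a, b in zip(x, y)]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    stat = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely,
    # so enumerate all 2**n patterns (feasible for small n only).
    hits = 0
    for signs in product((0, 1), repeat=len(diffs)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= stat:
            hits += 1
    return stat, hits / 2 ** len(diffs)


# Hypothetical paired measurements from ten networks (illustrative only).
condition_1 = [12, 15, 11, 18, 14, 20, 13, 17, 16, 19]
condition_2 = [10, 12, 7, 13, 8, 13, 5, 8, 6, 8]
stat, p_value = wilcoxon_signed_rank_exact(condition_1, condition_2)
print(f"W = {stat}, p = {p_value:.4f}")
```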

      (20) The last sentence in the first paragraph on page 11 is not clear. What do the authors mean by "linearize input-output functions", and how does it support their claim? 

      We have now amended this sentence to clarify what we mean: “…linearize the relationship between the mean input and output firing rates of neuronal populations…”.

      (21) In the first sentence of the last paragraph on page 11, the authors mentioned “high variability”, but it is not clear compared with which of the other 3 networks they observed high variability.

      Structurally co-tuned E/I networks are expected to diminish network-level variability. 

      “High variability” refers to the variability of spike trains, which is now mentioned explicitly in the text. We hope this more precise statement clarifies this point.

      (22) Methods section, page 14: "firing rates decreased with a time constant of 1, 2 or 4 s". How did they decrease? Was it an implementation algorithm? The time scale of input presentation is 2 s and it overlaps with the decay time constant (particularly with the one with 4 s decrease).  

      Firing rates decreased exponentially. We have added this information in the Methods section.
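
      As a worked illustration of what an exponential decrease implies for the three time constants, the sketch below evaluates r(t) = r_floor + (r0 - r_floor) * exp(-t / tau). The function name and the floor of zero are assumptions; the response does not specify the level toward which rates decay.

```python
import math


def decaying_rate(t, r0, tau, r_floor=0.0):
    """Firing rate at time t (s) after onset, relaxing exponentially to r_floor."""
    return r_floor + (r0 - r_floor) * math.exp(-t / tau)


for tau in (1.0, 2.0, 4.0):
    remaining = decaying_rate(2.0, r0=1.0, tau=tau)
    print(f"tau = {tau:.0f} s: {remaining:.3f} of the initial rate remains at t = 2 s")
```

      Under this assumption, about 14%, 37%, and 61% of the initial rate remain after 2 s for time constants of 1, 2, and 4 s, respectively, which makes the overlap with a 2 s stimulus that the reviewer points to explicit.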

      Reviewer #3 (Recommendations For The Authors): 

      In the following, I suggest minor corrections to each section which I believe can improve the manuscript. 

      - There was no github link to the code in the manuscript. The code should be made available with a link to github in the final manuscript. 

      The code can be found here: https://github.com/clairemb90/pDp-model. The link has been added in the Methods section.

      Figure 1: 

      - Fig. 1A: call it pDp not Dp. Please check if this name is consistent in every figure and the text. 

      Thank you for catching this. Now corrected in Figure 1, Figure 2 and in the text.

      - The authors write: "Hence, pDpsim entered an inhibition-stabilized balanced state (Sadeh and Clopath, 2020b) during odor stimulation (Figure 1D, E)." and then later "Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of ~80%, demonstrating that activity was indeed inhibition-stabilized. These results were robust against parameter variations (Methods)." I would suggest moving the second sentence before the first sentence, because the fact that the network is in the ISN regime follows from the shuffled spike timing result. 

      Also, I'd suggest showing this as a supplementary figure. 

      We thank the reviewer for this comment. We have removed “inhibition-stabilized” in the first sentence as there is no strong evidence of this in Rupprecht and Friedrich, 2018. And removed “indeed” in the second sentence. We also provided more detailed statistics. The text now reads “Hence, pDpsim entered a balanced state during odor stimulation (Figure 1D, E) with recurrent input dominating over afferent input, as observed in pDp (Rupprecht and Friedrich, 2018). Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of 0.79 ± 0.20, demonstrating that activity was inhibition-stabilized (Sadeh and Clopath, 2020b).”

      Figure 2: 

      - "... Scaled I networks (Figure 2H." Missing ) 

      Corrected.

      - The authors write "Unlike in Scaled I networks, mean firing rates evoked by novel odors were indistinguishable from those evoked by learned odors and from mean firing rates in rand networks (Figure 2F)." 

      Why is this something you want to see? Isn't it that novel stimuli usually lead to high responses? Eg in the paper Schulz et al., 2021 (eLife) which is also cited by the authors it is shown that novel responses have high onset firing rates. I suggest clarifying this (same in the context of Fig. 3C). 

      In Dp and piriform cortex, firing rates evoked by learned odors are not substantially different from firing rates evoked by novel odors. While small differences between responses to learned versus novel odors cannot be excluded, substantial learning-related differences in firing rates, as observed in other brain areas, have not been described in Dp or piriform cortex. We added references in the last paragraph of p.5. Note that the paper by Schulz et al. (2021) models a different type of circuit.  

      - Fig. 2B: Indicate in figure caption that this is the case "Scaled I" 

      This is not exactly the case “Scaled I”, as the parameter χ (increased I-to-E strength) is set to 1.

      - Suppl Fig. 2I: Is E&F ever used in the manuscript? I couldn't find a reference. I suggest removing it if not needed. 

      Suppl. Fig 2I E&F is now Suppl Fig.1G&H. We now refer to it in the text: “Activity of networks with E assemblies could not be stabilized around 1 Hz by increasing connectivity from subsets of I neurons receiving dense feed-forward input from activated mitral cells (Supplementary Figure 1GH; Sadeh and Clopath, 2020).”

      Figure 3: 

      - As mentioned in my comment in the public review section, I find the arguments about pattern completion a little bit confusing. For me it's not clear why an increase of output correlations over input correlations is considered "pattern completion" (this is not to say that I don't find the nonlinear increase of output correlations interesting). For me, to test pattern completion with second-order statistics one would need to do a similar separation as in Suppl Fig. 3, ie measuring the pairwise correlation at cells in the assembly L that get direct input from L OB with cells in the assembly L that do not get direct input from OB. If the pairwise correlations of assembly cells which do not get direct input from OB increase in correlations, I would consider this as pattern completion (similar to the argument that increase in firing rate in cells which are not directly driven by OB are considered a sign of pattern completion). 

      Also, for me it now seems that there are contradictory results: in Fig. 3 only Scaled I can lead to pattern completion, while in the context of Suppl. Fig. 3 the authors write "We found that assemblies were recruited by partial inputs in all structured pDpsim networks (Scaled and Tuned) without a significant increase in the overall population activity (Supplementary Figure 3A)." I suggest clarifying what the authors exactly mean by pattern completion, why the increase of output correlations above input correlations can be considered as pattern completion, and why the results differ when looking at firing rates versus correlations. 

      Please see our reply to the public review (reviewer 3).

      - I actually would suggest adding Suppl. Fig. 3 to the main figure. It shows a more intuitive form of pattern completion and in the text there is a lot of back and forth between Fig. 3 and Suppl. Fig. 3 

      We feel that the additional explanations and panels in Fig.3 should clarify this issue and therefore prefer to keep Supplementary Figure 3 as part of the Supplementary Figures for simplicity.  

      - In the whole section "We next explored effects of assemblies ... prevented strong recurrent amplification within E/I assemblies." the authors could provide a link to the respective panel in Fig. 2 after each statement. This would help the reader follow your arguments. 

      We thank the reviewer for pointing this out. The references to the appropriate panels have been added. 

      - Fig. 3A: I guess the x-axis has been shifted upwards? Should be at zero. 

      We have modified the x-axis to make it consistent with panels B and C.  

      - Fig. 3B: In the figure caption, the dotted line is described as the novel odor but it is actually the unit line. The dashed lines represent the reference to the novel odor. 

      Fixed.

      - Fig. 3C: The " is missing for Pseudo-Assembly N

      Fixed.

      - "...or a learned odor into another learned odor." Have here a ref to the Supplementary Figure 3B.

      Added.

      Figure 4:   

      - "This geometry was largely maintained in the output of rand networks, consistent with the notion that random networks tend to preserve similarity relationships between input patterns (Babadi and Sompolinsky, 2014; Marr, 1969; Schaffer et al., 2018; Wiechert et al., 2010)." I suggest adding here reference to Fig. 4D (left). 

      Added.

      - Please add a definition of E/I assemblies. How do the authors define E/I assemblies? I think they consider both, Tuned I and Tuned E+I as E/I assemblies? In Suppl. Fig. 2I E it looks like tuned feedforward input is defined as E/I assemblies. 

      We thank the reviewer for pointing this out. E/I assemblies are groups of E and I neurons with enhanced connectivity. In other words, in E/I assemblies, connectivity is enhanced not only between subsets of E neurons, but also between these E neurons and a subset of I neurons. This is now clarified in the text: “We first selected the 25 I neurons that received the largest number of connections from the 100 E neurons of an assembly. To generate E/I assemblies, the connectivity between these two sets of neurons was then enhanced by two procedures.”. We removed “E/I assemblies” in Suppl. Fig.2, where the term was not used correctly, and apologize for the confusion.

      - Suppl. Fig. 4: Could the authors please define what they mean by "Loadings" 

      The loadings indicate the contribution of each neuron to each principal component, see adjusted legend of Suppl. Fig. 4: “G. Loading plot: contribution of neurons to the first two PCs of a rand and a Tuned E+I network (Figure 4D).”

      - Fig. 4F: The authors might want to normalize the participation ratio by the number of neurons (see e.g. Dahmen et al., 2023 bioRxiv, "relative PR"), so the PR is bound between 0 and 1 and the dependence on N is removed. 

      We thank the reviewer for the suggestion, but we prefer to use the non-normalized PR as we find it more easily interpretable (e.g. number of attractor states in Scaled networks).
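
      The difference between the two variants can be sketched as follows, assuming the standard definition of the participation ratio, PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues) of the activity covariance matrix, and the reviewer's "relative PR" as PR divided by the number of neurons N. The function name and the toy covariance matrices are illustrative; the exact computation used for Figure 4F is not restated here.

```python
def participation_ratio(cov):
    """PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues) of cov."""
    n = len(cov)
    trace = sum(cov[i][i] for i in range(n))
    # For a symmetric matrix, the sum of squared eigenvalues equals
    # trace(cov @ cov), i.e. the sum of all squared entries.
    trace_sq = sum(cov[i][j] ** 2 for i in range(n) for j in range(n))
    return trace ** 2 / trace_sq


n = 4
independent = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
fully_shared = [[1.0] * n for _ in range(n)]  # all neurons covary perfectly

for name, cov in (("independent", independent), ("fully shared", fully_shared)):
    pr = participation_ratio(cov)
    print(f"{name}: PR = {pr:.1f}, relative PR = {pr / n:.2f}")
```

      The non-normalized PR reads directly as an effective number of dimensions (4 and 1 in the toy cases), whereas the relative PR (1.0 and 0.25) removes the dependence on N at the cost of that direct interpretation.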

      - Fig. 4G&H: as mentioned in the public review, I'd add the case of Scaled I to be able to compare it to the Tuned E+I case. 

      As already mentioned in the public review, we thank the reviewer for this suggestion, which we have implemented.

      - Figure caption Fig. 4H "Similar results were obtained in the full-dimensional space." I suggest showing this as a supplemental panel. 

      Since this only adds little information, we have chosen not to include it as a supplemental panel to avoid overloading the paper with figures.

      Figure 5: 

      - As mentioned in the public review, I suggest that the authors add the Scaled I case to Fig. 5 (it's shown in all figures and also in Fig. 6 again). I guess for Scaled I the separation between L and M will be very good? 

      Please see our reply to the public review (reviewer 3).

      - Fig. 5A&B: I am a bit confused about which neurons are drawn to calculate the Mahalanobis distance. In Fig. 5A, the schematic indicates that the vector B from which the neurons are drawn is distinct from the distribution Q. For the example of odor L, the distribution Q consists of pure odor L with odors that have little mixtures with the other odors. But the vector v for odor L seems to be drawn only from odors that have slightly higher mixtures (as shown in the schematic in Fig. 5A). Is there a reason to choose the vector v from different odors than the distribution Q? 

      The distribution Q and the vector v consist of activity patterns across the same neurons in response to different odors. The reason to choose a different odor for v was to avoid having this test datapoint being included in the distribution Q. We also wanted Q to be the same for all test datapoints. 

      What does "drawn from whole population" mean? Does this mean that the vectors are drawn from any neuron in pDp? If yes, then I don't understand how the authors can distinguish between different odors (L,M,O,N) on the y-axis. Or does "whole population" mean that the vector is drawn across all assemblies as shown in the schematic in Fig. 5A and the case "neurons drawn from (pseudo-) assembly" means that the authors choose only one specific assembly? In any case, the description here is a bit confusing, I think it would help the reader to clarify those terms better.  

      Yes, “drawn from whole population” means that we randomly draw 80 neurons from the 4000 E neurons in pDp. The y-axis means that we use the activity patterns of these neurons evoked by one of the 4 odors (L, M, N, O) as reference. We have modified the Figure legend to clarify this: “d<sub>M</sub> was computed based on the activity patterns of 80 E neurons drawn from the four (pseudo-) assemblies (top) or from the whole population of 4000 E neurons (bottom). Average of 50 draws.”

      - Suppl Fig. 5A: In the schematic the distance is called d_E(\bar{Q},\bar{V}) while the colorbar has d_E(\bar{Q},\bar{Q}) with the Qs in different color. The green Q should be a V. 

      We thank the reviewer for spotting this mistake, it is now fixed.

      - Fig. 5: Could the authors comment on the fact that a random network seems to be very good at classifying patterns on its own? Maybe in the Discussion? 

      The task shown in Figure 5 is a relatively easy one, a forced-choice between four classes which are uncorrelated. In Supplementary Figure 9, we now show classification for correlated classes, which is already much harder.

      Figure 6: 

      - Is the correlation induced by creating mixtures like in the other Figures? Please clarify how the correlations were induced. 

      We clarified this point in the Methods section: “The pixel at each vertex corresponded to one pure odor with 150 activated and 75 inhibited mitral cells (…) and the remaining pixels corresponded to mixtures. In the case of correlated pure odors (Figure 6), adjacent pure odors shared half of their activated and half of their inhibited cells.”. An explicit reference to the Methods section has also been added to the figure legend.

      - Fig. 6C (right): why don't we see the clear separation in PC space as shown in Fig. 4? Is this related to the existence of correlations? Please clarify. 

      Yes. The assemblies corresponding to the correlated odors X and Y overlap significantly, and therefore responses to these odors cannot be well separated, especially for Scaled networks. We added the overlap quantification in the Results section to make this clear. “These two additional assemblies had on average 16% of neurons in common due to the similarity of the odors.”

      - "Furthermore, in this regime of higher pattern similarity, dM was again increased upon learning, particularly between learned odors and reference classes representing other odors (not shown)." Please show this (maybe as a supplemental figure). 

      We now show the data in Supplementary Figure 9.

      Discussion: 

      - The authors write: "We found that transformations became more discrete map-like when amplification within assemblies was increased and precision of synaptic balance was reduced. Likewise, decreasing amplification in assemblies of Scaled networks changed transformations towards the intermediate behavior, albeit with broader firing rate distributions than in Tuned networks (not shown)." 

      Where do I see the first point? I guess when I compare in Fig. 4D the case of Scaled I vs Tuned E+I, but the sentence above sounds like the authors showed this in a more step-wise way eg by changing the strength of \alpha or \beta (as defined in Fig. 1). 

      Also I think if the authors want to make the point that decreasing amplification in assemblies changes transformation with a different rate distribution in scaled vs tuned networks, the authors should show it (eg adding a supplemental figure). 

      The first point is indeed supported by data from different figures. Please note that the revised manuscript now contains further simulations that reinforce this statement, particularly those shown in Supplementary Figure 6, and that this point is now discussed more extensively in the Discussion. We hope that these revisions clarify this general point.

      The data showing effects of decreasing amplification in assemblies are now shown in Supplementary Figure 6 (Scaled[adjust]).

      - I suggest adding the citation Znamenskiy et al., 2024 (Neuron; https://doi.org/10.1016/j.neuron.2023.12.013), which shows that excitatory and inhibitory (PV) neurons with functional similarities are indeed strongly connected in mouse V1, suggesting the existence of E/I assembly structure also in mammals.

      Done.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      It is evident that studying leukocyte extravasation in vitro is a challenge. One needs to include physiological flow, culture cells and isolate primary immune cells. Timing is of utmost importance and a reproducible setup essential. Extra challenges are met when extravasation kinetics in different vascular beds is required, e.g., across the blood-brain barrier. In this study, the authors describe a reliable and reproducible method to analyze leukocyte TEM under physiological flow conditions, including this analysis. That the software can also detect reverse TEM is a plus.

      Strengths:

      It is quite a challenge to get this assay reproducible and stable, in particular as there is flow included. Also for the analysis, there is currently no clear software analysis program, and many labs have their own methods. This paper gives the opportunity to unify the data and results obtained with this assay under label-free conditions. This should eventually lead to more solid and reproducible results.

      Also, the comparison between manual and software analysis is appreciated.

      Weaknesses:

      The authors stress that it can be done in BBB models, but I would argue that it is much more broadly applicable. This is not necessarily a weakness of the study but more an opportunity to strengthen the method. So I would encourage the authors to rewrite some parts and make it more broadly applicable.

      We thank the Reviewer for this suggestion. The barrier properties of the BBB influence the dynamic behavior of T cells during their multi-step extravasation cascade. For example, the crawling of CD4 T cells against the direction of blood flow is a unique behavior of T cells on the BBB that is also observed in vivo (1-3). Nevertheless, we fully agree that, in principle, UFMTrack can be used to study immune cell interactions with endothelial monolayers under physiological flow in general. We have thus added a statement in the abstract and expanded the discussion to highlight the availability of the framework and the adaptations that may be required when using UFMTrack to analyze different experimental setups. Please also note that UFMTrack has been established as a basic framework using the example of brain endothelial monolayers and one flow chamber devices while studying different immune cell subsets. The purpose of the publication is to make UFMTrack available to the community to address their specific questions.

      (1) Kawakami, N., Bartholomäus, I., Pesic, M. & Kyratsous, N. I. Intravital Imaging of Autoreactive T Cells in Living Animals. Methods Cell Biol. 113, 149–168 (2013).

      (2) Schläger, C., Litke, T., Flügel, A. & Odoardi, F. In Vivo Visualization of (Auto)Immune Processes in the Central Nervous System of Rodents. in 117–129 (Humana Press, New York, NY, 2014). doi:10.1007/7651_2014_150

      (3) Haghayegh Jahromi, N. et al. Intercellular Adhesion Molecule-1 (ICAM-1) and ICAM-2 Differentially Contribute to Peripheral Activation and CNS Entry of Autoaggressive Th1 and Th17 Cells in Experimental Autoimmune Encephalomyelitis. Front. Immunol. 10, 3056 (2020).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      Developing a reliable method to record ancestry and distinguish between human somatic cells presents significant challenges. I fully acknowledge that my current evidence supporting the claim of lineage tracing with fCpG barcodes is inadequate. I agree with Reviewer 1 that fCpG barcodes are essentially a cellular division clock that diverges over time. A division clock could potentially document when cells cease to divide during development, with immediate daughter cells likely exhibiting more similar barcodes than those that are less related. Although it remains uncertain whether the current fCpG barcodes capture useful biological information, refinement of this type of tool could complement other approaches that reconstruct human brain function, development, and aging.

      Due to my lack of clarity, the fCpG barcode was perceived to be a new type of cell classifier. However, it is fundamentally different. fCpG sites are selected based on their differences between cells of the same type, while traditional cell classifiers focus on sites with consistent methylation patterns in cells of the same type. Despite these opposing criteria, fCpG barcodes and traditional cell classifiers may align because neuron subtypes often share common progenitors. As a result, cells of the same phenotype are also closely related by ancestry, and ex post facto, have similar fCpG barcodes. fCpG barcodes are complementary to cell type classifiers, and potentially provide insights into aspects such as mitotic ages, diversity within a clade, and migration of immediate daughters---information which is otherwise difficult to obtain. The title has been modified to “Human Brain Ancestral Barcodes” to better reflect the function of the fCpG barcodes. The manuscript is edited to correct errors, and a new Supplement is added to further explain fCpG barcode mechanics and present new supporting data.

      Reviewer #1 (Public review):

      I thank Reviewer 1 for his constructive comments. Major noted weaknesses were 1) insufficient clarity and brevity of the methodology, 2) inconsistent or erroneous use of neurodevelopmental concepts, and 3) lack of consideration for alternative explanations.

      (1) The methodology is now outlined in detail in a new Supplement, including simulations that indicate that the error rate consistent with the experimental data is about 0.01 changes in methylation per fCpG site per division.

      (2) Conceptual and terminology errors noted by the Reviewers are corrected in the manuscript.

      (3) I agree completely with the alternative explanation of Reviewer 1 that fCpGs are “a cellular division clock that diverges over 'time'”. Differences between more traditional cell type classifiers and fCpG barcodes are more fully outlined in the new Supplement. Ancestry recorded by fCpGs and cell type classifiers are confounded because cells of the same phenotype typically have common progenitors---cells within a clade have similar fCpG barcodes because they are closely related. fCpG barcodes can complement cell type classifiers with additional information such as mitotic ages, ancestry within a clade, and daughter cell migration.
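
      The division-clock idea in points (1) and (3) can be illustrated with a minimal sketch. It is not the simulation from the Supplement: it assumes binary methylation states, a symmetric flip probability of 0.01 per site per division (the error rate quoted above), an unmethylated ancestor, and a barcode length of 2000 sites, all of which except the error rate are hypothetical choices.

```python
import random


def divide(barcode, flip_prob, rng):
    """Copy a binary methylation barcode, flipping each site with flip_prob."""
    return [1 - s if rng.random() < flip_prob else s for s in barcode]


def descend(barcode, n_divisions, flip_prob, rng):
    """Follow a single lineage through n_divisions successive divisions."""
    for _ in range(n_divisions):
        barcode = divide(barcode, flip_prob, rng)
    return barcode


def hamming_fraction(a, b):
    """Fraction of sites at which two barcodes differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
n_sites = 2000            # hypothetical barcode length
ancestor = [0] * n_sites  # hypothetical unmethylated starting state
for n_div in (1, 10, 40):
    cell_1 = descend(ancestor, n_div, 0.01, rng)
    cell_2 = descend(ancestor, n_div, 0.01, rng)
    print(n_div, round(hamming_fraction(cell_1, cell_2), 3))
```

      Under these assumptions, two lineages that separated n divisions ago are expected to differ at a fraction (1 - 0.98^(2n)) / 2 of sites, so immediate daughters have nearly identical barcodes while distantly related cells drift toward 50% disagreement, at which point the barcode carries no further ancestry information.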

      Reviewer #1 (Recommendations for the authors):

      (1) A lot of the interpretations suffer from an extremely loose/erroneous use of developmental concepts and a lack of transparency. For instance:

      a) The thalamus is not part of the brain stem

      Corrected.

      b) The pons contains cells other than inhibitory neurons in the data; the same is true for the hippocampus which contains multiple cell types

      Corrected to refer to the specific cell types in these regions.

      c) The author talks about the rostral-caudal timing a lot which is not really discussed to this degree in the cited references. Thus, it is also unclear how interneurons fit in this model as they are distinguished by a ventral-dorsal difference from excitatory neurons. Also, it is unclear whether the timing is really as distinct as claimed. For instance, inhibitory neurons and excitatory neurons significantly overlap in their birth timing. Finally, conceptually, it does not make sense to go by developmental timing as the author proposes that it is the number of divisions that is relevant. While they are somewhat correlated there are potentially stark differences.

      The manuscript attempts to describe what might be broadly expected when barcodes are sampled from different cell types and locations. As a proposed mitotic clock, the fCpG barcode methylation level could time when each neuron ceased division and differentiated. The wide ranges of fCpG barcode methylation of each cell type (Fig 2A) would be consistent with significant overlap between cell types. The manuscript is edited to emphasize overlapping rather than distinct sequential differentiation of the cell types.

      d) Neocortical astrocytes and some oligodendrocytes share a lineage, whereas a subset of oligodendrocytes in the cortex shares an origin with interneurons. This could confound results but is never discussed.

      The manuscript does not assess glial lineages in detail because neurons were preferentially included in the sampling whereas glial cells were non-systematically excluded. This sampling information is now included in the section “fCpG barcode identification”.

      e) Neocortical interneurons should be more closely related in terms of lineage-to-excitatory neurons than other inhibitory neurons of, for instance, the pons. This is not clearly discussed and delineated.

      This is not discussed. It may not be possible to analyze these details with the current data. The ancestral tree reconstructions indicate that excitatory neurons that appear earlier in development (and are more methylated) are more often more closely related to inhibitory neurons.

      f) While there is some spread of excitatory neurons tangentially, there is no tangential migration at the scale of interneurons as (somewhat) suggested/implied here.

      The abstract and results have been modified to indicate greater inhibitory than excitatory neuron tangential migration, but that the extent of excitatory neuron tangential migration cannot be determined because of the sparse sampling and because barcodes may be similar by chance.

      g) The nature of the NN cells is quite important as cells not derived from the neocortical anlage are unlikely to share a developmental origin (e.g., microglia, endothelial cells). This should be clarified and clearly stated.

      The manuscript is modified to indicate that NN cells are microglial and endothelial cells. These cells have different developmental origins, and their data are present in Fig 2A, but are not further used for ancestral analysis.  

      (2) The presentation is often somewhat confusing to me and lacks detail. For instance:

      a) The methods are extremely short and I was unable to find a reference for a full pipeline, so other researchers can replicate the work and learn how to use the pipeline.

      The pipeline, including Python code, is outlined in the new Supplement.

      b) Often numbers are given as ~XX when the actual number with some indication of confidence or spread would be more appropriate.

      Data ranges are often indicated with the violin plots.

      c) Many figure legends are exceedingly short and do not provide an appropriate level of detail.

      Figure legends have been modified to include more detail.

      d) Not defining groups in the figure legends or a table is quite unacceptable to me. I do not think that referring to a prior publication (that does not consistently use these groups anyway) is sufficient.

      The cell groups are based on the annotations provided with each single cell in the public databases.

      e) The used data should be better defined and introduced (number of cells, different subtypes across areas, which cells were excluded; I assume the latter as pons and hippocampus are only mentioned for one type of neuronal cells, see also above).

      The data used are present in Supplemental File 2 under the tab “cell summary H01, H02, H04”.

      f) Why were different upper bounds used for filtering for H01 and H02, and H04 is not mentioned? Why are inhibitory and excitatory neurons specifically mentioned (Lines 61-66)?

      The filtering is used to eliminate, as much as possible, cell type specific methylation, or CpG sites with skewed neuron methylation. The filtering eliminates CpG sites with high or low methylation within each of the three brains, and within the two major neuron subtypes. The goal is to enrich for CpG sites with polymorphic but not cell type specific methylation. This process is ad hoc as success criteria are currently uncertain. The extent of filtering is balanced by the need to retain sufficient numbers of fCpGs to allow comparisons between the neurons.

      g) What 'progenitor' does the author refer to? The Zygote? If yes, can the methylation status be tested directly from a zygote? There is no single progenitor for these cells other than the zygote. Does the assumption hold true when taking this into account? See, for instance, PMID 33737485 for some estimation of lineage bottlenecks.

      A brain progenitor cell can be defined as the common ancestor of all adult neurons, and is the first cell where each of its immediate daughter cell lineages yield adult neurons. The zygote is a progenitor cell to all adult cells, and barcode methylation at the start of conception, from the oocyte to the ICM, was analyzed in the new Supplement. The proposed brain progenitor cell with a fully methylated barcode was not yet evident even in the ICM.

      (3) I am generally not convinced that the fCpGs represent anything but a molecular clock of cell divisions and that many of the similarities are a function of lower division numbers where the state might be more homogenous. This mainly derives from the issues cited above, the lack of convincing evidence to the contrary, and the sparsity of the assessed data.

      Agree that the fCpG barcode is a mitotic clock that becomes polymorphic with divisions. As outlined in the new Supplement, ancestry and cell type are confounded because cells of the same type typically have a common progenitor.

      a) There appears little consideration or modeling of what the ability to switch back does to the lineage reconstruction.

      fCpG methylation flipping is further analyzed and discussed in the new Supplement.

      b) None of the data convinced me that the observations cannot be explained by the aforementioned molecular clock and systematic methylation similarities of cell types due to their cell state.

      See above

      (4) Uncategorized minor issues:

      a) The author should explain concepts like 'molecular clock hypothesis' (line 27) or 'radial unit hypothesis' (line 154), as they are somewhat complex and might not be intuitive to readers.

      The molecular clock hypothesis is deleted and the radial unit hypothesis is explained in more detail in the manuscript.

      b) Line 32: '[...] replication errors are much higher compared to base replication [...]'. I think this is central to the method and should be better explained and referenced. Maybe even through a schematic, as this is a central concept for the entire manuscript.

      The fCpG barcode mechanics are better explained in the new Supplement. With simulations, the fCpG flip rate is about 0.01 per division per fCpG.
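
      To make the flip-rate figure concrete, here is a minimal sketch; it is not the Supplement's simulation code. It assumes a barcode that starts fully methylated and that each fCpG flips, in either direction, with the same probability (0.01) at every division. The function names, the 2,000-site barcode, and the symmetric-flip assumption are illustrative only. Under these assumptions the mean methylation of a lineage drifts from 100% toward 50%, following 0.5 + 0.5(1 - 2r)^n after n divisions at flip rate r.

```python
import random


def simulate_barcode(n_sites, n_divisions, flip_rate, rng):
    """Follow one lineage: start fully methylated (all 1s) and flip each
    site with probability flip_rate at every division. Flips can reverse."""
    barcode = [1] * n_sites
    for _ in range(n_divisions):
        barcode = [1 - s if rng.random() < flip_rate else s for s in barcode]
    return barcode


def expected_methylation(n_divisions, flip_rate):
    """Closed-form mean methylation for a symmetric two-state flip model
    that starts fully methylated (an assumption of this sketch)."""
    return 0.5 + 0.5 * (1.0 - 2.0 * flip_rate) ** n_divisions


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for n in (0, 10, 50, 100):
        barcode = simulate_barcode(2000, n, 0.01, rng)
        observed = sum(barcode) / len(barcode)
        print(n, round(observed, 3), round(expected_methylation(n, 0.01), 3))
```

      Because flips can reverse in this model, the methylation level saturates near 50%, so the clock loses resolution at high division numbers.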

      c) Line 41: 'neonatal'. Does the author mean to say prenatal? Most of the cells discussed are postmitotic before birth.

      Corrected to prenatal.

      d) Line 96: what does 'flip' mean in this context? Please also see the comment on Figure 2C.

      Edited to “change”.

      e) Lines 134-135: I am not sure whether the author claims to provide evidence for this question, and I would be careful with claims that this work does resolve the question here.

      Have toned down claims as evidence for my analysis is currently inadequate.

      f) Lines 192-193: I disagree as the fCpGs can switch back and the current data does not convince me that this is an improvement upon mosaic mutation analysis. In my mind, the main advantage is the re-analysis of existing data and the parallel functional insights that can be obtained.

      Lineage analysis is more straightforward with DNA sequencing, but with an error rate of ~10^-9 per base per division, one needs to sequence a billion base pairs to distinguish between immediate daughter cells. By contrast, with an inferred error rate of ~10^-2 per fCpG per division, much less sequencing (about a million-fold less) is needed to find differences between daughter cells.

      g) Lines 208-209: I would be careful with claims of complexity resolution given many of the limitations and inherent systematic similarities, as well as the potential of fCpGs to change back to an ancestral state later in the lineage.

      Have modified the manuscript to indicate the analysis would be more challenging due to back changes.

      h) There seem to be few figures that assess phenomena across the three brains. Even when they exist there is no attempt to provide any statistical analyses to support the conclusions or permutations to assess outlier status relative to expectations.

      The analysis could be more extensive, but with only three brains, any results, like this study itself, would be rightly judged inadequate.

      i) Figure 2B: there appears to be a higher number of '0s' for, for instance, inhibitory neurons compared to excitatory neurons. Is that correct and worth mentioning? The changing axes scales also make it hard to assess.

      Inhibitory neurons do appear to have more unmethylated fCpGs compared to excitatory neurons, but in general, most inhibitory fCpGs are methylated with a skew to fully methylated fCpGs, consistent with the barcode starting predominately methylated and inhibitory neurons generally appearing earlier in development relative to excitatory neurons.

      j) Figure 2C: I have several issues with this. A minor one is the use of 'Glial' which, I believe, does not appear anywhere else before this, so I am unclear what this curve represents. Generally, however, I am not sure what the y-axis represents, as it is not described in the methods or figure legend. I initially thought it was the cumulative frequency, but I do not think that this squares with the data shown in B. I appreciate the overall idea of having 'earlier'/samples with fewer divisions being shifted to the left, but it is very confusing to me when I try to understand the details of the plot.

      This graph is now better described in the legend. “Glial” cells are defined as oligodendrocytes and astrocytes. Other non-neuronal cells (such as microglial cells) have now been removed from the graph.

      This graph attempts to illustrate how it may be possible to reconstruct brain development from adult neurons, assuming barcodes are mitotic clocks that become polymorphic with cell division. The X axis is “time”, and the Y axis indicates when different cell types reach their adult levels. The cartoon indicates what is visually present along the X axis during development: brainstem, then ganglionic eminences with a thin cortex, and finally the mature brain with a robust cortex. Time for the X axis is barcode methylation and starts at 100% and ends at 50% or greater methylation. The fCpG barcode methylation of each cell places it on this timeline and indicates when it ceased dividing and differentiated.

      The Y axis indicates the progressive accumulation of the final adult contents of each cell type during this timeline. Early in development, the brain is rudimentary and adult cells are absent. At 90% methylation, only the inhibitory neurons in the pons are present. At 80% methylation, some excitatory neurons are beginning to appear. Inhibitory neurons in the pons have reached their final adult levels and many other inhibitory neuron types are reaching adult levels. By 70% methylation, most inhibitory neurons have reached their adult levels, and more adult excitatory neurons (mainly low cortical neurons, L4-6) and glial cells are beginning to appear. By 60% methylation, inhibitory neurogenesis has largely finished. Adult excitatory neurons and glial cells are more abundant and reach their adult levels by 50% or greater cell barcode methylation levels.

      The graph illustrates a rough alignment between mitotic ages inferred by barcode methylation levels and the physical appearances of different neuronal types during development. Many neurons die during development, and this graph, if valid, indicates when neurons that survive to adulthood appear during development.
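
      The Y-axis construction described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the figure: each cell is reduced to a single barcode methylation level, and the curve for a cell type is the fraction of its adult cells whose methylation is at or above each threshold as the threshold descends from 100% to 50%. The function name and the methylation values are made up for the example.

```python
def cumulative_fraction(methylation_levels, thresholds):
    """For each threshold, return the fraction of cells whose barcode
    methylation is at or above it, i.e. cells that would already have
    stopped dividing by that point on the methylation 'timeline'."""
    n = len(methylation_levels)
    return [sum(m >= t for m in methylation_levels) / n for t in thresholds]


# Thresholds run from "early" (100% methylated) to "late" (50%).
thresholds = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]

# Hypothetical per-cell barcode methylation levels, for illustration only.
cells = {
    "inhibitory": [0.92, 0.85, 0.81, 0.74, 0.66],
    "excitatory": [0.78, 0.71, 0.63, 0.58, 0.52],
}

if __name__ == "__main__":
    for cell_type, levels in cells.items():
        print(cell_type, cumulative_fraction(levels, thresholds))
```

      With these made-up values the inhibitory curve rises earlier (at higher methylation) than the excitatory curve, which is the kind of left shift the graph is meant to convey.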

      k) Figure 4Bff: it is confusing to me that the text jumps to these panels after introducing Figure 5. This makes it very hard to read this section of the text.

      The Figures appear in the order they are first referred to in the text.

      l) Figure 5A: could any of this difference be explained by the shared lineage of excitatory neurons and dorsal neocortical glia?

      Not sure

      m) Figure 5B: after stating that interneurons have a higher lineage fidelity, the figure legend here states the opposite and I am somewhat confused by this statement.

      The legend and text have been clarified. Fig 5A restricts fidelity to within inhibitory cell types. Fig 5B compares between neuron subtypes, and illustrates more apparent inhibitory subtype switching, albeit there are more interneuron subtypes than excitatory subtypes.

      n) Figure 5E: generally, the use of tSNE for large pairwise distance analysis is often frowned upon (e.g., PMID 37590228), and I would reconsider this argument.

      This analysis was an attempt to illustrate that cells of the same phenotype based on their tSNE metrics can be either closely or more distantly related. Although the tSNE comparisons were restricted to subtypes (and not to the entire tSNE graph), tSNE is not designed for such comparisons. This graph and discussion are deleted.

      Reviewer #2 (Public review):

      The manuscript by Shibata proposed a potentially interesting idea that variation in methylcytosine across cells can inform cellular lineage in a way similar to single nucleotide variants (SNVs). The work builds on the hypothesis that the "replication" of methylcytosine, presumably by DNMT1, is inaccurate and produces stochastic methylation variants that are inherited in a cellular lineage. Although this notion can be correct to some extent, it does not account for other mechanisms that modulate methylcytosines, such as active gain of methylation mediated by DNMT3A/B activity and active demethylation mediated by TET activity. In some cases, it is known that the modulation of methylation is targeted by sequence-specific transcription factors. In other words, inaccurate DNMT1 activity is only one of the many potential ways that can lead to methylation variants, which fundamentally weakens the hypothesis that methylation variants can serve as a reliable lineage marker. With that being said (being skeptical of the fundamental hypothesis), I want to be as open-minded as possible and try to propose some specific analyses that might better convince me that the author is correct. However, I suspect that the concept of methylation-based lineage tracing cannot be validated without some kind of lineage tracing experiment, which has been successfully demonstrated for scRNA-seq profiling but not yet for methylation profiling (one example is Delgado et al., Nature, 2022).

      I thank Reviewer 2 for the careful evaluation. The validation experiment example (Delgado et al.) introduced sequence barcodes in mice, which is not generally feasible for human studies.

      (1) The manuscript reported that fCpG sites are predominantly intergenic. The author should also score the overlap between fCpG sites and putative regulatory elements and report p-values. If fCpG sites commonly overlap with regulatory elements, that would increase the possibility that these sites are actively regulated by enhancer mechanisms other than maintenance methyltransferase activity.

      As mentioned for Reviewer 1, fCpGs are filtered to eliminate cell type specific methylation.

      (2) The overlap between fCpG and regulatory sequence is a major alternative explanation for many of the observations regarding the effectiveness of using fCpG sites to classify cell types correctly. One would expect the methylation level of thousands of enhancers to be quite effective in distinguishing cell types based on the published single-cell brain methylome works.

      As mentioned above, the manuscript did not clearly indicate that the fCpG barcode is not a cell type classifier. The distinctions between fCpG barcodes and cell type classifiers are better explained in the new Supplement.

      (3) The methylation level of fCpG sites is higher in hindbrain structures and lower in forebrain regions. This observation was interpreted as the hindbrain being the "root" of the methylation barcodes and, through "progressive demethylation" produced the methylation states in the forebrain. This interpretation does not match what is known about methylation dynamics in mammalian brains, in particular, there is no data supporting the process of "progressive demethylation". In fact, it is known that with the activation of DNMT3A during early postnatal development in mice or humans (Lister et al., 2013. Science), there is a global gain of methylation in both CH and CG contexts. This is part of the broader issue I see in this manuscript, which is that the model might be correct if "inaccurate mC replication" is the only force that drives methylation dynamics. But in reality, active enzymatic processes such as the activation of DNMT3A have a global impact on the methylome, and it is unclear if any signature for "inaccurate mC replication" survives the de novo methylation wave caused by DNMT3A activity.

      Reviewer 2 highlights a critical potential flaw in that any ancestral signal recorded by random replication errors could be overwritten by other active methylation processes. I cannot present data that indicates fCpG replication errors are never overwritten, but new data indicate barcode reproducibility and stability with aging.

      New data are also present where barcodes are compared between daughter cells (zygote to ICM) in the setting of active and passive demethylation, when germline methylation is erased. This new analysis shows that daughter cells in 2 to 8 cell embryos have more related barcodes than morula or ICM cells. The subsequent active remethylation by a wave of DNMT3A activity may underlie the observation that the barcode appears to start predominately methylated in brain progenitors.

      (3) Perhaps one way the author could address comment 3 is to analyze methylome data across several developmental stages in the same brain region, to first establish that the signal of "inaccurate mC replication" is robust and does not get erased during early postnatal development when DNMT3A deposits a large amount of de novo methylation.

      See above

      (4) The hypothesis that methylation barcodes are homogeneous among progenitor cells and more polymorphic in derived cells is an interesting one. However, in this study, the observation was likely an artifact caused by the more granular cell types in the brain stem, intermediate granularity in inhibitory cells, and highly continuous cell types in cortical excitatory cells. So, in other words, single-cell studies typically classify hindbrain cell types that are more homogenous, and cortical excitatory cells that are much more heterogeneous. The difference in cell type granularity across brain structures is documented in several whole-brain atlas papers such as Yao et al. 2023 Nature part of the BICCN paper package.

      As noted above, fCpG barcode polymorphisms and cell type differentiation are confounded because cells of the same phenotype tend to have common progenitors. The fCpG barcode is not a cell type classifier but more a cell division clock that becomes polymorphic with time. Although fCpG barcodes could be more polymorphic in cortical excitatory cells because there are many more types, fCpG barcodes would inherently become more polymorphic in excitatory cells because they appear later in development.

      (5) As discussed in comment 2, the author needs to assess whether the successful classification of cell types (brain lineage) using fCpG was, in fact, driven by fCpG sites overlapping with cell-type specific regulatory elements.

      Although unclear in the manuscript, the fCpG is not a cell classifier and the barcode is polymorphic between cells of the same type. fCpG barcodes can appear to be cell classifiers because cell types appear at different times during development, and therefore different cell types have characteristic average barcode methylation levels.

      (6) In Figure 5E, the author tried to address the question of whether methylation barcodes inform lineage or post-mitotic methylation remodeling. The Y-axis corresponds to distances in tSNE. However, tSNE involves non-linear scaling, and the distances cannot be interpreted as biological distances. PCA distances or other types of distances computed from high-dimensional data would be more appropriate.

      The Figure and discussion are deleted (similar comment by Reviewer 1)

      Reviewer #3 (Public review):

      Summary:

      In the manuscript entitled "Human Brain Barcodes", the author sought to use single-cell CpG methylation information to trace cell lineages in the human brain.

      Strengths:

      Tracing cell lineages in the human brain is important but technically challenging. Lineage tracing with single-cell CpG methylation would be interesting if convincing evidence exists.

      Weaknesses:

      As the author noted, "DNA methylation patterns are usually copied between cell division, but the replication errors are much higher compared to base replication". This unstable nature of CpG methylation would introduce significant problems in inferring the true cell lineage. The unreliable CpG methylation status also raises the question of what the "Barcodes" refer to in the title and across this study. Barcodes should be stable in principle and not dynamic across cell generations, as defined in Reference#1. It is not convincing that the "dynamic" CpG methylation fits the "barcodes" terminology. This problem is even more concerning in the last section of results, where CpG would fluctuate in post-mitotic cells.

      I thank Reviewer 3 for his thoughtful and careful evaluation. I think the “barcode” terminology is appropriate. Dynamic engineered barcodes such as CRISPR/Cas9 mutable barcodes are used in biology to record changes over time. The fCpG barcode appears to start with a single state in a progenitor cell and changes with cell division to become polymorphic in adult cells. Therefore, I think the description of a dynamic fCpG barcode is appropriate.

      Reviewer #3 (Recommendations for the authors):

      (1) As the author noted, "DNA methylation patterns are usually copied between cell division, but the replication errors are much higher compared to base replication". This unstable nature of CpG methylation would introduce significant problems in inferring the true cell lineage. To establish DNA methylation as a means for lineage tracing, one control experiment would be testing whether the DNA methylation patterns can faithfully track cell lineages for in vitro differentiated & visibly tracked cell lineages. Has this kind of experiment been done in the field?

      These types of experiments have not been performed to my knowledge and an appropriate tissue culture model is uncertain. New single cell WGBS data from the zygote to ICM indicate that more immediate daughter cells have more related barcodes even in the setting of active DNA demethylation.

      (2) The study includes assumptions that should be backed with solid rationale, supporting evidence, or reference. Here are a couple of examples:

      a) the author discarded stable CpG sites with <0.2 or >0.8 average methylation without a clear rationale in H02, and then used <0.3 and >0.7 for a specific sample H01.

      The filtering was ad hoc and was used to remove, as much as possible, CpG sites with cell type specific or patient specific methylation. CpG sites with skewed methylation are more likely cell type specific, whereas X chromosome CpG sites with methylation closer to 0.5 in male cells are more likely to be unstable. The ad hoc filtering attempted to remove cell specific CpG sites while still retaining enough CpG sites to allow comparisons between cells.
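
      The filtering idea can be sketched as follows. This is a minimal illustration, not the pipeline in the Supplement: it assumes each CpG site has 0/1 methylation calls per cell (with None for no coverage) grouped by neuron class, and keeps a site only if its mean methylation falls inside the chosen bounds in every group. The bounds 0.2/0.8 and 0.3/0.7 are those quoted by the Reviewer; the function names, site labels, and calls are hypothetical.

```python
def mean_methylation(calls):
    """Mean of 0/1 methylation calls, ignoring missing values (None).
    Returns None when no cell covers the site."""
    observed = [c for c in calls if c is not None]
    return sum(observed) / len(observed) if observed else None


def keep_site(calls_by_group, lower, upper):
    """Keep a site only if its mean methylation lies within [lower, upper]
    in every group (for example, excitatory and inhibitory neurons)."""
    for calls in calls_by_group.values():
        m = mean_methylation(calls)
        if m is None or m < lower or m > upper:
            return False
    return True


def filter_sites(sites, lower=0.2, upper=0.8):
    """Return the IDs of sites that pass the bounds in all groups."""
    return [site_id for site_id, groups in sites.items()
            if keep_site(groups, lower, upper)]


# Hypothetical calls for three CpG sites, for illustration only.
sites = {
    "site_a": {"excitatory": [1, 0, 1, 0, None], "inhibitory": [0, 1, 1, 0, 1]},
    "site_b": {"excitatory": [1, 1, 1, 1, 1], "inhibitory": [1, 1, None, 1, 1]},
    "site_c": {"excitatory": [1, 0, 1, 1], "inhibitory": [0, 0, 1, 0]},
}

if __name__ == "__main__":
    print(filter_sites(sites))            # looser bounds, 0.2 to 0.8
    print(filter_sites(sites, 0.3, 0.7))  # stricter bounds, 0.3 to 0.7
```

      Tightening the bounds removes more skewed sites at the cost of fewer retained fCpGs, which is the trade-off described above.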

      b) The author assumed that the early-formed brain stem would resemble progenitors better and have a higher average methylation level than the forebrain. However, this difference in DNA methylation status could reflect developmental timing or cell type-specific gene expression changes.

      This observation that brain stem neurons that appear early in development have highly methylated fCpG barcodes in all 3 brains supports the idea that the fCpG barcode starts predominately methylated. Alternative explanations are possible.

      (3) The conclusion that excitatory neurons undergo tangential migration is unclear - how far away did the author mean for the tangential direction? Lateral dispersion is known, but it would be striking that the excitatory neurons travel across different brain regions. The question is, how would the author interpret shared or divergent methylation for the same cell type across different brain regions?

      As noted with Reviewer 1, this analysis is modified to indicate that evidence of tangential migration is greater for inhibitory than excitatory neurons, but the extent of excitatory neuron migration is uncertain because of sparse sampling, and because fCpG barcodes can be similar by chance.

      (4) The sparsity and resolution of the single-cell DNA methylation data. The methylation status is detected in only a small fraction (~500/31,000 = 1.6%) of fCpGs per cell, with only 48 common sites identified between cell pairs. Given that the human genome contains over 28 million CpG sites, it is important to evaluate whether these fCpGs are truly representative. How many of these sites were considered "barcodes"?

      fCpG barcodes are distinct from traditional cell type classifiers, and how fCpGs are identified is better outlined in the new Supplement.

      (5) While focusing on the X-chromosome may simplify the identification of polymorphic fCpGs, the confidence in determining its methylation status (0 or 1) is questionable when a CpG site is covered by only one read. Did the author consider the read number of detected fCpGs in each cell when calculating methylation levels? Certain CpG sites on autosomes may also have sufficient coverage and high variability across cells, meeting the selection criteria applied to X-chromosome CpGs.

      In most cases, a fCpG site was covered by only a single read.

      (6) The overall writing in the Title, the Main text, Figure legends, and Methods sections are overly simplified, making it difficult to follow. For instance, how did the author perform PWD analysis? How did they handle missing values when constructing lineage trees?

      There is not much introduction to lineage tracing in the human brain or the use of DNA methylation to trace cell lineage.

      These shortcomings are improved in the manuscript and with the new Supplement. The analysis pipeline, including the Python programs, is outlined and included as new Supplemental materials. IQ-TREE can handle the binary fCpG barcode data and skips missing values with its standard settings.
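
      As a rough illustration of how a pairwise distance (PWD) between two sparse barcodes can be computed when most sites are missing, here is a minimal sketch. It is an assumption, not the Supplement's code: PWD is taken to be the fraction of mismatching calls over the sites covered in both cells, and pairs with too few shared sites return no value. The function name and the min_shared parameter are hypothetical.

```python
def pairwise_distance(barcode_a, barcode_b, min_shared=1):
    """Fraction of mismatching 0/1 calls over sites covered in both cells.
    Missing calls are None. Returns None if fewer than min_shared sites
    are covered in both barcodes."""
    shared = [(a, b) for a, b in zip(barcode_a, barcode_b)
              if a is not None and b is not None]
    if len(shared) < min_shared:
        return None
    return sum(a != b for a, b in shared) / len(shared)


if __name__ == "__main__":
    # Hypothetical barcodes over six fCpG sites, for illustration only.
    cell_1 = [1, None, 0, 1, None, 1]
    cell_2 = [1, 0, None, 0, 1, 1]
    print(pairwise_distance(cell_1, cell_2))
```

      Restricting the comparison to jointly covered sites avoids treating missing data as a difference, but with few shared sites the estimate is noisy, which is why barcodes can appear similar by chance.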

      Line 80: it is unclear: "Brain patterns were similar"

      Clarified

      Line 98: The meaning is unclear here: "Outer excitatory and glial progenitor cells are present" What are these glial progenitor cells and when/how they stop dividing?

      The glial cells are the oligodendrocytes and astrocytes. The main take away point is that these glial cells have low barcode methylation, consistent with their appearances later in development.

      Line 104: It is unclear if this is a conclusion or assumption -- "A progenitor cell barcode should become increasingly polymorphic with subsequent divisions." The "polymorphic" happens within the progenitors, their progenies, or their progenies at different time points.

      The statement is now clarified as an assumption in the manuscript.

      Similarly line 134 "Barcodes would record neuronal differentiation and migration." Is this a conclusion from this study or a citation? How is the migration part supported?

      The reasoning is better explained in the manuscript.  Migration can be documented if immediate daughter cells with similar barcodes are found in different parts of the adult brain, albeit analysis is confounded by sparse sampling and because barcodes may be similar by chance.

      Line 148 and 150: "Nearest neighbor ... neuron pairs" in DNA methylation status would conceivably reflect their cell type-specific gene expression, how did the author distinguish this from cell lineage?

      As noted above, because cells with similar phenotypes usually arise from common progenitors, cells within a clade are also usually related. However, the barcodes are still polymorphic within a clade and potentially add complementary information on mitotic ages, ancestry within a clade, and possible cell migration.

      Figure 3C: "Cells that emerge early in development" Where are they on the figure?

      Hindbrain neurons differentiate early in development and their barcodes are more methylated. The figure has been modified to label some of the values with their neuron types. Also, the older figure mistakenly included data from all 3 brains and now the data are only from brain H01.

      Figures 4D and 4E, distinguishing cell subtypes is challenging, as the same color palette is used for both excitatory and inhibitory neurons.

      This is an unfortunate limitation arising from the complexity of the figure and the limited number of distinguishable colors.

      Figures 4 and 5, what are these abbreviations?

      The abbreviations are presented in Figure 1 and maintained in subsequent figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors intended to investigate the earliest mechanisms enabling self-prioritization, especially in the attention. Combining a temporal order judgement task with computational modelling based on the Theory of Visual Attention (TVA), the authors suggested that the shapes associated with the self can fundamentally alter the attentional selection of sensory information into awareness. This self-prioritization in attentional selection occurs automatically at early perceptual stages. Furthermore, the processing benefits obtained from attentional selection via self-relatedness and physical salience were separated from each other.

      Strengths:

      The manuscript is written in a way that is easy to follow. The methods of the paper are very clear and appropriate.

      Thank you for your valuable feedback and helpful suggestions. Please see specific answers below.

      Weaknesses:

      There are two main concerns:

      (1) The authors had a too strong pre-hypothesis that self-prioritization was associated with attention. They used the prior entry to consciousness (awareness) as an index of attention, which is not appropriate. There may be other processing that makes the stimulus prior to entry to consciousness (e.g. high arousal, high sensitivity), but not attention. The self-related/associated stimulus may be involved in such processing but not attention to make the stimulus easily caught. Perhaps the authors could include other methods such as EEG or MEG to answer this question.

      We found the possibility of other mechanisms to be responsible for “prior entry” interesting too, but believe there are solid grounds for the hypothesis that it is indicative of attention:

      First, prior entry has a long-standing history as an index of attention (e.g., Titchener, 1903; Shore et al., 2001; Yates and Nicholls, 2009; Olivers et al., 2011; see Spence & Parise, 2010, for a review). Of course, other factors (like the ones mentioned) can contribute to encoding speed. However, for the perceptual condition, we systematically varied a stimulus feature that is associated with selective attention (salience, see e.g. Wolfe, 2021) and kept other features that are known to be associated with other factors such as arousal and sensitivity constant across the two variants (e.g. clear over threshold visibility) or varied them between participants (e.g. the colours / shapes used).

      Second, in the social salience condition we used a manipulation that has repeatedly been used to establish social salience effects in other paradigms (e.g., Li et al., 2022; Liu & Sui, 2016; Scheller et al., 2024; Sui et al., 2015; see Humphreys & Sui, 2016, for a review). We assume that the reviewer’s comment suggests that changes in arousal or sensitivity may be responsible for social salience effects, specifically. We have several reasons to interpret the social salience effects as an alteration in attentional selection, rather than a result of arousal or sensitivity:

      Arousal and attention are closely linked. However, within the present model, arousal is more likely linked to the availability of processing resources (capacity parameter C). That is, enhanced arousal is typically not stimulus-specific, and is therefore unlikely to affect the *relative* advantage in processing weights/rates of the self-associated (vs other-associated) stimuli. Indeed, a recent study showed that arousal does not modulate the relative division of attentional resources (as modelled by the Theory of Visual Attention; Asgeirsson & Nieuwenhuis, 2017). As such, it is unlikely that arousal can explain the observed results in relative processing changes for the self and other identities.

      Further, there is little reason to assume that presenting a different shape enhances perceptual sensitivity. Firstly, all stimuli were presented well above threshold, which would shrink any effects that were resulting from increases in sensitivity alone. Secondly, shape-associations were counterbalanced across participants, reducing the possibility that specific features, present in the stimulus display, lead to the measurable change in processing rates as a result of enhanced shape-sensitivity.

      Taken together, both the wealth of literature suggesting that prior entry indexes attention and the specific design choices within our study strongly support the notion that the observed changes in processing rates are indicative of changes in attentional selection, rather than other mechanisms (e.g. arousal, sensitivity).

      (2) The authors suggested that there are two independent attention processes. I suspect that the brain needs two attention systems. Is there a probability that the social and perceptual (physical properties of the stimulus) salience fired the same attention processing through different processing?

      We appreciate this thought-provoking comment. We conceptualize attention as a process that can facilitate different levels of representation, rather than as separate systems tuned to specific types of information. Different forms of representation, such as the perceptual shape, or the associated social identity, may be impacted by the same attentional process at different levels of representation. Indeed, our findings suggest that both social and perceptual salience effects may result from the same attentional system, albeit at different levels of representation. This is further supported by the additivity of perceptual and social salience effects and the negative correlation of processing facilitations between perceptually and socially salient cues. These results may reflect a trade-off in how attentional resources are distributed between either perceptually or socially salient stimuli.

      Reviewer #2 (Public review):

      Summary:

      The main aim of this research was to explore whether and how self-associations (as opposed to other associations) bias early attentional selection, and whether this can explain well-known self-prioritization phenomena, such as the self-advantage in perceptual matching tasks. The authors adopted the Visual Attention Theory (VAT) by estimating VAT parameters using a hierarchical Bayesian model from the field of attention and applied it to investigate the mechanisms underlying self-prioritization. They also discussed the constraints on the self-prioritization effect in attentional selection. The key conclusions reported were:

      (1) Self-association enhances both attentional weights and processing capacity

      (2) Self-prioritization in attentional selection occurs automatically but diminishes when active social decoding is required, and

      (3) Social and perceptual salience capture attention through distinct mechanisms.

      Strengths:

      Transferring the Theory of Visual Attention parameters estimated by a hierarchical Bayesian model to investigate self-prioritization in attentional selection was a smart approach. This method provides a valuable tool for accessing the very early stages of self-processing, i.e., attention selection. The authors conclude that self-associations can bias visual attention by enhancing both attentional weights and processing capacity and that this process occurs automatically. These findings offer new insights into self-prioritization from the perspective of the early stage of attentional selection.

      Thank you for your valuable feedback and helpful suggestions. Please see specific answers below.

      Weaknesses:

      (1) The results are not convincing enough to definitively support their conclusions. This is due to inconsistent findings (e.g., the model selection suggested condition-specific c parameters, but the increase in processing capacity was only slight; the correlations between attentional selection bias and SPE were inconsistent across experiments), unexpected results (e.g., when examining the impact of social association on processing rates, the other-associated stimuli were processed faster after social association, while the self-associated stimuli were processed more slowly), and weak correlations between attentional bias and behavioral SPE, which were reported without any p-value corrections. Additionally, the reasons why the attentional bias of self-association occurs automatically but disappears during active social decoding remain difficult to explain. It is also possible that the self-association with shapes was not strong enough to demonstrate attention bias, rather than the automatic processes as the authors suggest. Although these inconsistencies and unexpected results were discussed, all were post hoc explanations. To convince readers, empirical evidence is needed to support these unexpected findings.

      Thank you for outlining the specific points that raise your concern. We were happy to address these points as follows:

      a. Replications and Consistency: In our study, we consistently observed trends (relative reduction in processing speed of the self-associated stimulus) in the social salience conditions across experiments. While Experiment 2 demonstrated a significant reduction in processing rate towards self-stimuli, there was a notable trend in Experiment 1 as well.

      b. Condition-specific parameters: The condition-specific C parameters, though presenting a small effect size, significantly improved model fit. Inspecting the HDI ranges of our estimated C parameters indicates a high probability (85-89%) that processing capacity increased due to social associations, suggesting that even small changes (~2 Hz) can hold meaningful implications within the context of attentional selection.

      Please also note that the main conclusions about relative salience (self/other, salient/non-salient) are based on the relative processing rates. Processing rates are the product of the processing capacity (condition- but not stimulus dependent) and the attentional weight (condition and stimulus dependent). The latter is crucial to judge the *relative* advantage of the salient stimulus. Hence, the self-/salient stimulus advantage that is reflected in the ‘processing rate difference’ is automatically also reflected in the relative attentional weights attributed to the self/other and salient/non-salient stimuli. As such, the overall results of an automatic relative advantage of self-associated stimuli hold, independently of the change in overall processing capacity.
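
      The relation described above can be sketched in a few lines. This is a minimal illustration only, assuming a two-stimulus display in which the attentional weights are normalised to sum to one; the function names and the numerical values (a capacity of 50 Hz versus 52 Hz, a self-weight of 0.6) are hypothetical and are not estimates from the study.

```python
def processing_rates(capacity_hz, weight_self):
    """Split an overall capacity C (in Hz) into two stimulus-specific rates.

    Assumption for illustration: two stimuli whose relative attentional
    weights sum to 1, so the other-associated stimulus gets 1 - weight_self.
    """
    if not 0.0 <= weight_self <= 1.0:
        raise ValueError("weight_self must lie in [0, 1]")
    v_self = capacity_hz * weight_self
    v_other = capacity_hz * (1.0 - weight_self)
    return v_self, v_other


def relative_advantage(capacity_hz, weight_self):
    """Share of the total processing rate allocated to the self stimulus."""
    v_self, v_other = processing_rates(capacity_hz, weight_self)
    return v_self / (v_self + v_other)


# Hypothetical values: capacity rises by 2 Hz, the weight stays the same.
before = relative_advantage(50.0, 0.6)
after = relative_advantage(52.0, 0.6)
print(before, after)
```

      Under these assumptions, raising the capacity scales both processing rates by the same factor, so the share allocated to the self-associated stimulus is set by the attentional weight alone.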

      c. Correlations: Regarding the correlations the reviewer noted, we wish to clarify that these were exploratory, and not the primary focus of our research. The aim of these exploratory analyses was to gauge the contribution of attentional selection to matching-based SPEs. As SPEs measured via the matching task are typically based on multiple different levels of processing, the contribution of early attentional selection to their overall magnitude was unclear. Without being able to gauge the possible effect sizes, corrected analyses may prevent detecting small but meaningful effects. As such, the effect sizes reported serve future studies to estimate power a priori and conduct well-powered replications of such exploratory effects. Additionally, Bayes factors were provided to give an appreciation of the strength of the evidence, all suggesting at least moderate evidence in favour of a correlation. Lastly, please note that effects that were measured within individuals and task (processing rate increase in social and perceptual decision dimensions in the TOJ task) showed consistent patterns, suggesting that the modulations within tasks were highly predictive of each other, while the modulations between tasks were not as clearly linked. We will add this clarification to the revised manuscript.

      d. Unexpected results: The unexpected results concerning the processing rates of other-associated versus self-associated stimuli certainly warrant further discussion. We believe that the additional processing steps required for social judgments, reflected in enhanced reaction times, may explain the slower processing of self-associated stimuli in that dimension. We agree that not all findings will align with initial hypotheses, and this variability presents avenues for further research. We have added this to the discussion of social salience effects.

      e. Whether association strength can account for the findings: We appreciate the scepticism regarding the strength of self-association with shapes. However, our within-participant design and control matching task indicate that the relative processing advantage for self-associated stimuli holds across conditions. This makes the scenario that “the self-association with shapes was not strong enough to demonstrate attention bias” very unlikely. Firstly, the relative processing advantage of self-associated stimuli in the perceptual decision condition, and the absence of such advantage in the social decision condition, were evidenced in the same participants. Hence, the strength of association between shapes and social identities was the same for both conditions. However, we only find an advantage for the self-associated shape when participants make perceptual (shape) judgements. It is therefore highly unlikely that the “association strength” can account for the difference in the outcomes between the conditions in experiment 1. Also, note that the order in which these conditions were presented was counter-balanced across participants, reducing the possibility that the automatic self-advantage was merely a result of learning or fatigue. Secondly, all participants completed the standard matching task to ascertain that the association between shapes and identities did indeed lead to processing advantages (across different levels).

      In summary, we believe that the evidence we provide supports the final conclusions. We do, of course, welcome any further empirical evidence that could enhance our understanding of the contribution of different processing levels to the SPE and are committed to exploring these areas in future work.

      (2) The generalization of the findings needs further examination. The current results seem to rely heavily on the perceptual matching task. Whether this attentional selection mechanism of self-prioritization can be generalized to other stimuli, such as self-name, self-face, or other domains of self-association advantages, remains to be tested. In other words, more converging evidence is needed.

      The reviewer indicates that the current findings heavily rely on the perceptual matching task, and it would be more convincing to include other paradigm(s) and different types of stimuli. We are happy to address these points here: first, we specifically used a temporal order paradigm to tap into specific processes, rather than merely relying on the matching task. Attentional selection is, along with other processes, involved in matching, but the TOJ-TVA approach allows tapping into attentional selection specifically. Second, self-prioritization effects have been replicated across a wide range of stimuli (e.g. faces: Wozniak et al., 2018; names or owned objects: Scheller & Sui, 2022a; or even fully unfamiliar stimuli: Wozniak & Knoblich, 2019) and paradigms (e.g. matching task: Sui et al., 2012; cross-modal cue integration: Scheller & Sui, 2022b; Scheller et al., 2023; continuous flash suppression: Macrae et al., 2017; temporal order judgment: Constable et al., 2019; Truong et al., 2017). Using neutral geometric shapes, rather than faces and names, addresses a key challenge in self research: mitigating the influence of stimulus familiarity on results. In addition, these newly learned, simple stimuli can be combined with other paradigms, such as the TOJ paradigm in the current study, to investigate the broader impact of self-processing on perception and cognition.

      To the best of our knowledge, this is the first study showing evidence about the mechanisms that are involved in early attentional selection of socially salient stimuli. Future replications and extensions would certainly be useful, as with any experimental paradigm.

      (3) The comparison between the "social" and "perceptual" tasks remains debatable, as it is challenging to equate the levels of social salience and perceptual salience. In addition, these two tasks differ not only in terms of social decoding processes but also in other aspects such as task difficulty. Whether the observed differences between the tasks can definitively suggest the specificity of social decoding, as the authors claim, needs further confirmation.

      Equating the levels of social and perceptual salience is indeed challenging, but not an aim of the present study. Instead, the present study directly compares the mechanisms and effects of social and perceptual salience, specifically in experiment 2. By manipulating perceptual salience (relative colour) and social salience (relative shape association) independently and jointly, and quantifying the effects on processing rates, our study allows us to directly delineate the contributions of each of these types of salience. The results suggest additive effects (see also Figure 7). Indeed, the possibility remains that these effects are additive because of the use of different perceptual features, so it would be helpful for future studies to explore whether similar perceptual features lead to (supra-/sub-) additive effects. In either case, the study design allows a direct comparison of the effects and mechanisms of social and perceptual salience.

      Regarding the social and perceptual decision dimensions, they were not expected to be equated. Indeed, the social decision dimension requires additional retrieval of the associated identity, making it likely more challenging. This additional retrieval is also likely responsible for the slower responses towards the social association compared to the shape itself. However, the motivation to compare the effects of these two decisional dimensions lies in the assumption that the self needs to be task-relevant. Some evidence suggests that the self needs to be task-relevant to induce self-prioritization effects (e.g., Woźniak & Knoblich, 2022). However, these studies typically used matching tasks and were powered to detect large effects only (e.g. f = 0.4, n = 18). As a lack of contribution from decisional processing levels (which interact with task-relevance) is likely to reduce the SPE, smaller self-prioritization effects that result from earlier processing levels may not be detected with sufficient statistical power. Targeting specific processing levels, especially those with relatively early contributions or small effect sizes, requires larger samples (here: n = 70) to provide sufficient power. Indeed, by contrasting the relative attentional selection effects in the present study we find that the self does not need to be task-relevant to produce self-prioritization effects. This is in line with recent findings of prior entry of self-faces (Jubile & Kumar, 2021).

      Reviewer #2 (Recommendations for the authors):

      Suggestions:

      (1) The research questions should be revised to better align with the conclusions. For example, Q2 is phrased as "Does self-relatedness bias attentional selection at the level of the perceptual feature representation (shape) or at the level of the associated identity (social association)," which is unclear in its reference to "levels." A more appropriate phrasing would be whether the self-association bias occurs automatically or whether it depends on explicit social decoding.

      Thank you for this suggestion – we have revised the phrasing accordingly: “Does self-relatedness bias attentional selection automatically or does it require explicit social decoding?”

      (2) After presenting the data, it would be helpful to include one or two sentences summarizing the conclusions drawn from the data and how they relate to the research questions. Currently, readers are left to guess whether the results are consistent with the hypotheses.

      Thank you for this suggestion, which we think will enhance the clarity of the manuscript – we have added summary sentences when presenting the results: “This cross-experimental parameter inspection revealed that participants exhibited an attentional selection bias towards socially associated information. Interestingly, enhanced processing speed was observed for other-associated rather than self-associated information, a pattern that diverged from our prediction.”

      (1) “Results from experiment 2 demonstrated a faster, more automatic attentional selection for self-associated information when the decision did not require explicit social decoding. When the social identity had to be judged, processing speed for self-associated information decreased. Contrary to the hypothesis that social decoding is necessary for self-prioritization to emerge, these findings suggest that attentional selection can operate automatically to prioritize self-associated information.”

      (2) “Taken together, as also confirmed in the cross-experimental analysis, attentional selection favoured the other-related information when social identity had to be judged. In contrast, perceptual salience, as predicted, led to increased processing speed for the more salient stimulus.”

      (3) The identity of the "other" used in the experiments is unclear, making it uncertain whether the results are self-specific. It would be beneficial to compare the self condition with a control condition, such as a close friend vs. an unfamiliar other. Alternatively, the results may reflect attentional bias for familiar vs. unfamiliar individuals rather than self-specific bias.

      Thank you for this comment. Firstly, we would like to clarify that we have provided participants with a description of who the “other” is (see methods: “At the beginning of this task, participants were told that one of the two geometric shapes that was used in the TOJ task has been assigned to them, and the other shape has been assigned to another participant in the experiment – someone they did not know, but who was of similar age and gender”). We aimed to make the ‘other’ as concrete as possible, while maintaining a ‘stranger’ identity.

      Secondly, this specification is in line with the vast majority of the literature, which typically measures the effects of self-prioritization relative to the association with an unfamiliar other (stranger), or an unfamiliar and familiar other (e.g. friend, family member). They find that processing advantages that affect friend-related stimuli (friend-stimuli being processed faster than stranger-associated stimuli) are likely mediated by self-extension, that is, an association of the friend with the self. As such, SPEs, relative to familiar others, are typically smaller in size (see, e.g., Sui et al., 2012). They, however, are less stable and more variable than the self-prioritization effects measured relative to a stranger (see Scheller & Sui, 2022 JEP:HPP). Importantly, this is driven by the variability of the friend-associated stimulus, rather than the self or other-associated stimulus (see Figure 4 in main text and S5 in supplementary material in Scheller & Sui, 2022: https://durham-repository.worktribe.com/output/1210478/the-power-of-the-self-anchoring-information-processing-across-contexts). Effectively, this would suggest that choosing a familiar other as a reference would not only (a) lead to a smaller effect size, but also (b) be a less stable effect, which likely depends on the association the individual has to the other familiar person. In contrast, by associating the other shape with another participant in this experiment, we provide participants not only with a concrete representation of a stranger, but also maximise our ability to detect true effects, as these are likely to be larger and more stable.

      (4) The key aspects of the procedure (e.g., the order of different conditions) and its rationale need to be clearly explained before or during the presentation of the results. Currently, readers are left to infer certain details.

      Thank you for pointing this out. The methods that provide these details are outlined at the end of the document; however, we agree it would be useful to bring some of these details forward. We have therefore revised the methods figure (Figure 3) to include an outline of the task type, order, and trial numbers. Task boxes are colour coded by the conditions that are listed in the results figures of the manuscript. We also added these details to the caption of Figure 3.

      “Task structures of Experiments 1 and 2. Both experiments started with a TOJ baseline task. In Experiment 1, only non-salient targets were presented, while in Experiment 2, perceptually salient and non-salient trials were included. These were presented in randomly intermixed order. Next, targets were associated with social identities. Associations were practiced using the matching task. Following association learning, which attaches social salience to the shapes, participants completed the same TOJ task as before. In Experiment 1, they completed one block using a social decision dimension, and one block using a perceptual decision dimension. The order of these blocks was counterbalanced across participants to reduce the influence of order effects in the results. In Experiment 2, perceptually salient and non-salient stimuli were presented in an intermixed fashion, and participants responded within the social decision dimension. Each task block was preceded by 8 (matching) to 14 (TOJ) practice trials.”

      (5) Certain imprecise terms used to describe the results, such as "slightly," "roughly," and "loosely," create confusion for the readers. The authors should take a clearer stance on the results and provide an explanation for why the data only "slightly," "roughly," or "loosely" support the findings.

      Thank you for highlighting this. We have provided more concrete wording and details throughout (e.g., “‘target shapes’ were 30% bigger than the ‘background shapes’”).

      Lastly, we have updated the formatting of the manuscript to provide higher fidelity figures, which were previously compromised by file conversion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This provocative manuscript presents valuable comparisons of the morphologies of Archaean bacterial microfossils to those of microbes transformed under environmental conditions that mimic those present on Earth during the same Eon, although the evidence in support of the conclusions is currently incomplete. The reasons include that taphonomy is not presently considered, and a greater diversity of experimental environmental conditions is not evaluated -- which is important because we ultimately do not know much about Earth's early environments. The authors may want to reframe their conclusions to reflect this work as a first step towards an interpretation of some microfossils as 'proto-cells,' and less so as providing strong support for this hypothesis.

      Regarding the taphonomic alterations: The editor and reviewers are correct in pointing out this issue. Taphonomic alteration of the microfossils attains special significance in the case of microorganisms, as they lack rigid structures and are prone to morphological alterations during or after their fossilization. We are acutely aware of this issue and have conducted long-term experiments (lasting two years) to observe how cells die, decay, and get preserved. A large section of the manuscript (pages 11 to 20) and a substantial portion of the supplementary information is dedicated to understanding the taphonomic alterations. To the best of our knowledge, these are among the longest experiments done to understand the taphonomic alterations of the cells within laboratory conditions. 

      Recent reports by Orange et al. (1,2) showed that under favorable environmental conditions, cells could be fossilized rather rapidly with little morphological modification. We observed a similar phenomenon in this work. Cells in our study underwent rapid encrustation with cations from the growth media. We have analyzed the morphological changes over a period of 18 months. After 18 months, the softer biofilms became entirely encrusted in salt and turned solid (Fig. ). Despite this transformation, morphologically intact cells could still be observed within these structures. This suggests that the cells inhabiting Archaean coastal marine environments could undergo rather rapid encrustation, and their morphological features could be preserved in the geological record with little taphonomic alteration.

      Regarding the environmental conditions: We are in total agreement with the reviewers that much is unknown about Archaean geology and its environmental conditions. Like the present-day Earth, Archaean Earth certainly had regions that greatly differed in their environmental conditions—volcanic freshwater ponds, brines, mildly halophilic coastal marine environments, and geothermal and hydrothermal vents, to name a few. Our experimental design focuses on one environment we have a relatively good understanding of rather than the rest of the planet, of which we know little. Below, we list our reasons for restricting to coastal marine environments and studying cells under mildly halophilic experimental conditions.  

      (1) Very little continental crust from the Hadean and early Archaean Eons exists on the present-day Earth. Much of our geochemical understanding of this time period is the result of studying the Pilbara Iron Formations and the Barberton Greenstone Belt. Geological investigations suggest that these sites were coastal marine environments. The salinity of coastal marine environments is higher than that of open oceans due to the greater water evaporation within these environments. Moreover, brines were discovered within pillow basalts within the Barberton Greenstone Belt, suggesting that the salinity within these sites was higher than or similar to that of marine environments.

      (2) We are not certain about the environmental conditions that could have supported the origin of life. However, all currently known Archaean microfossils were reported from coastal marine environments (3.8-2.4Ga). This suggests that proto-life likely flourished in mildly halophilic environments, similar to the experimental conditions employed in our study. 

      (3) The chemical analysis of Archaean microfossils also suggests that they lived in salt-rich environments, as most, if not all, microfossils are closely associated with, and often encrusted in, a thin layer of salt.

      However, we concur with the reviewers that our interpretations should be reassessed if Archaean microfossils that greatly differ from the currently known microfossils are to be discovered or if new microfossils are to be reported from environments other than coastal marine sites.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Microfossils from the Paleoarchean Eon represent the oldest evidence of life, but their nature has been strongly debated among scientists. To resolve this, the authors reconstructed the lifecycles of Archaean organisms by transforming a Gram-positive bacterium into a primitive lipid vesicle-like state and simulating early Earth conditions. They successfully replicated all morphologies and life cycles of Archaean microfossils and studied cell degradation processes over several years, finding that encrustation with minerals like salt preserved these cells as fossilized organic carbon. Their findings suggest that microfossils from 3.8 to 2.5 billion years ago were likely liposome-like protocells with energy conservation pathways but without regulated morphology. 

      Strengths: 

      The authors have crafted a compelling narrative about the morphological similarities between microfossils from various sites and proliferating wall-deficient bacterial cells, providing detailed comparisons that have never been demonstrated in this detail before. The extensive number of supporting figures is impressive, highlighting numerous similarities. While conclusively proving that these microfossils are proliferating protocells morphologically akin to those studied here is challenging, we applaud this effort as the first detailed comparison between microfossils and morphologically primitive cells. 

      Weaknesses: 

      Although the species used in this study closely resembles the fossils morphologically, it would be beneficial to provide a clearer explanation for its selection. The literature indicates that many bacteria, if not all, can be rendered cell wall-deficient, making the rationale for choosing this specific species somewhat unclear. While this manuscript includes clear morphological comparisons, we believe the authors do not adequately address the limitations of using modern bacterial species in their study. All contemporary bacteria have undergone extensive evolutionary changes, developing complex and intertwined genetic pathways unlike those of early life forms. Consequently, comparing existing bacteria with fossilized life forms is largely hypothetical, a point that should be more thoroughly emphasized in the discussion. 

      Another weak aspect of the study is the absence of any quantitative data. While we understand that obtaining such data for microfossils may be challenging, it would be helpful to present the frequencies of different proliferative events observed in the bacterium used. Additionally, reflecting on the chemical factors in early life that might cause these distinct proliferation modes would provide valuable context. 

      Regarding our choice of using modern organisms or this particular bacterial species: 

      Based on current scientific knowledge, it is logical to infer that cellular life originated as protocells; nevertheless, there has been no direct geological evidence for the existence of such cells on early Earth. Hence, protocells remain an entirely theoretical concept. Moreover, protocells are considered to have been far more primitive than present-day cells. Surprisingly, this lack of sophistication was the biggest challenge in understanding protocells. Designing experiments in which cells are primitive (but not as primitive as non-living lipid vesicles) and still retain a functional resemblance to a living cell does pose some practical challenges. Laboratory experiments with substitute (proxy) protocells almost always come with some limitations. Although not a perfect proxy, we believe protocells and protoplasts share certain characteristics. Having said that, we would like to reemphasize that protoplasts are not protocells. Our reasons for using protoplasts as model organisms and working with this bacterial species (Exiguobacterium Strain-Molly) are based on several scientific and practical criteria listed below.

      (1) Irrespective of cell physiology and intracellular complexity, we believe that protoplasts and protocells share certain similarities in the biophysical properties of their cytoplasm. We explained our reasoning in the manuscript introduction and in our previous manuscripts (Kanaparthi et al., 2024 & Kanaparthi et al., 2023). In short, to be classified as a cell, even a protocell should possess minimal biosynthetic pathways, a physiological mechanism of harvesting free energy from the surroundings (energy-yielding pathways), and a means of replicating its genetic material and transferring it to the daughter cells. These minimal physiological processes could incorporate considerable cytoplasmic complexity. Hence, the biophysical properties of the protocell cytoplasm could have resembled those of the cytoplasm of protoplasts, irrespective of the genomic complexity. 

      (2) Irrespective of their physiology, protoplasts exhibit several key similarities to protocells, such as their inherent inability to regulate their morphology or reproduction. This similarity was pointed out in previous studies (3). Despite possessing all the necessary genetic information, protoplasts undergo reproduction through simple physicochemical processes independent of canonical molecular biological processes. This method of reproduction is considered to have been erratic and rather primitive, akin to the theoretical propositions on protocells. Although protoplasts are fully evolved cells with considerable physiological complexity, the above-mentioned biophysical similarities suggest that the protoplast life cycle could morphologically resemble that of protocells (in no other aspect except for their morphology and reproduction).

      (3) Physiologically or genomically different species of Gram-positive protoplasts are shown to exhibit similar morphologies. This suggests that when Gram-positive bacteria lose their cell wall and turn into protoplasts, they reproduce in a similar manner independent of physiological or genome-based differences. Because morphology, and only morphology, is key to our study, intracellular complexity is not a key consideration, at least within its scope. 

      (4) This specific strain was isolated from submerged freshwater springs in the Dead Sea. This isolate and members of this bacterial genus are known to be well acclimatized to growing in a wide range of salt concentrations and in different salt species. This is important for our study (this and the previous manuscript), in which cells must be grown not only at high salt concentrations (1-15%) but also in different salts like NaCl, MgCl<sub>2</sub>, and KCl. 

      (5) Our initial interest in this isolate was due to its ability to reduce iron at high salt concentrations. Given that most spherical microfossils are found in Archaean-banded iron formations covered in pyrite, this suggests that these microfossils could have been reducing oxidized iron species like Fe(III). Nevertheless, over the course of our study, we realized the complexities of live cell staining and imaging under anoxic conditions. Given that the scope of the manuscript is restricted only to comparing the morphologies, not the physiology, we abandoned the idea of growing cells under anoxic conditions.  

      Based on these observations, cell physiology may not be a key consideration, at least within the scope of studying microfossil morphology. However, we want to emphasize again that “We do not claim present-day protoplasts are protocells.”  

      Regarding the absence of quantitative data:

      We are unsure what the reviewer meant by the absence of quantitative data. Is it from the cell size/reproductive pathways perspective or from a microfossil/ecological perspective? At the risk of being portrayed in a bad light, we admit that we did not present quantitative data from either of these perspectives. In our defense, this was not due to our lack of effort but due to the practical limitations imposed by our model organism. 

      If the reviewer means the quantitative data regarding cell sizes and morphology: In our previous work, we studied the relationship between protoplast morphology, growth rate, and environmental conditions. In that study, we proposed that the growth rate is one factor that regulates protoplast morphology. Nevertheless, we did not observe uniformity in the sizes of the cells. This lack of uniformity was not just between the replicates but even among the cells grown within the same culture flask or the cells within the same microscopic field. Moreover, cells are often observed to be reproducing by forming internal daughter cells, external daughter cells, or both at the same time. The size and morphological differences among cells within a growth stage could be explained by the physiological and growth rate heterogeneity among cells. 

      Bacterial growth curves and their partition into different stages (lag, log & stationary), in general, represent the growth dynamics of an entire bacterial population. Nevertheless, averaging the data obscures the behavior of individual cells (4,5). It is known that genetically identical cells within a single bacterial population could exhibit considerable cell-to-cell variation in gene expression (6,7) and growth rates (8). The reason for such stochastic behavior among monoclonal cells has not been well understood. In the case of normal cells, morphological manifestation of these variations is restricted by a rigid cell wall. Given the absence of a cell wall in protoplasts, we assume such cell-to-cell variations in growth rate are manifested in cell morphology. This makes it challenging to quantitatively determine variations in cell sizes or the size increase in a statistically robust manner, even in monoclonal cells. 

      This lack of uniformity in cell sizes should not be perceived as a limitation, because the same behavior is consistently observed among microfossils. Spherical microfossils of similar morphology but different sizes were reported from different microfossil sites (9,10). In this regard, protoplasts and microfossils are very similar. 

      If the reviewer means the quantitative data from an ecological perspective: 

      Based on the elemental composition and the isotopic signatures of the organic carbon, we can deduce whether these structures are of biological origin. However, any further interpretation of these data to assign these microfossils to a particular physiological group is fraught with errors. Hence, we refrain from making any inferences about the physiology and ecological function of these microfossils. This lack of clarity on the physiology of microfossils reduces the chance of quantitative studies on their ecological functions. Moreover, we would like to re-emphasize that the scope of this work is restricted to morphological comparison and is not targeted at understanding the ecological function of these microfossils. This narrow objective also limits the nature of the quantitative data we could present.

      Moreover, developing a quantitative understanding of some phenomena could be technically challenging. Many theories on the origin of life, like chemical evolution, started with the qualitative observation that lightning could mediate the synthesis of biologically relevant organic carbon. Our quantitative understanding of this process is still being explored and debated even to this day.     

      Reviewer #2 (Public Review): 

      Summary: 

      In summary, the manuscript describes life-cycle-related morphologies of primitive vesicle-like states (Em-P) produced in the laboratory from the Gram-positive bacterium (Exiguobacterium Strain-Molly) under assumed Archean environmental conditions. Em-P morphologies (life cycles) are controlled by the "native environment". In order to mimic Archean environmental conditions, soy broth supplemented with Dead Sea salt was used to cultivate Em-Ps. The manuscript compares Archean microfossils and biofilms from selected photos with those laboratory morphologies. The photos derive from publications on various stratigraphic sections of Paleo- to Neoarchean ages. Based on the similarity of morphologies of microfossils and Em-Ps, the manuscript concludes that all Archean microfossils are in fact not prokaryotes, but merely "sacks of cytoplasm". 

      Strengths: 

      The approach of the authors to recognize the possibility that "real" cells were not around in the Archean time is appealing. The manuscript reflects the very hard work by the authors composing the Em-Ps used for comparison and selecting the appropriate photo material of fossils. 

      Weaknesses: 

      While the basic idea is very interesting, the manuscript includes flaws and falls short in presenting supportive data. The manuscript makes too simplistic assumptions on the "Archean paleoenvironment". First, like in our modern world, the environmental conditions during the Archean time were not globally the same. Second, we do not know much about the Archean paleoenvironment due to the immense lack of rock records. More so, the Archean stratigraphic sections from where the fossil material derived record different paleoenvironments: shelf to tidal flat and lacustrine settings, so differences must have been significant. Finally, the Archean spanned 2.5 billion years and it is unlikely that environmental conditions remained the same. Diurnal or seasonal variations are not considered. Sediment types are not considered. Due to these reasons, the laboratory model of an Archean paleoenvironment and the life therein is too simplistic. Another aspect is that eukaryote cells are described from Archean rocks, so it seems unlikely that prokaryotes were not around at the same time. Considering other fossil evidence preserved in Archean rocks except for microfossils, the many early Archean microbialites that show baffling and trapping cannot be explained without the presence of "real cells". With respect to lithology: chert is a rock predominantly composed of silica, not salt. The formation of Em-Ps in the "salty" laboratory set-up seems therefore not a good fit to evaluate chert fossils. Formation of structures in sediment is one step. The second step is their preservation. However, the second aspect of taphonomy is largely excluded in the manuscript, and the role of fossilization (lithification) of Em-Ps is not discussed. This is important because Archean rock successions are known for their tectonic and hydrothermal overprint, as well as recrystallization over time. 
      Some of the comparisons of laboratory morphologies with microfossils and biofilms are incorrect because scales differ by magnitudes. In general, one has to recognize that prokaryote cell morphologies do not offer many variations. It is possible to arrive at the morphologies described in various ways including abiotic ones. 

      Regarding the simplistic presumptions on the Archaean Eon environmental conditions, we provided a detailed explanation of this issue in our response to the eLife evaluation. In short, we agree with the reviewer that little is known about the Archaean Eon environmental conditions at a planetary scale. Hence, we restricted our study to one particular environment of which we had a comparatively good understanding. The Archaean Eon spanned 2.5 billion years. However, most of the microfossil sites we discussed in the manuscript are older than 3 billion years, with one exception (the 2.4-billion-year-old Turee Creek microfossils). We presume that conditions within this niche (coastal marine) environment could not have changed greatly until 2 Ga, after which there were major changes in the ocean salt composition and salinities.

      In the manuscript, we discussed extensively the reasons for restricting our study to these particular environmental conditions. Further explanations of these choices are presented in our response to the eLife evaluation (also see our previous manuscript). In short, the fact that all known microfossils are restricted to coastal marine environments justifies the experimental conditions employed in our study. Nevertheless, we agree with the reviewer that all lab-based studies involve some extent of simplification. This gap/mismatch is even wider when it comes to studies involving the origin of life or early life on Earth.

      We are not arguing that prokaryotes were not around at this time. The key message of the manuscript is that they were present, but they had not yet developed intracellular mechanisms to regulate their morphology and remained primitive in this respect.

      The sizes of the microfossils and cells from our study were similar in most cases. However, we agree with the reviewer that they deviated considerably in some cases, for example, S70, S73, and S83. These size variations are limited to sedimentary structures like laminations rather than cells. These differences should be expected, as we try to replicate, in a 2 ml chamber slide, the real-life morphologies of biofilms that could have extended over large swaths of natural environments. More specifically, in Fig. S70, there is a considerable size mismatch. But, in Fig. S73, the sizes were comparable between A & C (of course, the size of our reproduction did not match B). In the case of Fig. S83, we do not see a huge size mismatch.

      Reviewer #1 (Recommendations For The Authors): 

      We would like to provide several suggestions for changes in text and additions to data analysis. 

      39-41: It has been stated that reconstructing the lifecycle is the only way of understanding the nature of these microfossils. First of all, I would rephrase this to 'the most promising way', as there are always multiple approaches to comparing phenomena. 

      We agree with the reviewer's suggestion. The suggested changes have been made (line 41). 

      125: Please rephrase "under the environmental condition of early Earth" to "under experimental conditions possibly resembling the conditions of the Paleoarchean Eon". Now it sounds like the exact environmental conditions have been produced, which has already been debated in the discussion. 

      We agree with the reviewer's suggestion. The suggested changes have been made (line 127). 

      125: Please mention the fold change in size, the original size in numbers, and whether this change is statistically significant. 

      In the above sections of this document, we explained our reservations about presenting the exact number.

      128: Have you found a difference in the relative percentages of modes of reproduction? In other words, is there a difference in percentage between forming internal daughter cells or a string of external daughter cells? 

      We explained our reservations about presenting the exact number above. But this has been extensively discussed in our accompanying manuscript. We want to reemphasize that the scope of this manuscript is restricted to comparing morphologies rather than providing a mechanistic explanation of the reproduction process. 

      151: A similar model for endocytosis has already been described in proliferating wall-less cells (Kapteijn et al., 2023). In the discussion, please compare your results with the observations made in that paper. 

      This was an oversight on our part. The manuscript suggested by the reviewer is now cited (lines 154 & 155).

      163: Please use another word for uncanny. We suggest using 'strong resemblance'. 

      We changed this according to the reviewer's suggestion (line 168). 

      433: Please elaborate on why the results are not shown. This sounds like a statement that should be substantiated further. 

      To observe growth and simultaneously image the cells, we conducted these experiments in chamber slides (2ml volume). Over time, we observed cells growing and breaking out of the salt crust (Fig. S86, S87 & Movie 22) and a gradual increase in the turbidity of the media. Although not quantitative, this is a qualitative indication of growth. We did not take precise measurements for several reasons. This sample is precious; it took us almost two years to solidify the biofilm completely, as shown in Fig. S84A. Hence, it was in limited supply, which prevented us from inoculating these salt crusts into large volumes of fresh media. Given a long period of starvation, these cells often exhibited a long lag phase (several days), and there wasn't enough volume to do OD measurements over time. 

      We also crushed the solidified biofilm with a sterile spatula before transferring it into the chamber slide with growth media. This resulted in debris in the form of small solid particles, which interfered with our OD measurements. These practical considerations made it challenging to determine the growth precisely. Despite these challenges, we measured an OD of 4 in some chamber slides after two weeks of incubation. Given that these measurements were done haphazardly, we chose not to present this data. 

      456: Could you please double-check whether the description is correct for the figure? 8C and 8D are part of Figure 8B, but this is stated otherwise in the description. 

      We thank the reviewer for pointing this out. It has now been rectified (lines 461-472).

      Reviewer #2 (Recommendations For The Authors): 

      We thank Reviewer #2 for carefully reading the manuscript and for such an elaborate list of questions. The revisions suggested have definitely improved the quality of the manuscript. Here, we would like to address some of the questions that come up repeatedly below. One frequently asked question concerns the letters denoting the individual panels within the figures. For comparison purposes, we often reproduced previously published images. To maintain a consistent figure style, we often had to cover the previous labels with an opaque square and assign a new letter. 

      The second question that appeared repeatedly below is the missing scale bars in some of the images within a figure. We often did not include a scale bar in the images when this image is an enlarged section of another image within the same figure.     

      Title: Please consider being more precise in the title. Microfossils are only one fossil group of "oldest life". Perhaps better: "On the nature of some microfossils in Archean rocks". (see also Line 37).  

      Authors’ response: The title conveys a broader message without quantitative insinuations. If our manuscript had been titled "On the nature of all known Archaean microfossils,” we should have agreed with the reviewer's suggestion and changed it to "On the nature of some microfossils in Archean rocks". As it is not, we respectfully decline to make this modification.     

      Abstract:  

      Line 41: "one way", not "the only way" 

      We agree with the reviewer’s comment, and necessary changes have been made (line 41).  

      Introduction: 

      Line 58f: "oldest sedimentary rock successions", not "oldest known rock formations". There are rocks of much older ages, but those are not well preserved due to metamorphic overprint, or the rocks are igneous to begin with. Minor issue: please note that "formations" are used as stratigraphic units, not so much to describe a rock succession in the field. 

      We agree with the reviewer’s comment and have made necessary changes (line 58).

      Line 67: Microfossils are widely accepted as evidence of life. Please rephrase. 

      We agree with the reviewer’s comment, and necessary changes have been made.

      Line 71 - 74: perhaps add a sentence of information here.

      We agree with the reviewer’s comment, and necessary changes have been made (line 71).

      Line 76: which "chemical and mineralogical considerations"? 

      This has been rephrased to “Apart from the chemical and δ<sup>13</sup>C-biomass composition” (line 76).

      Line 84ff: This is a somewhat sweeping statement. Please remember that there are microbialites in such rocks that require already a high level of biofilm organization. The existence of cyanobacteria-type microbes in the Archean is also increasingly considered. 

      We are aware of literature that labeled the clusters of Archaean microfossils as biofilms and layered structures as microbialites or stromatolite-like structures. However, the use of these terms is increasingly being discouraged. A more recent consensus among researchers suggests annotating these structures simply as sedimentary structures, as microbially induced sedimentary structures (MISS). 

      We respectfully disagree with the reviewer’s comment that Archaean microfossils exhibit a high level of biofilm organization. We are not aware of any studies that have conducted such comprehensive research on the architecture of Archaean biofilms. We are not even certain whether these clusters of Archaean cells could be labeled as biofilms in the true sense of the term. We presently lack an exact definition of a biofilm. In our study, we do see sedimentation of bacteria and their encapsulation in cell debris. From a broader perspective, any such aggregation of cells enclosed in cell debris could be annotated as a biofilm. However, more in-depth studies show that a biofilm is not a random but a highly organized structure. Different bacterial species have different biofilm architectures and chemical compositions. The multispecies biofilms in natural environments are even more complex. We do agree with the reviewer that these structures could broadly be labeled as biofilms, but we presently lack a good, if any, understanding of the Archaean biofilm architecture. 

      Regarding the annotation of microfossils as cyanobacteria, we respectfully disagree with the reviewer. This is not a new concept. Many of the Archaean microfossils were annotated as cyanobacteria at the time of their discovery. This annotation is not without controversy. With the advent of genome-based studies, researchers are increasingly moving away from this school of thought.  

      Line 101ff: The conditions on early Earth are unknown - there are many varying opinions. Perhaps simply state that this laboratory model simulates an Archean Earth environment of these conditions outlined. 

      This is a good idea. We thank the reviewer for this suggestion, and we made appropriate changes. 

      Line 112: manuscript to be replaced by "paper"? 

      This change has been made (line 114).

      Line 116: "spanned years" - how many years? 

      We now added the number of years in the brackets (line 118).

      Results: 

      Line 125: see comment for 101ff. 

      We made appropriate changes. 

      Figure 1: Caption: Please write out ICV the first time this abbreviation is used. Images: Note that some lettering appears to not fit their white labels underneath. (G, H, I, J0, and M). 

      We apologize; this was an oversight on our part. We now spell out ICV in full the first time the abbreviation is used. 

      We took these images from previously published work (references in the figure legend), so we must block out the previous figure captions. This is necessary to maintain a uniform style throughout the manuscript. 

      Line 152ff.: here would be a great opportunity to show in a graph the size variations of modern ICVs and to compare the variations with those in the fossil material. 

      In the above sections of this document, we explained our reservations about presenting the exact number.

      Line 159f.: Fig.1K - what is to see here? Maybe a close-up or - better - a small sketch would help? 

      Fig. 1K shows the surface depressions formed during vesicle formation. The surface characteristics of EM-P and the microfossils are very similar.

      Line 161f.: reference?  

      The paragraph spanning lines 159 to 172 discusses the morphological similarities between EM-P and SPF microfossils. We rechecked reference no. 35 (Delarue 2019). This is the correct reference. If the reviewer meant the references to the figures, we do not see a mistake there either.

      Line 164ff.: A question may be asked, how many fossils of the Strelley Pool population would look similar to the "modeled" ones. Questions may rise in which way the environmental conditions control such morphology variations. Perhaps more details? 

      This relationship between the environmental conditions and the morphology is discussed extensively in our previous work (11).  

      Line 193: what is meant by "similar discontinuous distribution of organic carbon"?

      This statement highlights similarities between EM-P and microfossils. The distribution of cytoplasm within the cells is not uniform. There are regions with cytoplasm and regions devoid of it, which is quite unusual for bacteria. Some previous studies argued that this could indicate that these organic structures are of abiotic origin. Here, we show that EM-P-like cells could exhibit such a patchy distribution of cytoplasm within the cell.

      Line 218 - 291: The observations are very nice, however, the figures of fossil material in Figures 3 A, B, and C appear not to conform. Perhaps use D, E and I to K. Also, S48 does not show features as described here (see below).  

      We did not completely understand the reviewer’s question. As mentioned in the figure legend, both the microfossils and the cells exhibit strings with spherical daughter cells within them. Moreover, there are also other similarities, like the presence of hollow spherical structures devoid of organic carbon. We also found several mistakes in the Fig. S48 legend. We have rectified them, and we thank the reviewer for pointing them out.

      Line 293f: Title with "." at end?

      This change has been made.

      Line 298: predominantly in chert. In clastic material preservation of cells and pores is unlikely due to the common lack of in situ entombment by silica. 

      We rephrased this entire paragraph to better convey our message. Either way, we are not arguing that hollow pore spaces exist. As the reviewer mentioned, they will, of course, be filled up with silica. In this entire paragraph, we did not refer to hollow spaces. So, we are not entirely sure what the question was.     

      Line 324, 328-349: Please see below comments on the supplementary figures 51-62. Some of the interpretations of morphologies may be incorrect. 

      Please find our response to the reviewer’s comments on individual figures below.  

      Figure 5 A to D look interesting, however E to J appear to be unconvincing. What is the grey frame in D (not the white insert). 

      The grey color is just the background that was added during the 3D rendering process.  

      Figure 6 does not appear to be convincing. - Erase? 

      We did not understand the reviewer’s reservations regarding this figure. Images A-F within the figure show the gradual transformation of cells into honeycomb-like structures, and images G-J show such structures from the Archaean that are closely associated with microfossils. Moreover, we did not come up with this terminology (honeycomb-like). Previous manuscripts proposed it.  

      Line 379ff: S66 and 69, please see my comments below. Microfossils "were often discovered" in layers of organic carbon. 

      Please see our response below.   

      Line 393-403: Laminae? There are many ways to arrive at C-rich laminae, especially, if the material was compressed during burial. Basically, any type of biofilm would appear as laminae, if compressed. The appearance of thin layers is a mere coincidence. Note that the scale difference in S70, S73, as well as S83, is way too high (cm versus μm!) to allow any such sweeping conclusions. What are α- and β- laminations, the one described by Tice et al.? The arguments are not convincing.

      We propose that cells are compressed to form laminae. We addressed the question about the differences in scale above. Yes, we are referring to the α- and β-laminations described by Tice et al.

      Figure 7: This is an interesting figure, but what are the arguments for B and C, the fossil material, being a membrane? Debris cannot be distinguished with certainty at this scale in the insert of C. B could also be a shriveled-up set of trichomes.  

      We agree with the reviewer that debris cannot be definitively differentiated. Traditionally, annotations given to microfossil structures such as biofilm, intact cells, or laminations were all based on morphological similarities with existing structures observed in microorganisms. Given that the structures observed in our study are very similar to the microfossil structures, it is logical to make such inferences. Scales in A & B match perfectly well. The structure in C is much larger, but, as we mentioned in reply to one of the reviewer’s earlier questions, some of the structures from natural environments could not be reproduced at scale in lab experiments. Working in 2 ml chamber slides does impose some restrictions.

      Figure 8: The figure does not show any honeycomb patterns. The "gaps" in the Moodies laminae are known as lenticular particles in biofilms. They form by desiccated and shriveled-up biofilm that mineralizes in situ. Sometimes also entrapped gases induce precipitation. Note also that the modelled material shows a kind of skin around the blobs that are not present in the Moodies material.  

      We agree that entrapped gas bubbles could have formed lenticular gaps. In the manuscript, we did not discount this possibility. However, if that is the case, one should explain why we often find clumps of organic carbon within these gaps. As we presented a step-by-step transformation of parallel layers of cells into laminations, which also had similar lenticular gaps, we believe this is a more plausible way such structures could have formed. In the end, there could have been more than one way in which such structures formed. 

      We do see the honeycomb pattern in the hollow gaps. Often, the 3D rendering of the STED images obscures some details. Hence, in the figure legend, we referred to the supplementary figures, which also show the sequence of steps involved in the formation of such a pattern.

      Line 405-417: During deposition of clastic sediment any hollow space would be compressed during burial and settling. It is rare that additional pore space (except between the grain-grain contacts) remains visible, especially after consolidation. The exception would be if very early silicification took place filling in any pore space. What about EPS being replaced by mineralic substance? The arguments are not convincing. 

      We are suggesting that EPS or cell debris is rapidly encrusted by cations from the surrounding environment and gets solidified into rigid structures. This makes it possible for the structures to be preserved in the fossil record. We believe that hollow structures like the lenticular gaps will be filled up with silica. 

      We do not agree with the reviewer’s comment that all biological structures will be compressed. If this is true, there should be no intact microfossils in the Archaean sedimentary structures, which is definitely not the case.      

      Line 419-430: Lithification takes place within the sediment and therefore is commonly controlled by the chemistry of pore water and chemical compounds that derive from the dissolution of minerals close by. Another aspect to consider is whether "desiccation cracks" on that small scale may be artefacts related to sample preparation (?).  

      We agree that desiccation cracks could have formed during the sample preparation for SEM imaging, as this involves drying the biofilms. However, we observed that the sample we used for SEM is a completely solidified biofilm (Fig. S84), so we expect little change in its morphology during drying. Moreover, visible cracks and pointy edges were also observed in wet samples, as shown in Fig. S87.        

      Line 432 - 439: Please see comments on the supplementary material below.

      Please find our response to the reviewer’s comments on individual figures below.  

      Discussion:  

      Line 477f: "all known microfossil morphologies" - is this a correct statement? Also, would the Archean world provide only one kind of "EM-P type"? Morphologies of prokaryote cells (spherical, rod-shaped, filamentous) in general are very simple, and any researcher of Precambrian material will appreciate the difficulties in concluding on taxonomy. There are papers that investigate putative microfossils in chert as features related to life cycles. Microfossil-papers commonly appear not to be controversial give and take some specific cases.  

      We made a mistake in using the term “all known microfossil morphologies.” We have now changed it to “all known spherical microfossils” (line 483). However, we do not agree with the statement that microfossil manuscripts tend not to be controversial. Assigning taxonomy to microfossils is anything but uncontroversial; it has been intensely debated in the scientific community.

      Line 494-496: This statement should be in the Introduction.

      We agree with the reviewer’s comment. In an earlier version of the manuscript this statement was in the introduction. To put this statement in its proper context, it needs to be associated with a discussion about the importance of morphology in the identification of microfossils. The present version of the manuscript does not permit moving an entire paragraph into the introduction. Hence, we think making this statement in the discussion section is appropriate.

      Line 484ff. The discussion on biogenicity of microfossils is long-standing (e.g., biogenicity criteria by Buick 1990 and other papers), and nothing new. In paleontology, modern prokaryotes may serve as models but everyone working on Archean microfossils will agree that these cannot correspond to modern groups. An example is fossil "cyanobacteria" that is thought to have been around already in the early Archean. While morphologically very similar to modern cyanobacteria, their genetic information certainly differed - how much will perhaps remain undisclosed by material of that high age.  

      Yes, we agree with the reviewer that there has been a longstanding conflict on the topic of biogenicity of microfossils. However, we have never come across manuscripts suggesting that modern microorganisms should only be used as models. If at all, there have been numerous manuscripts suggesting that these microfossils represent cyanobacteria, streptomycetes, and methanotrophs. Regarding the annotation of microfossils as cyanobacteria, we addressed this issue in one of the previous questions raised by the reviewer.    

      Line 498ff: Can the variation of morphology and sizes of the EM-Ps be demonstrated statistically? Line 505ff are very speculative statements. Relabeling of what could be vesicles as "microfossils" appears inappropriate. Contrary to what is stated in the manuscript, the morphologies of the Dresser Formation vesicles do not resemble the S3 to S14 spheroids from the Strelley Pool, the Waterfall, and Mt Goldsworthy sites listed in the manuscript. The spindle-shaped vesicles in Wacey et al are not addressed by this manuscript. What roles in mineral and element composition would have played diagenetic alteration and the extreme hydrothermal overprint and weathering typical for Dresser material? S59, S60 do not show what is stated, and the material derives from the Barberton Greenstone Belt, not the Pilbara.

      Please see the comments below regarding the supplementary images. 

      We did not observe huge variations in the cell morphology. Morphologies, in most cases, were restricted to spherical cells with intracellular vesicles or filamentous extensions. Regarding the sizes of the cells, we see some variations. However, we are reluctant to provide exact numbers. We have presented our reasons above.

      We respectfully disagree with the reviewer’s comments. We see considerable similarities between the Dresser Formation microfossils and our cells. Beyond the similarities, we have shown the step-by-step transformation of cells that results in these morphologies. We fail to see what exactly is speculative here. The argument that they should be classified as abiotic structures rests on the opinion that cells do not form such structures. We clearly show here that they can, and these biological structures resemble the Dresser Formation microfossils more closely than the abiotic structures do.

      Regarding Figures S3-S14, we think they are morphologically very similar. The aim is not simply to compare image pairs or to make exact reproductions (which is not possible); rather, we should focus on reproducing the distinctive morphological features of these microfossils.

      We agree with the reviewer that we did not reproduce all the structures reported in Wacey’s original manuscript, such as the spindle-shaped structures. We are currently preparing another manuscript to address the filamentous microfossils. These spindle-like structures will be addressed in this subsequent work.

      We agree with the reviewer that it is often difficult to differentiate between cells and vesicles. This is not a problem in the early stages of growth. During the log phase, a significant volume of the cell consists of the cytoplasm, with hollow vesicles constituting only a minor volume (Fig. 1B or S1A). During the later growth stages (Fig. 1E & F or S11), cells were almost hollow, with numerous daughter cells within them. These cells often resemble hollow vesicles rather than cells. However, these are biologically formed structures, and one could argue that these vesicles are still alive, as there is still a minimal amount of cytoplasm (Fig. S27). Hence, we should consider them as cells until they break apart to release daughter cells.

      Regarding Figures S59 and S60, we did not claim either of these microfossils is from Pilbara Iron Formations. The legend of Figure S59 clearly states that these structures are from Buck Reef Chert, originally reported by Tice et al., 2006 (Figure 16 in the original manuscript). The legend of Figure S60 says these structures were originally reported by Barlow et al., 2018, from the Turee Creek Formation. 

      Line 546f and 552: The sites including microfossils in the Archean represent different paleoenvironments ranging from marine to terrestrial to lacustrine. References 6 and 66 are well-developed studies focusing on specific stratigraphic successions, but cannot include information covering other Archean worlds of the over 2.5 Ga years Archean time.  

      All the Archaean microfossils reported to date are from volcanic coastal marine environments. We are aware that there are rocky terrestrial environments, but no microfossils have been reported from these sites. We are unaware of any Archaean microfossils reported from freshwater environments. 

      Line 570ff: The statements may represent a hypothesis, but the data presented are too preliminary to substantiate the assumptions.

      We believe this is a correct inference from an evolutionary, genomic, and now from a morphological perspective. 

      Figures:  

      Please check all text and supplementary figures, whether scale bars are of different styles within the figure (minor quibble). 

      S3 (no scale in C, D); S4, S5: Note that scale bars are of different styles. 

      We believe we addressed this issue above. 

      S6 D: depressions here are well visible - perhaps exchange with a photo in the main text? Note that scale bars are of different styles.  

      We agree that depressions are well visible in E. The same image of EM-P cell in E is also present in Fig. 1D in the main text.   

      S7: Scale bars should all be of the same style, if anyhow possible. Scale in D? 

      We believe we addressed this issue above. 

      S9: F appears to be distorted. Is the fossil like this? The figure would need additional indicators (arrows) pointing toward what the reader needs to see - not clear in this version. More explanation in the figure caption could be offered. 

      We rechecked the figure from the original publication to check if by mistake the figure was distorted during the assembly of this image. We can assure you that this is not the case. We are not sure what further could be said in the figure legend.     

      S13: What is shown in the inserts of D and E that is also visible in A and B? Here a sketch of the steps would help. 

      We did not understand the question.  

      S14: Scale in A, B? 

      We believe we addressed this issue above. 

      S15: Scales in A, E, C, D 

      We believe we addressed this issue above. 

      S16: scales in D, E, G, H, I, J?  

      We believe we addressed this issue above. 

      S17: "I" appears squeezed, is that so? If morphology is an important message, perhaps reduce the entire figure so it fits the layout. Note that labels A, B, C, and D are displaced. 

      As shown in several subsequent figures, the hollow spherical vesicles are compressed first into honeycomb-like structures, and they often undergo further compression to form lamination-like structures. Such images often give the impression that the entire figure is squashed, but this is not the case. If one examines the figure closely, one can see perfectly spherical vesicles together with laterally squeezed structures. Regarding the figure labels, we addressed this issue above.

      S18: The filamentous feature in C could also be the grain boundaries of the crystals. Can this be excluded as an interpretation? Are there microfossils with the cell membranes? That would be an excellent contribution to this figure. Note that scale bars are of different styles.

      If this were a one-off observation, we might have arrived at the reviewer's opinion. But spherical cells in a “string of beads” configuration have been reported from several sites, too frequently to be discounted as mere interpretation.

      S19: The morphologies in A - insert appear to be similar to E - insert in the lower left corner. The chain of cells in A may look similar to the morphologies in E - insert upper right of the image. B - what is to see here? D - the inclusions do not appear spherical (?). Does C look similar to the cluster with the arrow in the lower part of image E? Note that scale bars are of different styles (minor quibble). A, B, C, and D appear compressed. Perhaps reduce the size of the overall image?  

      The structures highlighted (yellow box) in C are similar to the highlighted regions in E, namely the agglomeration of hollow vesicles. It is hard to appreciate this similarity from one figure. The similarities are apparent when one views Movie 4 and Fig. S12, which clearly show the spherical daughter cells within the hollow vesicle. We have now added the movie reference to the figure legend.

      S20: A appears not to contribute much. The lineations in B appear to be diagenetic. However, C is suitable. Perhaps use only C, D, E? 

      We believe too many unrecognizable structures are being labeled as diagenetic. Nevertheless, we do not subscribe to the notion that these are too lenient interpretations. These interpretations are justified as such structures have not been reported from live cells. This is the first study to report that cells could form such structures. As we have now reproduced these structures, an alternative interpretation, that these are organic structures derived from microfossils, should be entertained.

      S 21: Note that scale bars are of different styles.  

      We believe we addressed this issue above. 

      S22: Perhaps add an arrow in F, where the cell opened, and add "see arrow" in the caption? Is this the same situation as shown in C (white arrow)? What is shown by the white arrow in A? Note that scale bars are of different styles.

      We made the necessary changes.

      S23: In the caption and main text, please replace "&" with "and" (please check also the other figure captions, e.g. S24). Note that scale bars are of different styles. What is shown in F? A, D - what is shown here?

      We replaced “&” with “and.”  

      S24: Note that scale bars are of different styles. Note that Wacey et al. describe the vesicles as abiotic not as "microfossils"; please correct in figure caption [same also S26; 25; 28].

      We are aware of Prof. Dr. Wacey’s interpretations. We discuss them at length in the discussion section of our manuscript. Based on the similarities between the Dresser Formation structures and structures formed by EM-P, we contest the claim that these are abiotic structures.

      S25: Appears compressed; note different scale bars. 

      We believe we addressed this issue above. 

      S28: The label in B is still in the upper right corner; scale in D? What is to see in rectangles (blue and red) in A, B? In fossil material, this could be anything. 

      These figures are taken from a previous manuscript cited in the figure legend. We could not erase or modify these figures.  

      S33: "L"ewis; G appears a bit too diffuse - erase? Note that scale bars are of different styles.

      We believe we addressed this issue above. 

      S34: This figure appears unconvincing. Erase? 

      There are considerable similarities between the microfossils and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we can address his reservations.    

      S35: It would be more convincing to show only the morphological similarities between the cell clusters. B and C are too blurry to distinguish much. Scales in D to F and in sketches? A appears compressed (?). 

      We rechecked the original manuscript to see if image A was distorted while making this figure, but this is not the case. Regarding B & C, cells in this image are faint as they are hollow vesicles and, by nature, do not generate too much contrast when imaged with a phase-contrast microscope. There are some limitations on how much we can improve the contrast. We now added scale bars for D-I. Similarly, faint hollow vesicles can be seen in Fig. S21 C & D, and Fig. 3H.  

      S36: Very nice; in B no purple arrow is visible. Note that scale bars are of different styles. S37 and S36 are very much the same - fuse, perhaps?  

      We are sorry for the confusion. There are purple arrows in Fig. S37B-D. 

      S38: this is a more unconvincing figure - erase? 

      Unconvincing in what sense? There are considerable similarities between the microfossils and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we can address his reservations.

      S39: white rectangle in A? Arrow in A? Note that scale bars are of different styles.

      These are some of the unavoidable remnants from the image from the original publication. 

      S40: in F: CM, V = ?; Note that scale bars are of different style. 

      It was an oversight on our part. We have now added the definitions to the figure legend. We thank the reviewer for pointing it out.

      S41: Rectangles in D, E, F, G can be deleted? Scales and labels missing in photos lower right. 

      Those rectangles are added by the image processing software to the 3D-rendered images. Regarding the missing scale bars in H & I, these panels are magnified regions of F. The scale bar is already present in F.

      S42: appears compressed. G could be trimmed. Labels too small; scale in G? 

      This is a curled-up folded membrane. We needed to lower the resolution of some images to restrict the size of the supplement to journal size restrictions. It is not possible to present 85 figures in high resolution. But we assure you that the image is not laterally compressed in any manner.   

      S43: This figure appears to be unconvincing. Reducing to pairing B, C, D with L, K? Spherical inclusions in B? Scales in E to G? Similar in S44: A, B, E only? Note that scale bars are of different styles. 

      Figures I to K are important. They show not just the morphological similarities but also the sequence of steps through which such structures are formed. We addressed the issue of the scale bars above.  

      S45: A, B, and C appear to show live or subrecent material. How was this isolated of a rock? Note that scale bars are of different styles.  

      It is common to treat rocks with acids to dissolve them and then retrieve organic structures within them. This technique is becoming increasingly common. The procedure is quite extensively discussed in the original manuscript. We do not see much difference between the scale bars of microfossils and EM-P cells; they are quite similar.

      S46: A: what is to see here? Note that scale bars are of different styles. 

      There are considerable similarities between the folded, fabric-like organic structures with spherical inclusions and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we can address his reservations.

      S47: Perhaps enlarge B and erase A. Note that scale bars are of different styles. 

      S48: Image B appears to show the fossil material - is the figure caption inconsistent? There are no aggregations visible in the boxes in A. H is described in the figure caption but missing in the figure. Overall, F and G do not appear to mirror anything in A to E (which may be fossil material?). 

      S51; S52 B, C, E; S53: these figures appear unconvincing - erase? 

      Unconvincing in what sense? The structures from our study are very similar to the microfossils.   

      S54: North "Pole; scale bars in A to C =? 

      These figures were borrowed from an earlier publication referenced in the figure legend. That is the reason for the differences in the styles of scale bars.  

      S55: D and E appear not to contribute anything. Perhaps add arrow(s) and more explanation? Check the spelling in the caption, please. 

      D & E show morphological similarities between cells from our study and microfossils (A).   

      S56: Hexagonal morphologies may also be a consequence of diagenesis. Overall, perhaps erase this figure?  

      We certainly agree that diagenesis could be one of the reasons for the hexagonal morphologies. Such geometric polygonal morphologies had not previously been observed in living organisms. Nevertheless, as can be seen from the figure, such morphologies could also be formed by living organisms. Hence, this alternative interpretation should not be discounted.

      S57: The figure caption needs improvement. Please add more description. What show arrows in A, what are the numbers in A? What is the relation between the image attached to the right side of A? Is this a close-up? Note that scale bars are of different styles. 

      We expanded a bit on our original description of the figure. However, we request the reviewer to keep in mind that parts of the figure are taken from a previous publication. We are not at liberty to modify them, for example by removing the arrows. This imposes some constraints.

      S58: There are no honeycomb-shaped features visible. What is to see here? Erase this figure? 

      Clearly, one can see spherical and polygonal shapes within the Archaean organic structures and mat-like structures formed by EM-P.  

      S59 and S60: What is to see here? - Erase? 

      Clearly, one can see spherical and polygonal shapes within the Archaean organic structures and mat-like structures formed by EM-P in Fig. S59. Further disintegration of these honeycomb-shaped mats into filamentous structures with spherical cells attached to them can be seen in both Archaean organic structures and structures formed by EM-P.

      S61: This figure appears to be unconvincing. B and F may be a good pairing. Note that scale bars are of different styles.  

      There are considerable similarities between the microfossils and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we might be able to address his reservations.     

      S62: This figure appears to be unconvincing - erase?

      There are considerable similarities between the microfossils and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we might be able to address his reservations.     

      S66: This figure is unconvincing - erase? 

      There are considerable similarities between the microfossils and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we might be able to address his reservations.    

      S68: Scale in B, D, and E? 

      Image B is just a magnified image of a small portion of image A. Hence, there is no need for an additional scale bar. The same is true for images D and E. 

      S69: This figure appears to be unconvincing, at least the fossil part. Filamentous features are visible in fossil material as well, but nothing else. 

      We are not sure what filamentous features the reviewer is referring to. Both the figures show morphologically similar spherical cells covered in membrane debris.    

      S70 [as well as S82]: Good thinking here, but scales differ by magnitudes (cm to μm). Erase this figure? Very similar to Figure S73: Insert in C has which scale in comparison to B? Note that scale bars are of different styles.  

      We realize the scale bars are of different sizes. In our defense, our experiments are conducted in 1 ml chamber slides. We do not have the luxury of doing these experiments on a scale similar to that of natural environments. The size differences are to be expected.

      S71: Scale in E? 

      Image E is just a magnified image of a small portion of image D. Hence, we believe a scale bar is unnecessary. 

      S72: Scale in insert?  

      The insert is just a magnified region of A & C.

      S75: This figure appears to be unconvincing. This is clastic sediment, not chert. Lenticular gaps would collapse during burial by subsequent sediment. - Erase? 

      Regarding the similarities, we see similar lenticular gaps within the parallel layers of organic carbon in both the microfossils and the structures formed by EM-P.

      S76: A, C, D do not look similar to B - erase? Similar to S79, also with respect to the differences in scale. Erase? 

      Regarding the similarities, we see similar lenticular gaps within the parallel layers of organic carbon in both the microfossils and the structures formed by EM-P. We believe we addressed the issue of scale bars above.

      S80: A appears to be diagenetic, not primary. Erase? 

      These two structures share too many resemblances to ignore or to discount as merely diagenetic structures: raised filamentous structures originate out of parallel layers of organic carbon (laminations), with spherical cells within this filamentous organic carbon.

      S85: What role would diagenesis play here? This figure appears unconvincing. Erase?

      We do believe that diagenesis plays a major role in microfossil preservation. However, we do not subscribe to the notion that we should by default assign diagenesis to all microfossil features. Our study shows that there could be an alternative explanation for some of the observations.

      S86 and S87: These appear unconvincing. What is to see here? Erase? 

      What is to be seen is the morphological similarity between these two structures: stellar-shaped organic structures with strings of spherical daughter cells growing out of them.

      S88: Does this image suggest the preservation of "salt" in organic material once preserved in chert?  

      That is one inference we draw from this observation. Crystalline NaCl was previously reported from within the microfossil cells.

      S89: What is to see here? Spherical phenomena in different materials? 

      At present, the presence of honeycomb-like structures is often considered an indication of volcanic pumice. We meant to show that biofilms of living organisms could result in honeycomb-shaped patterns similar to volcanic pumice.

      References 

      Please check the spelling in the references. 

      We found a few references that required correction. We have now rectified them.

      References  

      (1) Orange F, Westall F, Disnar JR, Prieur D, Bienvenu N, Le Romancer M, et al. Experimental silicification of the extremophilic Archaea Pyrococcus abyssi and Methanocaldococcus jannaschii: Applications in the search for evidence of life in early Earth and extraterrestrial rocks. Geobiology. 2009;7(4).

      (2) Orange F, Disnar JR, Westall F, Prieur D, Baillif P. Metal cation binding by the hyperthermophilic microorganism, Archaea Methanocaldococcus jannaschii, and its effects on silicification. Palaeontology. 2011;54(5).

      (3) Errington J. L-form bacteria, cell walls and the origins of life. Open Biol. 2013;3(1):120143. 

      (4) Cooper S. Distinguishing between linear and exponential cell growth during the division cycle: Single-cell studies, cell-culture studies, and the object of cell-cycle research. Theor Biol Med Model. 2006; 

      (5) Mitchison JM. Single cell studies of the cell cycle and some models. Theor Biol Med Model. 2005; 

      (6) Kærn M, Elston TC, Blake WJ, Collins JJ. Stochasticity in gene expression: From theories to phenotypes. Nat Rev Genet. 2005; 

      (7) Elowitz MB, Levine AJ, Siggia ED, Swain PS. Stochastic gene expression in a single cell. Science. 2002; 

      (8) Strovas TJ, Sauter LM, Guo X, Lidstrom ME. Cell-to-cell heterogeneity in growth rate and gene expression in Methylobacterium extorquens AM1. J Bacteriol. 2007; 

      (9) Knoll AH, Barghoorn ES. Archean microfossils showing cell division from the Swaziland System of South Africa. Science. 1977;198(4315):396–8. 

      (10) Sugitani K, Grey K, Allwood A, Nagaoka T, Mimura K, Minami M, et al. Diverse microstructures from Archaean chert from the Mount Goldsworthy–Mount Grant area, Pilbara Craton, Western Australia: microfossils, dubiofossils, or pseudofossils? Precambrian Res. 2007;158(3–4):228–62. 

      (11) Kanaparthi D, Lampe M, Krohn JH, Zhu B, Hildebrand F, Boesen T, et al. The reproduction process of Gram-positive protocells. Sci Rep. 2024 Mar 25;14(1):7075.

    1. Author response:

      We genuinely appreciate the reviewers’ critiques of our submitted paper, “Otoacoustic emissions but not behavioral measurements predict cochlear-nerve frequency tuning in an avian vocal-communication specialist.” We are planning a number of changes based on the reviewers’ helpful comments that we feel will substantially improve the manuscript and clarify its implications.

      We will add more support for the claim that budgerigars show unusual patterns of behavioral frequency tuning compared to other species. The original manuscript relied on previously published studies of budgerigar critical bands and psychophysical tuning curves to make this point (e.g., Fig. 1). Critical bands and psychophysical tuning curves have unfortunately not been studied in many bird species. Consequently, it was somewhat unclear (based on the information originally presented) whether the “unusual” behavioral tuning results shown in Fig. 1 reflect a hearing specialization in budgerigars or perhaps simply a general avian pattern attributable to declining audibility above 3-4 kHz (a point raised by both reviewers). Fortunately, behavioral critical-ratio results are available from a broader range of species. Albeit a less direct correlate of tuning, the results clearly highlight the unique hearing abilities of budgerigars in relation to other bird species, as elaborated upon below.

      The critical ratio is the threshold signal-to-noise ratio for tone detection in wideband noise and partly depends on peripheral tuning bandwidth. Critical ratios have been studied in over a dozen bird species, the vast majority of which show similar thresholds to one another and monotonically increasing critical ratios for higher frequencies (by 2-3 dB/octave, similar to most mammals; reviewed by Dooling et al., 2000). By contrast, budgerigar critical ratios diverge markedly from other species at mid-to-high frequencies, with ~8 dB lower (more sensitive) thresholds from 3-4 kHz (Dooling & Saunders, 1975; Okanoya & Dooling, 1987; Farabaugh 1988; see Figs 5 & 6 in Okanoya & Dooling, 1987). The unusual critical-ratio function in budgerigars is not attributable to the audiogram and was hypothesized by Okanoya and Dooling (1987) to reflect specialized cochlear tuning or perhaps central processing mechanisms. A brief discussion of these studies will be added to the introduction, along with a new figure panel (for Fig. 1) illustrating these intriguing species differences in critical ratios.
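
      As a purely illustrative aside, the definition above can be written out as a short calculation. The sketch below assumes the classic power-spectrum (Fletcher-style) reading of the critical ratio, in which the masker power passing through the auditory filter equals the tone power at threshold; the function names and all numeric values are hypothetical and are not data from the manuscript or the cited studies.

```python
# Minimal sketch of the critical-ratio definition (illustrative values only).

def critical_ratio_db(tone_threshold_db_spl: float,
                      noise_spectrum_level_db: float) -> float:
    """Critical ratio: masked tone threshold minus the noise spectrum level
    (noise power per 1-Hz band), both expressed in dB."""
    return tone_threshold_db_spl - noise_spectrum_level_db


def equivalent_bandwidth_hz(cr_db: float) -> float:
    """Bandwidth implied by a critical ratio, ASSUMING that the noise power
    inside the auditory filter equals the tone power at threshold."""
    return 10 ** (cr_db / 10)


# Hypothetical example: a tone detected at 45 dB SPL in noise whose
# spectrum level is 20 dB SPL/Hz gives a 25 dB critical ratio.
cr = critical_ratio_db(45.0, 20.0)
bw = equivalent_bandwidth_hz(cr)

# Under the same assumption, a 3 dB rise in critical ratio corresponds to
# roughly a doubling of the implied bandwidth.
ratio = equivalent_bandwidth_hz(cr + 3.0) / bw
```

      Under this assumption, a critical ratio that grows by about 3 dB per octave implies a bandwidth that roughly doubles per octave, which is why a lower critical ratio at a given frequency is read as a possible sign of sharper tuning, although central processing can also contribute, as noted above.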

      Another question was raised as to whether the simultaneous-masking paradigms and classic methods used to estimate behavioral tuning in budgerigars should be considered valid, given newer forward-masking and notched-noise alternatives. We will expand the discussion of this issue in the revised manuscript. First, many of the methods from the classic budgerigar studies remain widely used in animal behavioral research (e.g., critical bands and ratios: Yost & Shofner, 2009; King et al., 2015; simultaneous masking: Burton et al., 2018). We therefore believe that it remains highly relevant to test and report whether these methods can accurately predict cochlear tuning. While forward-masking behavioral results are hypothesized to more accurately predict cochlear tuning in humans (Shera et al., 2002; Joris et al., 2011; Sumner et al., 2018), evidence from nonhumans is controversial, with one study showing a closer match of forward-masking results to auditory-nerve tuning (ferret: Sumner et al., 2018), but several others showing a close match for simultaneous-masking results (e.g., guinea pig, chinchilla, macaque; reviewed by Ruggero & Temchin, 2005; see Joris et al., 2011 for macaque auditory-nerve tuning). Moreover, forward- and simultaneous-masking results can often be equated with a simple scaling factor (e.g., Sumner et al., 2018). Given no real consensus on an optimal behavioral method, and seemingly limited potential for the “wrong” method to fundamentally transform the shape of the behavioral tuning quality function, it seems reasonable to accept previously published behavioral tuning estimates as essentially valid while also discussing limitations and remaining open to alternative interpretations.

      We will add clarification throughout the revision as to the specific behavioral measures used to quantify tuning in budgerigars (i.e., critical bands, psychophysical tuning curves, and critical ratios). This avoids potentially disparaging alternative behavioral methods that have not been tested. That the budgerigar behavioral data are “old” seems not particularly relevant considering that the methods are still used in animal behavioral research, as noted previously. Rather, it seems important to clarify the specific behavioral techniques used to estimate budgerigars’ frequency tuning in the revised paper.

      Finally, we plan to add discussion of the apical-basal transition from the mammalian otoacoustic-emission literature, as suggested by reviewer 1, including how this concept might apply in budgerigars and other birds.

      References not already cited in the preprint:

      Burton JA, Dylla ME, Ramachandran R. Frequency selectivity in macaque monkeys measured using a notched-noise method. Hear Res. 2018 Jan;357:73-80. doi: 10.1016/j.heares.2017.11.012.

      King J, Insanally M, Jin M, Martins AR, D'amour JA, Froemke RC. Rodent auditory perception: Critical band limitations and plasticity. Neuroscience. 2015 Jun 18;296:55-65. doi: 10.1016/j.neuroscience.2015.03.053.

      Yost WA, Shofner WP. Critical bands and critical ratios in animal psychoacoustics: an example using chinchilla data. J Acoust Soc Am. 2009 Jan;125(1):315-23. doi: 10.1121/1.3037232. PMID: 19173418; PMCID: PMC2719489.

    1. Author response:

      (1) We do not know that the mechanism mediating the behavioral changes observed involves acetylcholine at all. (Reviewer 1)

      The reviewer rightly pointed out the co-release of acetylcholine (ACh) and GABA from cholinergic terminals. We believe that the detected behavioral changes result from the augmentation of this innate mixed chemical signal. We agree that identifying the receptor specificity is an essential next step; however, addressing this point requires a currently unavailable research tool to block cholinergic receptors for a few hundred milliseconds. This temporal specificity is vital because acetylcholine is released in the medial prefrontal cortex (mPFC) on two distinct timescales, the slow release over tens of minutes from the task onset and the fast release time-locked to salient stimuli (Teles-Grilo Ruivo et al., 2017). Moreover, the former slow signal is far more robust than the latter phasic signal. The pharmacological experiments suggested by the reviewer will suppress both the tonic and phasic signals, making it difficult to interpret the results. Given the rapid technological advancement in this field, we hope to investigate the underlying mechanisms in detail in the future.

      (2) It is unclear whether mPFC cells are signaling predictions versus prediction errors. (Reviewer 2)

      As the reviewer pointed out, mPFC cells signal the prediction of imminent outcomes (Baeg et al., 2001; Mulder et al., 2003; Takehara-Nishiuchi and McNaughton, 2008; Kyriazi et al., 2020).

      However, the key difference between prediction signals and prediction error signals is their time course. The prediction signals begin to arise before the actual outcome occurs, whereas the prediction error signals are emitted after subjects experience the presence or absence of the expected outcome. In all our analyses, cell activity was normalized by the activity during the 1-second window before the threat site entry (i.e., the reveal of the actual outcome; Lines 655-659). Also, all the statistical comparisons were made on the normalized activity during the 500-msec window, starting from the threat site entry (Lines 669-670). Because this approach isolated the change in cell activity after the actual outcome, we interpret the data in Figure 4C as prediction error signals.
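
      The windowing described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis code: the response does not say whether normalization is a subtraction, a division, or a z-score, so subtraction of the pre-entry mean is assumed here, and the function name, the sampling rate, and the toy trace are all hypothetical.

```python
from statistics import mean


def post_entry_response(trace, entry_idx, fs, pre_s=1.0, post_s=0.5):
    """Mean activity in the post-entry window minus the pre-entry baseline mean.

    trace     : list of activity samples for one cell on one trial
    entry_idx : sample index of threat-site entry (hypothetical event marker)
    fs        : sampling rate in Hz (assumed; not given in the response)
    pre_s     : baseline window length in seconds (1 s in the response)
    post_s    : comparison window length in seconds (500 ms in the response)
    """
    n_pre = int(round(pre_s * fs))
    n_post = int(round(post_s * fs))
    if entry_idx < n_pre or entry_idx + n_post > len(trace):
        raise ValueError("trace is too short for the requested windows")
    # Baseline: the window immediately before entry.
    baseline = mean(trace[entry_idx - n_pre:entry_idx])
    # Response: the window starting at entry.
    response = mean(trace[entry_idx:entry_idx + n_post])
    # ASSUMPTION: subtractive normalization; the source does not specify the form.
    return response - baseline


# Toy trace sampled at a hypothetical 10 Hz: flat baseline, then a step at entry.
toy_trace = [1.0] * 10 + [3.0] * 5
toy_response = post_entry_response(toy_trace, entry_idx=10, fs=10)
```

      Because the baseline window ends at entry, any activity that ramps up before the outcome is absorbed into the baseline, and only the change after entry remains in the returned value.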

      (3) The task does not fully dissociate place field coding. (Reviewer 2)

      The present analysis included several strategies to dissociate outcome selectivity from location selectivity (Figure 4). First, we collapsed cell activity on two threat sites to suppress the difference in cell activity between the sites. Second, our analysis compared how cell activity at the same location differed depending on whether outcomes were expected or surprising (Figure 4C). Nevertheless, we can use the present data to investigate the spatial tuning of mPFC cells. Indeed, an earlier version of this manuscript included some characterizations of spatial tuning. However, these data were deemed irrelevant and distracting when this manuscript was reviewed for publication in a different journal. As such, these data were removed from the current version. We are in the process of publishing another paper focusing on the spatial tuning of mPFC cells and their learning-dependent changes. 

      (4) The basic effects of cholinergic terminal stimulation on mPFC cell activity are unclear. (Reviewers 1, 3)

      We acknowledge the lack of characterization of the optogenetic manipulation of cholinergic terminals on mPFC cell activity outside the task context. As outlined in the discussion section (Lines 309-321), cholinergic modulation of mPFC cell activity is highly complex and most likely varies depending on behavioral states. In addition, because we intended to augment naturally occurring threat-evoked cholinergic terminal responses (Tu et al., 2022), our optogenetic stimulation parameters were 3-5 times weaker than those used to evoke behavioral changes solely by the optogenetic stimulation of cholinergic terminals (Gritton et al., 2016). Based on these points, we validated the optogenetic stimulation based on its effects on air-puff-evoked cell activity during the task (Figure 2C, 2D).

      (5) Some choices of statistical analyses are questionable (Reviewers 1, 3)

      We used the Kolmogorov-Smirnov (KS) test to investigate whether the distribution of cell responses differed between the two groups (Figure 2D) or changed with learning (Figure 3Ac, 3Bc). As seen in Figure 3Aa, some mPFC cells increased calcium activity in response to air-puffs, while others decreased. We expected that the manipulation or learning would alter these responses. If they are strengthened, the increased responses will become more positive, while the decreased responses will become more negative. If they are weakened, both responses will become closer to 0. Under such conditions, the shape of the distribution of cell responses will change but not the median. The KS test can detect this, whereas tests sensitive to differences in medians, such as the Wilcoxon rank-sum test, cannot. In Figure 2D, KS tests were applied to the independently sampled data from the control and ChrimsonR-expressing mice. In Figure 3Ac and 3Bc, we used all cells imaged in the first and fifth sessions. Considering that ~50% of them were longitudinally registered on both days, we acknowledge the violation of the assumption of independent sampling. In Figure 1D, we detected a significant interaction between the group and sessions. Several approaches are appropriate to demonstrate the source of this interaction. We chose to conduct one-way ANOVA separately in each group to demonstrate the significant change in % adaptive choice across the sessions in the control group but not the ChrimsonR group. The cutoff for significance was adjusted with the Bonferroni correction in follow-up paired t-tests used in Figure 1F.
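
      The argument about distribution shape versus median can be illustrated with a small sketch. It computes only the two-sample KS statistic (the largest gap between the two empirical cumulative distributions), not a p-value. The response values are made up for illustration and are not data from the study, and the helper name is hypothetical.

```python
from bisect import bisect_right
from statistics import median


def ks_statistic(a, b):
    """Two-sample KS statistic: the maximum gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical cell responses: half of the cells are excited, half are inhibited.
baseline = [-1.0, -0.8, -0.6, -0.4, 0.4, 0.6, 0.8, 1.0]
# "Strengthened" responses: every response moves further away from zero.
strengthened = [2 * x for x in baseline]

same_median = median(baseline) == median(strengthened)
d_stat = ks_statistic(baseline, strengthened)
```

      In this toy case both samples have a median of zero, so a comparison of medians sees no difference, while the KS statistic is clearly non-zero because the two cumulative distributions separate in both tails. With real data, a library routine that also returns a p-value would normally be used instead of this hand-written statistic.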

    1. Author response:

      Reviewer #1 (Public review):

      This manuscript presents an interesting exploration of the potential activation mechanisms of DLK following axonal injury. While the experiments are beautifully conducted and the data are solid, I feel that there is insufficient evidence to fully support the conclusions made by the authors.

      In this manuscript, the authors exclusively use the puc-lacZ reporter to determine the activation of DLK. This reporter has been shown to be induced when DLK is activated. However, there is insufficient evidence to confirm that the absence of reporter activation necessarily indicates that DLK is inactive. As with many MAP kinase pathways, the DLK pathway can be locally or globally activated in neurons, and the level of DLK activation may depend on the strength of the stimulation. This reporter might only reflect strong DLK activation and may not be turned on if DLK is weakly activated. The results presented in this manuscript support this interpretation. Strong stimulation, such as axotomy of all synaptic branches, caused robust DLK activation, as indicated by puc-lacZ expression. In contrast, weak stimulation, such as axotomy of some synaptic branches, resulted in weaker DLK activation, which did not induce the puc-lacZ reporter. This suggests that the strength of DLK activation depends on the severity of the injury rather than the presence of intact synapses. Given that this is a central conclusion of the study, it may be worthwhile to confirm this further. Alternatively, the authors may consider refining their conclusion to better align with the evidence presented.

      We wish to further clarify a striking aspect of puc-lacZ induction following injury: it is bimodal. It is either induced (in various injuries that remove all synaptic boutons), or not induced, including in injuries that spared only 1-2 remaining boutons. This was particularly evident for injuries that spared the NMJ on muscle 29, which is composed of only a few boutons. In some instances, only a single bouton was evident on muscle 29. While our injuries varied enormously in the number of branches and boutons that were lost, we did not see a comparable variability in puc-lacZ induction. In the revision we will include additional images to better demonstrate this observation.

      The reviewer (and others) fairly point out that our current study focuses on puc-lacZ as a reporter of Wnd signaling in the cell body. We consider this to be a downstream integration of events in axons that are more challenging to detect. It is striking that this integration appears strongly sensitized to the presence of spared synaptic boutons. Examination of Wnd’s activation in axons and synapses is a goal for our future work.

      As noted by the authors, DLK has been implicated in both axon regeneration and degeneration. Following axotomy, DLK activation can lead to the degeneration of distal axons, where synapses are located. This raises an important question: how is DLK activated in distal axons? The authors might consider discussing the significance of this "synapse connection-dependent" DLK activation in the broader context of DLK function and activation mechanisms.

      While it has been noted that inhibition of DLK can mildly delay Wallerian degeneration (Miller et al., 2009), this does not appear to be the case for retinal ganglion cell axons following optic nerve crush (Fernandes et al., 2014). It is also not the case for Drosophila motoneurons and NMJ terminals following peripheral nerve injury (Xiong et al., 2012; Xiong and Collins, 2012). Instead, overexpression of Wnd or activation of Wnd by a conditioning injury leads to an opposite phenotype - an increase in resiliency to Wallerian degeneration for axons that have been previously injured (Xiong et al., 2012; Xiong and Collins, 2012). The downstream outcome of Wnd activation is highly dependent on the context; it may be an integration of the outcomes of local Wnd/DLK activation in axons with downstream consequences of nuclear/cell body signaling. The current study suggests some rules for the cell body signaling; however, how Wnd is regulated at synapses and why it promotes degeneration in some circumstances but not others are important future questions.

      Following the reviewer’s suggestion, it is interesting to consider DLK’s potential contributions to the loss of NMJ synapses in a mouse model of ALS (Le Pichon et al., 2017; Wlaschin et al., 2023). Our findings suggest that the synaptic terminal is an important locus of DLK regulation, while dysfunction of NMJ terminals is an important feature of the ‘dying back’ hypothesis of disease etiology (Dadon-Nachum et al., 2011; Verma et al., 2022). We propose that the regulation of DLK at synaptic terminals is an important area for future study, and may reveal how DLK might be modulated to curtail disease progression. Of note, DLK inhibitors are in clinical trials (Katz et al., 2022; Le et al., 2023; Siu et al., 2018), but at least some have been paused due to safety concerns (Katz et al., 2022). Further understanding of the mechanisms that regulate DLK is needed to understand whether and how DLK and its downstream signaling can be tuned for therapeutic benefit.

      Reviewer #2 (Public review):

      Summary:

      The authors study a panel of sparsely labeled neuronal lines in Drosophila that each form multiple synapses. Critically, each axonal branch can be injured without affecting the others, allowing the authors to differentiate between injuries that affect all axonal branches versus those that do not, creating spared branches. Axonal injuries are known to cause Wnd (mammalian DLK)-dependent retrograde signals to the cell body, culminating in a transcriptional response. This work identifies a fascinating new phenomenon that this injury response is not all-or-none. If even a single branch remains uninjured, the injury signal is not activated in the cell body. The authors rule out that this could be due to changes in the abundance of Wnd (perhaps if incrementally activated at each injured branch) by Hiw, Wnd's known negative regulator. Thus there is both a yet-undiscovered mechanism to regulate Wnd signaling, and more broadly a mechanism by which the neuron can integrate the degree of injury it has sustained. It will now be important to tease apart the mechanism(s) of this fascinating phenomenon. But even absent a clear mechanism, this is a new biology that will inform the interpretation of injury signaling studies across species.

      Strengths:

      (1) A conceptually beautiful series of experiments that reveal a fascinating new phenomenon is described, with clear implications (as the authors discuss in their Discussion) for injury signaling in mammals.

      (2) Suggests a new mode of Wnd regulation, independent of Hiw.

      Weaknesses:

      (1) The use of a somatic transcriptional reporter for Wnd activity is powerful; however, the reporter indicates whether the transcriptional response was activated, not whether the injury signal was received. It remains possible that Wnd is still activated in the case of a spared branch, but that this activation is either local within the axons (impossible to determine in the absence of a local reporter) or that the retrograde signal was indeed generated but it was somehow insufficient to activate transcription when it entered the cell body. This is more of a mechanistic detail and should not detract from the overall importance of the study.

      We agree. The puc-lacZ reporter tells us about signaling in the cell body, but whether and how Wnd is regulated in axons and synaptic branches, which we think occurs upstream of the cell body response, remains to be addressed in future studies.

      (2) That the protective effect of a spared branch is independent of Hiw, the known negative regulator of Wnd, is fascinating. But this leaves open a key question: what is the signal?

      This is indeed an important future question, and would still be a question even if Hiw were part of the protective mechanism by the spared synaptic branch. Our current hypothesis (outlined in Figure 4) is that regulation of Wnd is tied to the retrograde trafficking of a signaling organelle in axons. The Hiw-independent regulation complements other observations in the literature that multiple pathways regulate Wnd/DLK (Collins et al., 2006; Feoktistov and Herman, 2016; Klinedinst et al., 2013; Li et al., 2017; Russo and DiAntonio, 2019; Valakh et al., 2013). It is logical for this critical stress response pathway to have multiple modes of regulation that may act in parallel to tune and restrain its activation.

      Reviewer #3 (Public review):

      Summary:

      This manuscript seeks to understand how nerve injury-induced signaling to the nucleus is influenced, and it establishes a new location where these principles can be studied. By identifying and mapping specific bifurcated neuronal innervations in the Drosophila larvae, and using laser axotomy to localize the injury, the authors find that sparing a branch of a complex muscular innervation is enough to impair Wallenda-puc (analogous to DLK-JNK-cJun) signaling that is known to promote regeneration. It is only when all connections to the target are disconnected that cJun-transcriptional activation occurs.

      Overall, this is a thorough and well-performed investigation of the mechanism of spared-branch influence on axon injury signaling. The findings on control of wnd are important because this is a very widely used injury signaling pathway across species and injury models. The authors present detailed and carefully executed experiments to support their conclusions. Their effort to identify the control mechanism is admirable and will be of aid to the field as they continue to try to understand how to promote better regeneration of axons.

      Strengths:

      The paper does a very comprehensive job of investigating this phenomenon at multiple locations and through both pinpoint laser injury as well as larger crush models. They identify a non-hiw based restraint mechanism of the wnd-puc signaling axis that presumably originates from the spared terminal. They also present a large list of tests they performed to identify the actual restraint mechanism from the spared branch, which has ruled out many of the most likely explanations. This is an extremely important set of information to report, to guide future investigators in this and other model organisms on mechanisms by which regeneration signaling is controlled (or not).

      Weaknesses:

      The weakest data presented by this manuscript is the study of the actual amounts of Wallenda protein in the axon. The authors argue that increased Wnd protein is being anterogradely delivered from the soma, but no support for this is given. Whether this change is due to transcription/translation, protein stability, transport, or other means is not investigated in this work. However, because this point is not central to the arguments in the paper, it is only a minor critique.

      We agree and are glad that the reviewer considers this a minor critique; this is an area for future study. In Supplemental Figure 1 we present differences in the levels of an ectopically expressed GFP-Wnd-kinase-dead transgene, which is strikingly increased in axons that have received a full but not partial axotomy. We suspect this accumulation occurs downstream of the cell body response because of the timing. We observed the accumulations after 24 hours (Figure S1F) but not at early (1-4 hour) time points following axotomy (data not shown). Further study of the local regulation of Wnd protein and its kinase activity in axons is an important future direction.

      As far as the scope of impact: because the conclusions of the paper are focused on a single (albeit well-validated) reporter in different types of motor neurons, it is hard to determine whether the mechanism of spared branch inhibition of regeneration requires wnd-puc (DLK/cJun) signaling in all contexts (for example, sensory axons or interneurons). Is the nerve-muscle connection the rule or the exception in terms of regeneration program activation?

      DLK signaling is strongly activated in DRG sensory neurons following peripheral nerve injury (Shin et al., 2012), despite the fact that sensory neurons have bifurcated axons and their projections in the dorsal spinal cord are not directly damaged by injuries to the peripheral nerve. Therefore, it is unlikely that protection by a spared synapse is a universal rule for all neuron types. However, the molecular mechanisms that underlie this regulation may indeed be shared across different types of neurons but utilized in different ways. For instance, nerve growth factor withdrawal can lead to activation of DLK (Ghosh et al., 2011); however, neurotrophins and their receptors are regulated and implemented differently in different cell types. We suspect that the restraint of Wnd signaling by the spared synaptic branch shares a common underlying mechanism with the restraint of DLK signaling by neurotrophin signaling. Further elucidation of the molecular mechanism is an important next step towards addressing this question.

      Because changes in puc-lacZ intensity are the major readout, it would be helpful to better explain the significance of the amount of puc-lacZ in the nucleus with respect to the activation of regeneration. Is it known that scaling up the amount of puc-lacZ transcription scales functional responses (regeneration or others)? The alternative would be that only a small amount of puc-lacZ is sufficient to efficiently induce relevant pathways (threshold response).

      While induction of puc-lacZ expression correlates with Wnd-mediated phenotypes, including sprouting of injured axons (Xiong et al., 2010), protection from Wallerian degeneration (Xiong et al., 2012; Xiong and Collins, 2012) and synaptic overgrowth (Collins et al., 2006), we have not observed any correlation between the degree of puc-lacZ induction (e.g., modest, medium, or high) and the phenotypic outcomes (sprouting, overgrowth, etc.). Rather, there appears to be a striking all-or-none difference in whether puc-lacZ is induced or not induced. There may indeed be a threshold that can be restrained through multiple mechanisms. We posit in Figure 4 that restraint may take place in the cell body, where it can be influenced by the spared bifurcation.

      References Cited:

      Collins CA, Wairkar YP, Johnson SL, DiAntonio A. 2006. Highwire restrains synaptic growth by attenuating a MAP kinase signal. Neuron 51:57–69.

      Dadon-Nachum M, Melamed E, Offen D. 2011. The “dying-back” phenomenon of motor neurons in ALS. J Mol Neurosci 43:470–477.

      Feoktistov AI, Herman TG. 2016. Wallenda/DLK protein levels are temporally downregulated by Tramtrack69 to allow R7 growth cones to become stationary boutons. Development 143:2983–2993.

      Fernandes KA, Harder JM, John SW, Shrager P, Libby RT. 2014. DLK-dependent signaling is important for somal but not axonal degeneration of retinal ganglion cells following axonal injury. Neurobiol Dis 69:108–116.

      Ghosh AS, Wang B, Pozniak CD, Chen M, Watts RJ, Lewcock JW. 2011. DLK induces developmental neuronal degeneration via selective regulation of proapoptotic JNK activity. J Cell Biol 194:751–764.

      Hao Y, Frey E, Yoon C, Wong H, Nestorovski D, Holzman LB, Giger RJ, DiAntonio A, Collins C. 2016. An evolutionarily conserved mechanism for cAMP elicited axonal regeneration involves direct activation of the dual leucine zipper kinase DLK. Elife 5. doi:10.7554/eLife.14048

      Huntwork-Rodriguez S, Wang B, Watkins T, Ghosh AS, Pozniak CD, Bustos D, Newton K, Kirkpatrick DS, Lewcock JW. 2013. JNK-mediated phosphorylation of DLK suppresses its ubiquitination to promote neuronal apoptosis. J Cell Biol 202:747–763.

      Katz JS, Rothstein JD, Cudkowicz ME, Genge A, Oskarsson B, Hains AB, Chen C, Galanter J, Burgess BL, Cho W, Kerchner GA, Yeh FL, Ghosh AS, Cheeti S, Brooks L, Honigberg L, Couch JA, Rothenberg ME, Brunstein F, Sharma KR, van den Berg L, Berry JD, Glass JD. 2022. A Phase 1 study of GDC-0134, a dual leucine zipper kinase inhibitor, in ALS. Ann Clin Transl Neurol 9:50–66.

      Klinedinst S, Wang X, Xiong X, Haenfler JM, Collins CA. 2013. Independent pathways downstream of the Wnd/DLK MAPKKK regulate synaptic structure, axonal transport, and injury signaling. J Neurosci 33:12764–12778.

      Le K, Soth MJ, Cross JB, Liu G, Ray WJ, Ma J, Goodwani SG, Acton PJ, Buggia-Prevot V, Akkermans O, Barker J, Conner ML, Jiang Y, Liu Z, McEwan P, Warner-Schmidt J, Xu A, Zebisch M, Heijnen CJ, Abrahams B, Jones P. 2023. Discovery of IACS-52825, a potent and selective DLK inhibitor for treatment of chemotherapy-induced peripheral neuropathy. J Med Chem 66:9954–9971.

      Le Pichon CE, Meilandt WJ, Dominguez S, Solanoy H, Lin H, Ngu H, Gogineni A, Sengupta Ghosh A, Jiang Z, Lee S-H, Maloney J, Gandham VD, Pozniak CD, Wang B, Lee S, Siu M, Patel S, Modrusan Z, Liu X, Rudhard Y, Baca M, Gustafson A, Kaminker J, Carano RAD, Huang EJ, Foreman O, Weimer R, Scearce-Levie K, Lewcock JW. 2017. Loss of dual leucine zipper kinase signaling is protective in animal models of neurodegenerative disease. Sci Transl Med 9. doi:10.1126/scitranslmed.aag0394

      Li J, Zhang YV, Asghari Adib E, Stanchev DT, Xiong X, Klinedinst S, Soppina P, Jahn TR, Hume RI, Rasse TM, Collins CA. 2017. Restraint of presynaptic protein levels by Wnd/DLK signaling mediates synaptic defects associated with the kinesin-3 motor Unc-104. Elife 6. doi:10.7554/eLife.24271

      Miller BR, Press C, Daniels RW, Sasaki Y, Milbrandt J, DiAntonio A. 2009. A dual leucine kinase-dependent axon self-destruction program promotes Wallerian degeneration. Nat Neurosci 12:387–389.

      Nihalani D, Merritt S, Holzman LB. 2000. Identification of structural and functional domains in mixed lineage kinase dual leucine zipper-bearing kinase required for complex formation and stress-activated protein kinase activation. J Biol Chem 275:7273–7279.

      Russo A, DiAntonio A. 2019. Wnd/DLK is a critical target of FMRP responsible for neurodevelopmental and behavior defects in the Drosophila model of fragile X syndrome. Cell Rep 28:2581–2593.e5.

      Shin JE, Cho Y, Beirowski B, Milbrandt J, Cavalli V, DiAntonio A. 2012. Dual leucine zipper kinase is required for retrograde injury signaling and axonal regeneration. Neuron 74:1015–1022.

      Siu M, Sengupta Ghosh A, Lewcock JW. 2018. Dual Leucine Zipper Kinase Inhibitors for the Treatment of Neurodegeneration. J Med Chem 61:8078–8087.

      Valakh V, Walker LJ, Skeath JB, DiAntonio A. 2013. Loss of the spectraplakin short stop activates the DLK injury response pathway in Drosophila. J Neurosci 33:17863–17873.

      Verma S, Khurana S, Vats A, Sahu B, Ganguly NK, Chakraborti P, Gourie-Devi M, Taneja V. 2022. Neuromuscular junction dysfunction in amyotrophic lateral sclerosis. Mol Neurobiol 59:1502–1527.

      Wlaschin JJ, Donahue C, Gluski J, Osborne JF, Ramos LM, Silberberg H, Le Pichon CE. 2023. Promoting regeneration while blocking cell death preserves motor neuron function in a model of ALS. Brain 146:2016–2028.

      Xiong X, Collins CA. 2012. A conditioning lesion protects axons from degeneration via the Wallenda/DLK MAP kinase signaling cascade. J Neurosci 32:610–615.

      Xiong X, Hao Y, Sun K, Li J, Li X, Mishra B, Soppina P, Wu C, Hume RI, Collins CA. 2012. The Highwire ubiquitin ligase promotes axonal degeneration by tuning levels of Nmnat protein. PLoS Biol 10:e1001440.

      Xiong X, Wang X, Ewanek R, Bhat P, DiAntonio A, Collins CA. 2010. Protein turnover of the Wallenda/DLK kinase regulates a retrograde response to axonal injury. J Cell Biol 191:211–223.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      We thank the reviewer for their comments and suggestions. We have made several edits to the paper to address these comments, including the addition of several new control experiments, corrections to mislabeled figures in Fig 2, and other additions to improve the clarity of several figures.

      This work is missing several controls that are necessary to substantiate their claims. My most important concern is that the optogenetic screen for neurons that alter pathogenic lawn occupancy does not have an accompanying control on non-pathogenic OP50 bacteria. Hence, it remains unclear whether these neuronal inhibition experiments lead to pathogen-specific or generalized lawn-leaving alterations. For strains that show statistical differences between - and + ATR conditions, the authors should perform follow-up validation experiments on non-pathogenic OP50 lawns to ensure that the observed effect is PA14-specific. Similarly, neuronal inhibition experiments in Figures 5E and H are only performed with naïve animals on PA14 - we need to see the latency to re-entry on OP50 as well, to make general conclusions about these neurons' role in pathogen-specific avoidance.

      We have added data from new control experiments to Fig. S1 (subfigures B, C) for both exit and re-entry dynamics on OP50. We find that inhibition of neurons produces different effects on both lawn entry and exit on PA14 compared to OP50. We observed that inhibition of neurons failed to change the re-entry dynamics on OP50 for any of the lines that showed delayed latency to re-entry on PA14. Our results suggest that the neural control of re-entry dynamics we see is PA14-specific.

      My second major concern is regarding the calcium imaging experiments of candidate neurons involved in lawn re-entry behavior. Although the data shows that AIY, AVK, and SIA/SIB neurons all show reduced activity following pathogen exposure, the authors do not relate these activity changes to changes in behavior. Given the well-established links between these cells and forward locomotion, it is essential to not only report differences in activity but also in the relationship between this activity and locomotory behavior. If animals are paused outside of the pathogen lawn, these neurons may show low activity simply because the animals are not moving forward. Other forward-modulated neurons may also show this pattern of reduced activity if the animals remain paused. Given that the authors have recorded neural activity before and after contact with pathogenic bacteria in freely moving animals, they should also provide an analysis of the relationship between proximity to the lawn and the activity of these neurons.

      In response, we added an additional supplementary figure S7 to illustrate the role of each neuron in navigational control and added text to the discussion to better explain the role of each neuron type in the regulation of re-entry, in light of our previously published work on SIA in speed control.

      This work is missing methodological descriptions that are necessary for the correct interpretation of the results shown here. Figure 2 suggests that the determination of statistical significance across the optogenetic inhibition screen will be found in the Methods, but this information is not to be found there. At various points in the text, authors refer to "exit rate", "rate constant", and "entry rate". These metrics seem derived from an averaged measurement across many individual animals in one lawn evacuation assay plate. However "latency to re-entry" is only defined on a per-animal basis in the lawn re-exposure assay. These differences should be clearly stated in the methods section to avoid confusion and to ensure that statistics are computed correctly.

      Additional details have been added to the methods section to provide more in-depth information on the statistical analysis performed. In brief, the latency to re-entry is calculated in the same way across all assays: re-entry events across replicate experiments for a given experimental condition are aggregated together and used to calculate relevant statistics.

      This work also contains mislabeled graphs and incorrect correspondence with the text, which make it difficult to follow the authors' claims. The text suggests that Pdop-2::Arch3 and Pmpz-1::Arch3 show increased exit rates, whereas Figure 2 shows that Pflp-4::Arch3 but not Pmpz-1::Arch3 has increased exit rate. The authors should also make a greater effort to correctly and clearly label which type of behavioral experiment is used to generate each figure and describe the differences in experimental design in the main text, figure legends, and methods. Figure 2E depicts trajectories of animals leaving a lawn over a 2.5-minute interval but it is unclear when this time window occurs within the 18-hour lawn leaving assay. Likewise, Figure 2H depicts a 30-minute time window which has an unclear relationship to the overall time course of lawn leaving. This figure legend is also mislabeled as "Infected/Healthy", whereas it should be labeled "-/+ ATR".

      In Figures 2C and F, the x-axis labels are in a different order, making it difficult to compare between the 2 plots. Promoter names should be italicized. What does the red ring mean in Figure 2A? Figure 2 legend incorrectly states that four lines showed statistically significant changes for the Exit rate constant - only 2 lines are significant according to the figure.

      We thank the reviewer for identifying this embarrassing error. Figures 2C and F were flipped; we have corrected this and apologize for the error. Promoter names have been italicized, and we have added text to the captions explaining that the red ring is a ring light for background illumination of the worms. In addition, we have corrected the error in the figure legends from “Infected/Healthy” to “+/- ATR”.

      Lines in figure 2C and 2F are ordered by significance rather than keeping the same order in both. Majority feedback from colleagues suggested that this ordering was preferred.

      This work raises the interesting possibility that different sets of neurons control lawn exit and lawn re-entry behaviors following pathogen exposure. However, the authors never directly test this claim. To rigorously show this, the authors would need to show that lawn-exit-promoting neurons (CEPs, HSNs, RIAs, RIDs, SIAs) are dispensable for lawn re-entry behavior and that lawn re-entry promoting neurons (AVK, SIA, AIY, MI) are dispensable for lawn exit behavior in pathogen-exposed animals.

      We agree with the reviewer’s comments that there is insufficient evidence to show a complete decoupling of lawn exit and lawn re-entry. However, we note that our screen results show that only 1 line (dop-2) shows changes in both exit and re-entry dynamics upon neural inhibition (Fig. 2). This seems to suggest that at least some degree of neural control of re-entry is decoupled from exit.

      Please label graph axes with units in Figure 1 - instead of "Exit Rate" make it #exits per worm per hour, and make it more clear that Figures 1C and E have a different kind of assay than Figures 1A, B and D. There should be more consistency between the meaning of "pre/post" and "naive/infected/healthy" - and how many hours constitutes post.

      We have edited Figure 1 and made additions to the captions of figure 1 to make both points clearer. We have also standardized our language for subsequent figures (such as figure 5) to provide less ambiguity in pre/post and naïve/infected/healthy.

      Figure 5 - it should be made more clear when the stimulation/inhibition occurred in these experiments and how long they were recorded/analyzed.

      We have added additional details to the figure captions to make it clearer when the data was collected.

      Workspaces and code have been added under a data availability section in the manuscript.

      Reviewer 2:

      However, the paper's main weakness lies in its lack of a detailed mechanism explaining how the delayed reentry process directly influences the actual locomotor output that results in avoidance. The term 'delayed reentry' is used as a dynamic metric for quantifying the screening, yet the causal link between this metric and the mechanistic output remains unclear. Despite this, the study is well-structured, with comprehensive control experiments, and is very well constructed.

      We thank the reviewer for their comments and suggestions. We have added additional data and details to our work to cover these weaknesses, as can be seen in our responses to the suggestions below.

      (1) A key issue in the manuscript is the mechanistic link between the delayed process and locomotor output. AIY is identified as a crucial neuron in this process, but the specifics of how AIY influences this delay are not clear. For instance, does AIY decrease the reversal rate, causing animals to get into long-range search when they leave the bacterial lawn? Is there any relationship between pdf-2 expression and reversal rates? Given that AIY typically promotes long-range motion when activated, the suppression of this function and its implications on motion warrants further clarification.

      We have included additional data to explain how AIY might be able to regulate lawn entry behaviors and have added more to the discussion to explain how neural suppression might lead to changes in the behavior (new figure S7). Both AIY and SIA dynamics have been linked to worm navigation. In previous work (Lee 2019), we have demonstrated that SIA can control locomotory speed. Inhibition of SIA decreases locomotory speed, and as a result may serve to drive the increased latency of re-entry.

      AIY’s role in navigation has been previously established (Zhaoyu 2014), but we have added an additional supplementary figure and edited our discussion to further illustrate this point. As can be seen in the new figure S7, AIY neural activity undergoes a transition after removal from a bacterial lawn, going from low activity to high activity. This activity increase is correlated with a transition from a high reversal rate local search state to a long range search state characterized by longer runs. Inhibition of AIY during this long range search state increased the reversal rate, resulting in a higher rate of re-orientations. This might serve as part of the mechanistic explanation for AIY’s role in preventing lawn re-entry, as inhibition dramatically increased the rate of re-orientation, preventing worms from making directed runs into the bacterial lawn. However, there is an additional effect of AIY inhibition that is not seen during food search: inhibition of AIY in the context of a pathogenic bacterial lawn leads to stalling at the lawn edge. Therefore, AIY could have an additional role in governing the animal’s movement, post-exposure, upon contact with a pathogenic lawn.

      (2) I recommend including supplementary videos to visually demonstrate the process. These videos might help others identify aspects of the mechanism that are currently missing or unclear in the text.

      (4) The authors mention that the worms "left the lawn," but the images suggest that the worms do not stray far and remain around the perimeter. Providing videos could help clarify this observation and strengthen the argument by visually connecting these points

      Additional supplementary videos (1-3) taken at several stages of lawn evacuation have been added to visually demonstrate the process.

      (3) Regarding the control experiments (Figure 1E-G), the manuscript describes testing animals picked from a PA14-seeded plate and retesting them on different plates. It's crucial to clarify the differences between these plates. Specifically, the region just outside the lawn should be considered, as it is not empty and worms can spread bacteria around. Testing animals on a new plate with a pristine proximity region might introduce variables that affect their behavior.

      We have reworded the paper to make it clearer that these new conditions on a fresh PA14 lawn represent a different type of assay from the lawn evacuation assay. Fresh PA14 plates will indeed have a pristine proximity region compared to plates where the worms have spread the bacteria.

      These experiments were done to test if the evacuation effect is purely due to aversive signals left on the lawn or attractive signals left outside of the lawn. Given that worms are known to be able to leave compounds such as ascarosides to communicate with each other, we wanted to test that this lawn re-entry defect was not simply the result of deposited pheromones. Without any other method to remove such compounds, we relied on using fresh PA14 lawns instead to test this. We have updated the manuscript to clarify this point.

      (5) The manuscript notes that the PA14 strain was grown without shaking. Typically, growing this strain without agitation leads to biofilm formation. Clarifying whether there is a link between biofilm formation and avoidance behavior would add depth to the understanding of the experimental conditions and their impact on the observed behaviors.

      As the reviewer has noted, growth of PA14 without shaking might indeed lead to biofilm formation. This is a legitimate concern, as previous work has suggested that biofilm formation could be linked to pathogen avoidance because worms use mechanosensation to avoid pathogenic bacteria (Chang et al. 2011). However, we do not observe substantial biofilm formation in our cultured bacteria, likely because our growth time is too short for substantial biofilm to form. We also note that our evacuation dynamics appear to be of similar timescale to results reported in previous work which used different growth conditions. As such, we believe that our growth conditions are similar to those historically used in the lawn evacuation literature.

      Reviewer 3:

      Weaknesses:

      My only concern is that the authors should be more careful about describing their "compressed sensing-based approach". Authors often cite their previous Nature Methods paper, but should explain more because this method is critical for this manuscript. Also, this analysis is based on the hypothesis that only a small number of neurons are responsible for a given behavior. Authors should explain more about how to determine sparsity parameters, for example.

      We have added more details to our paper outlining some of the details involved in our compressed sensing approach. We go into more detail about how we chose sparsity parameters and note that our discovered neurons for re-entry appear to be robust over choice of sparsity parameters. These additional details can be found in both the paper body and the methods section.

      Line 45: This paragraph tries to mention that there should be "small sets of neurons" that can play key roles in integrating previous information to influence subsequent behavior. Is it valid as an assumption in the nervous systems?

      We want to clarify that what is important is not that there are ‘small sets of neurons’, but rather that these key neurons make up a small fraction of the total number of neurons in the nervous system. More correctly: the compressed sensing approach identifies information bottlenecks in the neural circuits, and the assumption is that the number of neurons in these bottlenecks is small. This is the underlying sparsity assumption being made here that allows us to utilize a compressed sensing based approach to identify these neurons. We have reworded this section to make it clear that what is important is not that the total number of neurons is small, but that they must be a small fraction of the total number of neurons in the nervous system.
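
      To make the sparsity assumption concrete, the toy sketch below searches for the smallest set of neurons that could explain a screen. It is a combinatorial stand-in for the idea, not the L1-based solvers typically used in compressed sensing and not the authors' pipeline. The line names, neuron names (`n0` to `n5`), expression patterns, and the additive unit-effect model are all hypothetical.

```python
from itertools import combinations

def sparsest_explanations(expression, observed, max_size=3):
    """Return all smallest neuron sets whose summed effect matches the screen.

    expression: dict mapping each line to the set of neurons it inhibits.
    observed:   dict mapping each line to its measured effect (integer units).
    Assumes each key neuron contributes one unit of effect to every line
    that inhibits it (a deliberately simple, hypothetical model).
    """
    neurons = sorted({n for members in expression.values() for n in members})
    for size in range(1, max_size + 1):
        hits = []
        for subset in combinations(neurons, size):
            predicted = {
                line: sum(n in members for n in subset)
                for line, members in expression.items()
            }
            if predicted == observed:
                hits.append(subset)
        if hits:  # stop at the smallest size that explains the data
            return hits
    return []

# Hypothetical screen: four lines, six candidate neurons.
expression = {
    "line_a": {"n0", "n1", "n2"},
    "line_b": {"n1", "n3"},
    "line_c": {"n2", "n3", "n4"},
    "line_d": {"n0", "n4", "n5"},
}
observed = {"line_a": 1, "line_b": 1, "line_c": 0, "line_d": 0}
print(sparsest_explanations(expression, observed))
```

      With more unknown neurons than lines, many neuron sets can fit the data; requiring the smallest set is what makes the answer identifiable in this toy case.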

      Line 125: "These approaches…" Authors repeatedly mentioned this statement to emphasize that their compressed sensing-based approach is the best choice. Are you really sure?

      We agree that there are several approaches that might allow for faster screening of the nervous system. For example, many studies approach the problem by looking at neurons with synapses onto a neuron already known to be implicated in the behavior, or by finding neurons that express a key gene known to regulate the behavior of interest. These approaches utilize prior information to greatly reduce the pool of candidate neurons needed to be screened.

      In the absence of such prior information, we believe that our compressed sensing based approach allows a rapid way to perform an unbiased interrogation of the entire nervous system to identify key neurons at bottlenecks of neural circuits. Once these key neurons are identified, neurons upstream and downstream of these key neurons can be investigated in the future. This approach gives us the added advantage of being able to identify neurons that do not connect to neurons that are already implicated in the behavior, or that don’t have clear genetic signatures in the behavior of interest. Our approach further allows for screening of neurons with no clear single genetic marker without the need to utilize intersectional genetic strategies. We agree that we should not use the phrase “best choice”, which might not be justified. We have reworded these statements, and we believe that compressed sensing based methods provide a complementary approach to those in the literature.

      Line 42: If authors refer to mushroom bodies and human hippocampus in relation to the significance of their work, authors should go back to these references in the Discussion and explain how their work is important.

      We thank the reviewer for this feedback, and we have added to our discussion to expand upon these points.

      Line 151: "the accelerated pathogen avoidance" Accelerated pathogen avoidance does not necessarily indicate the existence of the neural mechanism that inhibits the association of pathogenicity with microbe-specific cues (during early stages: first two hours).

      We agree with the reviewer’s statements that these results alone do not indicate the presence of an early avoidance mechanism. Other evidence for early avoidance mechanisms exists as seen in two choice assay experiments (Zhang 2005), and our results do seem to support this. However, we agree that early neural inhibition is insufficient evidence towards such a mechanism. We have thus removed this statement for accuracy.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript by Lopez-Blanch and colleagues, 21 microexons are selected for a deep analysis of their impacts on behavior, development, and gene expression. The authors begin with a systematic analysis of microexon inclusion and conservation in zebrafish and use these data to select 21 microexons for further study. The behavioral, transcriptomic, and morphological data presented are for the most part convincing. Furthermore, the discussion of the potential explanations for the subtle impacts of individual microexon deletions versus loss-of-function in srrm3 and/or srrm4 is quite comprehensive and thoughtful. One major weakness: data presentation, methods, and jargon at times affect readability / might lead to overstated conclusions. However, overall this manuscript is well-written, easy to follow, and the results are of broad interest.

      We thank the Reviewer for their positive comments on our manuscript. In the revised version, we will try to improve readability, reduce jargon and avoid overstatements. 

      Strengths:

      (1) The study uses a wide variety of techniques to assess the impacts of microexon deletion, ranging from assays of protein function to regulation of behavior and development.

      (2) The authors provide comprehensive analyses of the molecular impact of their microexon deletions, including examining how host-gene and paralog expression is affected.

      Weaknesses / Major Points:

      (1) According to the methods, it seems that srrm3 social behavior is tested by pairing a 3mpf srrm3 mutant with a 30dpf srrm3 het. Is this correct? The methods seem to indicate that this decision was made to account for a slower growth rate of homozygous srrm3 mutant fish. However, the difference in age is potentially a major confound that could impact the way that srrm3 mutants interact with hets and the way that srrm3 mutants interact with one another (lower spread for the ratio of neighbour in front value, higher distance to neighbour value). This reviewer suggests testing het-het behavior at 3 months to provide age-matched comparisons for del-del, testing age-matched rather than size-matched het-del behavior, and also suggests mentioning this in the main text / within the figure itself so that readers are aware of the potential confound.

      Thank you for bringing up this point. For the tests shown in Figure 5, we indeed decided to match the srrm3 pairs by fish size since we thought this would be more comparable to the other lines both biologically and methodologically (in terms of video tracking, etc.). However, we are confident the results would be very similar if matched by age, since the differences in social interactions between the srrm3 homozygous mutants and their control siblings are very dramatic at any age. For example, this can be appreciated, in line with the Reviewer's suggestion, in Videos S2 and S3, which show groups of five 5 mpf fish that are either srrm3 mutants or controls. It can be observed that the behavior of 5 mpf control fish is very similar to that of 1 mpf fish pairs, with very small interindividual distances. We nonetheless agree that this decision on the experimental design should be clearly stated in the text and figure legend and we will do so in the revised version.

      (2) Referring to srrm3+/+; srrm4-/- controls for double mutant behavior as "WT for simplicity" is somewhat misleading. Why do the authors not refer to these as srrm4 single mutants?

      We thought it made the interpretation of plots easier, but we will change this in the revised version.

      (3) It's not completely clear how "neurally regulated" microexons are defined / how they are different from "neural microexons"? Are these terms interchangeable?

      Yes, they are interchangeable. We will double check the wording to avoid confusion.

      (4) Overexpression experiments driving srrm3 / srrm4 in HEK293 cells are not described in the methods.

      Apologies for this omission. We will briefly describe the methods; however, please note that the data was obtained from a previous publication (Torres-Mendez et al, 2019), where the detailed methodology is reported.

      (5) Suggest including more information on how neurite length was calculated. In representative images, it appears difficult to determine which neurites arise from which soma, as they cross extensively. How was this addressed in the quantification?

      We will add further details to the revised version. With regards to the specific question, we would like to mention that this has not been a very common problem for the time points used in the manuscript (10 hap and 24 hap). At those stages, it was nearly always evident how to track each individual neurite. Dubious cases were simply discarded. Of course, such cases become much more common at later time points (48 and 72 hap), which were not used in this study.

      Reviewer #2 (Public review):

      Summary:

      This manuscript explores in zebrafish the impact of genetic manipulation of individual microexons and two regulators of microexon inclusion (Srrm3 and Srrm4). The authors compare molecular, anatomical, and behavioral phenotypes in larvae and juvenile fish. The authors test the hypothesis that phenotypes resulting from Srrm3 and 4 mutations might in part be attributable to individual microexon deletions in target genes.

      The authors uncover substantial alterations in in vitro neurite growth, locomotion, and social behavior in Srrm mutants but not any of the individual microexon deletion mutants. The individual mutations are accompanied by broader transcript level changes which may resemble compensatory changes. Ultimately, the authors conclude that the severe Srrm3/4 phenotypes result from additive and/or synergistic effects due to the de-regulation of multiple microexons.

      Strengths:

      The work is carefully planned, well-described, and beautifully displayed in clear, intuitive figures. The overall scope is extensive with a large number of individual mutant strains examined. The analysis bridges from molecular to anatomical and behavioral read-outs. Analysis appears rigorous and most conclusions are well-supported by the data.

      Overall, addressing the function of microexons in an in vivo system is an important and timely question.

      Weaknesses:

      The main weakness of the work is the interpretation of the social behavior phenotypes in the Srrm mutants. It is difficult to conclude that the mutations indeed impact social behavior rather than sensory processing and/or vision which precipitates apparent social alterations as a secondary consequence. Interpreting the phenotypes as "autism-like" is not supported by the data presented.

      The Reviewer is absolutely right and we apologize for this omission, since it was not our intention to imply that these social defects should be interpreted simply as autistic-like. It is indeed very likely that the social alterations displayed by the srrm3 mutants are mainly due to their impaired vision. We will add this discussion explicitly in the revised version.

      Reviewer #3 (Public review):

      Summary:

      Microexons are highly conserved alternative splice variants, the individual functions of which have thus far remained mostly elusive. The inclusion of microexons in mature mRNAs increases during development, specifically in neural tissues, and is regulated by SRRM proteins. Investigation of individual microexon function is a vital avenue of research since microexon inclusion is disrupted in diseases like autism. This study provides one of the first rigorous screens (using zebrafish larvae) of the functions of individual microexons in neurodevelopment and behavioural control. The authors precisely excise 21 microexons from the genome of zebrafish using CRISPR-Cas9 and assay the downstream impacts on neurite outgrowth, larvae motility, and sociality. A small number of mild phenotypes were observed, which contrasts with the more dramatic phenotypes observed when microexon master regulators SRRM3/4 are disrupted. Importantly, this study attempts to address the reasons why mild/few phenotypes are observed and identify transcriptomic changes in microexon mutants that suggest potential compensatory gene regulatory mechanisms.

      Strengths:

      (1) The manuscript is well written with excellent presentation of the data in the figures.

      (2) The experimental design is rigorous and explained in sufficient detail.

      (3) The identification of a potential microexon compensatory mechanism by transcriptional alterations represents a valued attempt to begin to explain complex genetic interactions.

      (4) Overall this is a study with a robust experimental design that addresses a gap in knowledge of the role of microexons in neurodevelopment.

      Thank you very much for your positive comments on our manuscript.

    1. Author response:

      eLife Assessment

      This descriptive manuscript builds on prior research showing that the elimination of Origin Recognition Complex (ORC) subunits does not halt DNA replication. The authors use various methods to genetically remove one or two ORC subunits from specific tissues and observe continued replication, though it may be incomplete. The replication appears to be primarily endoreduplication, indicating that ORC-independent replication may promote genome reduplication without mitosis. Despite similar findings in previous studies, the paper provides convincing genetic evidence in mice that liver cells can replicate and undergo endoreduplication even with severely depleted ORC levels. While the mechanism behind this ORC-independent replication remains unclear, the study lays the groundwork for future research to explore how cells compensate for the absence of ORC and to develop functional approaches to investigate this process. The reviewers agree that this valuable paper would be strengthened significantly if the authors could delve a bit deeper into the nature of replication initiation, potentially using an origin mapping experiment. Such an exciting contribution would help explain the nature of the proposed new type of Mcm loading, thereby increasing the impact of this study for the field at large.

      We appreciate the reviewers’ suggestion. To date, we know of only one paper that has mapped origins of replication in regenerating mouse liver, and it was published two months ago in Cell (PMID: 39293447). We want to adopt this method, but we do not need it to answer the question asked. We have mapped origins of replication in ORC-deleted cancer cell lines and compared them to wild-type cells in Shibata et al., bioRxiv (PMID: 39554186) (it is under review). We report the following: Mapping of origins in cancer cell lines that are wild type or engineered to delete three of the subunits, ORC1, ORC2 or ORC5, shows that specific origins are still used and are mostly at the same sites in the genome as in wild type cells. Of the 30,197 origins in wild type cells (with ORC), only 2,466 (8%) are not used in any of the three ORC deleted cells and 18,319 (60%) are common between the four cell types. Despite the lack of ORC, excess MCM2-7 is still loaded at comparable rates in G1 phase to license reserve origins and is also repeatedly loaded in the same S phase to permit re-replication.

      Citation: Specific origin selection and excess functional MCM2-7 loading in ORC-deficient cells. Yoshiyuki Shibata, Mihaela Peycheva, Etsuko Shibata, Daniel Malzl, Rushad Pavri, Anindya Dutta. bioRxiv 2024.10.30.621095; doi: https://doi.org/10.1101/2024.10.30.621095 (PMID: 39554186)
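
      The percentages quoted above follow directly from the reported origin counts, as the short check below shows. The variable names are illustrative, and the "intermediate" category (origins used in some but not all of the ORC-deleted lines) is inferred by subtraction on the assumption that "common between the four cell types" means used in all four; it is not a number stated in the response.

```python
# Origin counts as quoted in the response.
total_wild_type = 30197        # origins mapped in wild-type cells
unused_in_all_deleted = 2466   # not used in any of the three ORC-deleted lines
shared_by_all_four = 18319     # common to wild type and all three deleted lines

def percent(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole

# Inferred by subtraction (assumption, see lead-in).
intermediate = total_wild_type - unused_in_all_deleted - shared_by_all_four

print(f"unused in all deleted lines: {percent(unused_in_all_deleted, total_wild_type):.1f}%")
print(f"shared by all four lines:    {percent(shared_by_all_four, total_wild_type):.1f}%")
print(f"intermediate (inferred):     {percent(intermediate, total_wild_type):.1f}%")
```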

      Public Reviews:

      Reviewer #1 (Public review):

      The origin recognition complex (ORC) is an essential loading factor for the replicative Mcm2-7 helicase complex. Despite ORC's critical role in DNA replication, there have been instances where the loss of specific ORC subunits has still seemingly supported DNA replication in cancer cells, endocycling hepatocytes, and Drosophila polyploid cells. Critically, all tested ORC subunits are essential for development and proliferation in normal cells. This presents a challenge, as conditional knockouts need to be generated, and a skeptic can always claim that there were limiting but sufficient ORC levels for helicase loading and replication in polyploid or transformed cells. That being said, the authors have consistently pushed the system to demonstrate replication in the absence or extreme depletion of ORC subunits.

      Here, the authors generate conditional ORC2 mutants to counter a potential argument with prior conditional ORC1 mutants that Cdc6 may substitute for ORC1 function based on homology. They also generate a double ORC1 and ORC2 mutant, which is still capable of DNA replication in polyploid hepatocytes. While this manuscript provides significantly more support for the ability of select cells to replicate in the absence or near absence of select ORC subunits, it does not shed light on a potential mechanism.

      The strengths of this manuscript are the mouse genetics and the generation of conditional alleles of ORC2 and the rigorous assessment of phenotypes resulting from limiting amounts of specific ORC subunits. It also builds on prior work with ORC1 to rule out Cdc6 complementing the loss of ORC1.

      The weakness is that it is a very hard task to resolve the fundamental question of how much ORC is enough for replication in cancer cells or hepatocytes. Clearly, there is a marked reduction in specific ORC subunits that is sufficient to impact replication during development and in fibroblasts, but the devil's advocate can always claim minimal levels of ORC remaining in these specialized cells.

      The significance of the work is that the authors keep improving their conditional alleles (and combining them), thus making it harder and harder (but not impossible) to invoke limiting but sufficient levels of ORC. This work lays the foundation for future functional screens to identify other factors that may modulate the response to the loss of ORC subunits.

      This work will be of interest to the DNA replication, polyploidy, and genome stability communities.

      Thank you.

      Reviewer #2 (Public review):

      This manuscript proposes that primary hepatocytes can replicate their DNA without the six-subunit ORC. This follows previous studies that examined mice that did not express ORC1 in the liver. In this study, the authors suppressed expression of ORC2 or ORC1 plus ORC2 in the liver.

      Comments:

      (1) I find the conclusion of the authors somewhat hard to accept. Biochemically, ORC without the ORC1 or ORC2 subunits cannot load the MCM helicase on DNA. The question arises whether the deletion in the ORC1 and ORC2 genes by Cre is not very tight, allowing some cells to replicate their DNA and allow the liver to develop, or whether the replication of DNA proceeds via non-canonical mechanisms, such as break-induced replication. The increase in the number of polyploid cells in the mice expressing Cre supports the first mechanism, because it is consistent with few cells retaining the capacity to replicate their DNA, at least for some time during development.

      In our study, we used EYFP as a marker for Cre recombinase activity. ~98% of the hepatocytes in tissue sections and cells in culture express EYFP, suggesting that the majority of hepatocytes successfully expressed the Cre protein to delete the ORC1 or ORC2 genes. To assess deletion efficiency, we employed sensitive genotyping and Western blotting techniques to confirm the deletion of ORC1 and ORC2 in hepatocytes isolated from Alb-Cre mice. Results in Fig. 2C and Fig. 6D demonstrate the near-complete absence of ORC2 and ORC1 proteins, respectively, in these hepatocytes.

      The mutant hepatocytes underwent at least 15–18 divisions during development. The inherited ORC1 or ORC2 protein present during the initial cell divisions would be diluted to less than 1.5% of wild-type levels within six divisions, making it highly unlikely to support DNA replication, and yet we observe hepatocyte numbers that suggest there was robust cell division even after that point.
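
      The dilution argument can be illustrated with a simple halving model. This sketch assumes that inherited protein is split equally between daughter cells, with no new synthesis and no degradation, so it gives an upper bound on what remains: about 1.6% after six divisions (the same order as the figure quoted above) and under 1% by the seventh. The function name is hypothetical.

```python
def inherited_fraction(divisions):
    """Fraction of the original protein pool left per cell after `divisions`
    rounds of equal partitioning, with no synthesis and no degradation."""
    return 0.5 ** divisions

for n in (1, 3, 6, 7, 15):
    print(f"{n:2d} divisions: {100 * inherited_fraction(n):.4f}% of starting level")
```

      Any protein turnover would push the remaining fraction lower still, which strengthens rather than weakens the argument.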

      Furthermore, the EdU incorporation data confirm DNA synthesis in the absence of ORC1 and ORC2. Specifically, immunofluorescence showed that both in vitro and in vivo, EYFP-positive hepatocytes (indicating successful ORC1 and ORC2 deletion) incorporated EdU, demonstrating that DNA synthesis can occur without ORC1 and ORC2.

      Finally, the Alb-ORC2f/f mice have 25-37.5% of the number of hepatocyte nuclei compared to WT mice (Table 2).  If that many cells had an undeleted ORC2 gene, that would have shown up in the genotyping PCR and in the Western blots.

      (2) Fig 1H shows that 5 days post infection, there is no visible expression of ORC2 in MEFs with the ORC2 flox allele. However, at 15 days post infection, some ORC2 is visible. The authors suggest that a small number of cells that retained expression of ORC2 were selected over the cells not expressing ORC2. Could a similar scenario also happen in vivo?

      This would not explain the significant incorporation of EdU in hepatocytes that do not have detectable ORC by Western blots and that are EYFP positive.  Also note that for MEFs we are delivering the Cre by AAV infection in vitro, so there is a finite probability that a cell will not receive Cre and will not delete ORC2.  However, in vivo, the Alb-Cre will be expressed in every cell that turns on albumin.  There is no escaping the expression of Cre.

      (3) Figs 2E-G shows decreased body weight, decreased liver weight and decreased liver to body weight in mice with recombination of the ORC2 flox allele. This means that DNA replication is compromised in the ALB-ORC2f/f mice.

      It is possible that DNA replication is partially compromised or may slow down in the absence of ORC2. However, it is important to emphasize that livers with ORC2 deletion remain capable of DNA replication, so much so that liver function and life span are near normal. Therefore, some kind of DNA replication has to serve as a compensatory mechanism in the absence of ORC2 to maintain liver function and support regeneration.

      (4) Figs 2I-K do not report the number of hepatocytes, but the percent of hepatocytes with different nuclear sizes. I suspect that the number of hepatocytes is lower in the ALB-ORC2f/f mice than in the ORC2f/f mice. Can the authors report the actual numbers?

      We show in Table 2 that the Alb-ORC2f/f mice have about 25-37.5% as many hepatocytes as the WT mice.

      (5) Figs 3B-G do not report the number of nuclei, but percentages, which are plotted separately for the ORC2-f/f and ALB-ORC2-f/f mice. Can the authors report the actual numbers?

      In all the FACS experiments in Fig. 3B-G we collect data for a total of 10,000 nuclei (or cells). For Fig. 3E-G we divide the 10,000 nuclei into a control group, the bottom 40% on the EYFP axis (EYFP low, which is mostly EYFP negative), and a test group, the top 20% on the EYFP axis (EYFP high). We will mention this in the revision and relabel EYFP negative and positive as EYFP low and high.
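
      The gating described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `gate_by_eyfp` and the simulated intensities are hypothetical, and the actual gates were presumably drawn in the FACS analysis software rather than computed this way.

```python
import random


def gate_by_eyfp(intensities, low_frac=0.40, high_frac=0.20):
    """Split events into EYFP-low (bottom fraction) and EYFP-high (top fraction).

    Hypothetical helper: events between the two gates are discarded.
    """
    ranked = sorted(intensities)
    n = len(ranked)
    n_low = round(n * low_frac)    # size of the control group
    n_high = round(n * high_frac)  # size of the test group
    eyfp_low = ranked[:n_low]
    eyfp_high = ranked[n - n_high:] if n_high else []
    return eyfp_low, eyfp_high


# Simulated EYFP intensities for 10,000 nuclei (made-up values, not study data)
rng = random.Random(0)
events = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

eyfp_low, eyfp_high = gate_by_eyfp(events)
print(len(eyfp_low), len(eyfp_high))
```

      With 10,000 events, the two groups contain 4,000 and 2,000 nuclei respectively, and the middle 40% of the EYFP axis is left out of the comparison.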

      (6) Fig 5 shows the response of ORC2f/f and ALB-ORC2f/f mice after partial hepatectomy. The percent of EdU+ nuclei in the ORC2-f/f (aka ALB-CRE-/-) mice in Fig 5H seems low. Based on other publications in the field it should be about 20-30%. Why is it so low here? The very low nuclear density in the ALB-ORC2-f/f mice (Fig 5F) and the large nuclei (Fig 5I) could indicate that cells fire too few origins, proceed through S phase very slowly and fail to divide.

      The percentage of EdU+ nuclei in the ORC2f/f mice without Alb-Cre is 8%, while in PMID 10623657, 10% of wild-type nuclei incorporate EdU at 42 hr post partial hepatectomy (the mid-point of the 36-48 hr post-hepatectomy window that was used in our study). The important result here is that in the ORC2f/f mice with Alb-Cre (+/-) we are seeing significant EdU incorporation. We will also correct the X-axis labels in 5F, 5I, 7E and 7F to reflect that those measurements were not made at 36 hr post-resection but later (as was indicated in the schematic in Fig. 5A).

      (7) Fig 6F shows that ALB-ORC1f/f-ORC2f/f mice have very severe phenotypes in terms of body weight and liver weight (about one third of wild-type!!). Fig 6H and 6I, the actual numbers should be presented, not percentages. The fact that there are EYFP negative cells implies that CRE was not expressed in all hepatocytes.

      The liver to body weight ratio is what one has to look at, and it is 70% of the WT. In females the liver and body weight are low (although in proportion to each other), which may be what the reviewer is talking about. However, the fact that liver weight and body weight are not as low in males suggests that this is a gender (hormone?) specific effect and not a DNA replication defect. We have another paper, also on bioRxiv (Su et al.), that suggests that ORC subunits have a significant effect on gene expression, so it is possible that this is what leads to the sexual dimorphism in phenotype.

      The bottom 40% of nuclei on the EYFP axis in the FACS profiles (what was labeled EYFP negative but will now be called EYFP low) contains mostly non-hepatocytes that are genuinely EYFP negative. Non-hepatocytes (bile duct cells, endothelial cells, Kupffer cells, blood cells) are a significant part of the cells in the dissociated liver (as can be seen in the single cell sequencing results in PMID: 32690901). Their presence does not mean that hepatocytes are not expressing Cre. Hepatocytes are mostly EYFP positive, as can be seen in the tissue sections (where the hepatocytes take up most of the visual field) and in cells in culture. Also, if there are EYFP negative hepatocyte nuclei in the FACS, that still does not rule out EYFP presence in the cytoplasm. The important point from the FACS is that the EYFP high nuclei (which have expressed Cre for the longest period) are polyploid relative to the EYFP low nuclei.

      (8) Comparing the EdU+ cells in Fig 7G versus 5G shows very different number of EdU+ cells in the control animals. This means that one of these images is not representative. The higher fraction of EdU+ cells in the double-knockout could mean that the hepatocytes in the double-knockout take longer to complete DNA replication than the control hepatocytes. The control hepatocytes may have already completed DNA replication, which can explain why the fraction of EdU+ cells is so low in the controls. The authors may need to study mice at earlier time points after partial hepatectomy, i.e. sacrifice the mice at 30-32 hours, instead of 40-52 hours.

      The apparent difference that the reviewer comments on stems from differences in nuclear density in the images in Fig. 7G and 5G (also quantitated in Fig. 7F and 5F). The quantitation in Fig. 7H and 5H shows that the percentages of EdU+ cells are comparable (5-8%).

      (9) Regarding the calculation of the number of cell divisions during development: the authors assume that all the hepatocytes in the adult liver are derived from hepatoblasts that express Alb. Is it possible to exclude the possibility that pre-hepatoblast cells that do not express Alb give rise to hepatocytes? For example the cells that give rise to hepatoblasts may proliferate more times than normal giving rise to a higher number of hepatoblasts than in wild-type mice.

      Single cell sequencing of mouse liver at e11 shows hepatoblasts expressing hepatocyte-specific markers (PMID: 32690901). All the cells annotated from the single-cell seq analysis are differentiated cells, arguing against the possibility that undifferentiated endodermal cells (what the reviewer probably means by pre-hepatoblasts) exist at e11. The following review (https://www.ncbi.nlm.nih.gov/books/NBK27068/) says: “The differentiation of bi-potential hepatoblasts into hepatocytes or BECs begins around e13 of mouse development. Initially hepatoblasts express genes associated with both adult hepatocytes (Hnf4α, Albumin) ...” Thus, undifferentiated endodermal cells are unlikely to persist at e11, and hepatoblasts at e11 express albumin. Our calculation of the number of cell divisions in Table 2 begins from e12.

      The reviewer may be suggesting that ORC deletion leads to the immediate demise of hepatoblasts (despite their having inherited ORC protein from the endodermal cells), causing undifferentiated endodermal cells to persist and proliferate much longer than in normal development. We consider this unlikely, but if true, it would be amazing new biology, both by suggesting that deletion of ORC immediately leads to the death of the hepatoblasts (despite a healthy reserve of inherited ORC protein) and by suggesting that there is a novel feedback mechanism by which the death/depletion of hepatoblasts leads to the persistence and proliferation of undifferentiated endodermal cells.

      (10) My interpretation of the data is that not all hepatocytes have the ORC1 and ORC2 genes deleted (eg EYFP-negative cells) and that these cells allow some proliferation in the livers of these mice.

      Please see the reply to question #1. Particularly relevant: “Finally, the Alb-ORC2f/f mice have 25-37.5% of the number of hepatocyte nuclei compared to WT mice (Table 2). If that many cells had an undeleted ORC2 gene, that would have shown up in the genotyping PCR and in the Western blots.”

      Reviewer #3 (Public review):

      Summary:

      The authors address the role of ORC in DNA replication and that this protein complex is not essential for DNA replication in hepatocytes. They provide evidence that ORC subunit levels are substantially reduced in cells that have been induced to delete multiple exons of the corresponding ORC gene(s) in hepatocytes. They evaluate replication both in purified isolated hepatocytes and in mice after hepatectomy. In both cases, there is clear evidence that DNA replication does not decrease at a level that corresponds with the decrease in detectable ORC subunit and that endoreduplication is the primary type of replication observed. It remains possible that small amounts of residual ORC are responsible for the replication observed, although the authors provide arguments against this possibility. The mechanisms responsible for DNA replication in the absence of ORC are not examined.

      Strengths:

      The authors clearly show that there are dramatic reductions in the amount of the targeted ORC subunits in the cells that have been targeted for deletion. They also provide clear evidence that there is replication in a subset of these cells and that it is likely due to endoreduplication. Although there is no replication in MEFs derived from cells with the deletion, there is clearly DNA replication occurring in hepatocytes (both isolated in culture and in the context of the liver). Interestingly, the cells undergoing replication exhibit enlarged cell sizes and elevated ploidy indicating endoreduplication of the genome. These findings raise the interesting possibility that endoreduplication does not require ORC while normal replication does.

      Weaknesses:

      There are two significant weaknesses in this manuscript. The first is that although there is clearly robust reduction of the targeted ORC subunit, the authors cannot confirm that it is deleted in all cells. For example, the analysis in Fig. 4B would suggest that a substantial number of cells have not lost the targeted region of ORC2. Although the western blots show stronger effects, this type of analysis is notorious for non-linear response curves and no standards are provided. The second weakness is that there is no evaluation of the molecular nature of the replication observed. Are there changes in the amount or location of Mcm2-7 loading that is usually mediated by ORC? Does an associated change in Mcm2-7 loading lead to the endoreduplication observed? After numerous papers from this lab and others claiming that ORC is not required for eukaryotic DNA replication in a subset of cells, we still have no information about an alternative pathway that could explain this observation.

      We do not see a significant deficit in MCM2-7 loading (amount and rate) in cancer cell lines where we have deleted ORC1, ORC2 or ORC5 genes separately in Shibata et al. bioRxiv 2024.10.30.621095; doi: https://doi.org/10.1101/2024.10.30.621095 (PMID: 39554186)

      The authors frequently use the presence of a Cre-dependent eYFP expression as evidence that the ORC1 or ORC2 genes have been deleted. Although likely the best visual marker for this, it is not demonstrated that the presence of eYFP ensures that ORC2 has been targeted by Cre. For example, based on the data in Fig. 4B, there seems to be a substantial percentage of ORC2 genes that have not been targeted while the authors report that 100% of the cells express eYFP.

      The PCR reactions in Fig. 4B are still contaminated by DNA from non-hepatocyte cells: bile duct cells, endothelial cells, Kupffer cells and blood cells. Under the microscope, in culture, we can recognize the hepatocytes unequivocally from their morphology. Fewer than 2% of the hepatocytes in culture in Fig. 4C are EYFP negative.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (public review):

      (1) The link between the background in the introduction and the actual study and findings is often tenuous or not clearly explained. A re-working of the intro to better set up and link to the study questions would be beneficial.

      We have rewritten the Introduction of the manuscript and now clearly state the study questions we set out to address:

      In paragraph 1, we state clearly that we need to study why the ADC type of cervical cancer is more aggressive. (Line 58 - 77)

      In paragraph 2, we state clearly that we need to find valuable biomarkers to help diagnose lymph node metastasis, which may compensate for the limitations of radiological imaging tools and reduce the rate of misdiagnosis. (Line 78 - 100)

      In paragraph 3, we state clearly that HPV-negative cases are a special group of cervical cancer and that we aim to study their cellular features. (Line 101 - 108)

      In paragraph 4, we state clearly that we need to decode the mode of cell-to-cell interaction in the tumor immune microenvironment of ADC using scRNA-seq. (Line 109 - 123)

      (2) For the sequencing, which kit was used on the Novaseq6000?

      For sequencing, we used the Chromium Controller and Chromium Single Cell 3’ Reagent Kits (v3 chemistry CG000183) on the Novaseq6000. We apologize for omitting this important information and have added it to the Methods section. (Line 196 - 197)

      (3) Additional details are needed for the analysis pipeline. How were batch effects identified/dealt with, what were the precise functions and settings for each step of the analysis, how was clustering performed and how were clusters validated etc. Currently, all that is given is software and sometimes function names which are entirely inadequate to be able to assess the validity of the analysis pipeline. This could alternatively be answered by providing annotated copies of the scripts used for analysis as a supplement.

      We apologize for the inadequate description of the data analysis process. We have added a new “data processing” part with more details to the Methods section (Line 202 - 221). In addition, we have provided annotated copies of the scripts in the supplementary data as Supplementary Data 1.

      (4) For Cell type annotation, please provide the complete list of "selected gene markers" that were used for annotation.

      We have already added the list of marker genes for cell type annotation in the revised manuscript as Supplementary Table 3.

      (5) No statistics are given for the claims on cell proportion differences throughout the paper (for cell types early, epithelial sub-clusters later, and immune cell subsets further on). This should be a multivariate analysis to account for ADC/SCC, HPV+/- and Early/Late stage.

      We apologize for the lack of statistics in our comparisons. In the revision, we have used statistical approaches to analyze the differences in each group comparison, and the corresponding figures have been revised accordingly.

      For example, Fig. 1F, Fig. 2D, Fig. 4E, Fig. 5D, Fig. 6D have been re-analyzed to compare ADC/SCC; Supplementary Fig. 1A, Supplementary Fig. 2A, Supplementary Fig. 4A, Supplementary Fig. 5A, Supplementary Fig. 6A have been re-analyzed to compare HPV+/HPV-; Supplementary Fig. 1B, Supplementary Fig. 2B, Supplementary Fig. 4B, Supplementary Fig. 5B, Supplementary Fig. 6B have been re-analyzed to compare Early/Late stage. All P values are listed in the figure legends.

      (6) The Y-axis label is missing from the proportion histograms in Figure 2D. In these same panels, the bars change widths on the right side. If these are exclusively in ADC, show it with a 0 bar for SCC, not doubling the width which visually makes them appear more important by taking up more area on the plot.

      We apologize for the imprecise presentation of the histograms in Fig. 2D, and we have also revised other figures with similar mistakes, such as Fig. 1F and Fig. 5D. The varying bar widths were due to the output style of the data processing; we have corrected all similar mistakes throughout the manuscript, for example in Fig. 2D and Supplementary Fig. 2A-B.

      (7) Throughout the manuscript, informatic predictions (differentiation potential, malignancy score, stemness, and trajectory) are presented as though they're concrete facts rather than the predictions they are. Strong conclusions are drawn on the basis of these predictions which do not have adequate data to support. These conclusions which touch on essentially all of the major claims made in the manuscript would need functional data to validate, or the claims need to be very substantially softened as they lack concrete support. Indeed, the fact that most of the genes examined that were characteristic of a given cluster did not show the expected expression patterns in IHC highlights the fact that such predictions require validation to be able to draw proper inferences.

      Thank you for your insightful comments. As you noted, several conclusions were initially based on bioinformatics predictions. In the revised manuscript, we have therefore rewritten all relevant descriptions in softer terms, particularly in the paragraph on “epithelial cells” in the Results section, as well as the conclusions derived from bioinformatics predictions in other paragraphs throughout the manuscript. We hope our revised descriptions will enhance the precision of our work.

      For example, in paragraph “The sub-clusters of epithelial cells in ADC exhibit elevated stem-like features (from Line 353)”, many overly affirmative descriptions have been rewritten in Line 353, 362, 371, 375, 379, 383, 390, 392. From Line 395 to 399, the conclusion has been revised to “The observation of cluster Epi_10_CYSTM1 and its possible specificity to ADC makes us question whether or not it may be related to the aggressiveness of ADC”, compared to the previous “This observation may partially indicate that high stemness cluster Epi_10_CYSTM1 is essential for ADC to present more aggressive features”. From Line 400 to 408, conclusions from GO analyses have also been rewritten.

      In paragraph “ADC-specific epithelial cluster-derived gene SLC26A3 is a potential prognostic marker for lymph node metastasis (from Line 422)”, many conclusions based on predictions have been revised, such as Line 424 - 428, Line 439 - 441, Line 451 - 453, Line 455 - 457, Line 458 - 459, Line 471 - 473, Line 478 - 481, Line 484 - 486, Line 489, etc.

      In paragraph “Tumor associated neutrophils (TANs) surrounding ADC tumor area may contribute to the formation of a malignant microenvironment (from Line 536)”, we have changed the descriptions based on bioinformatic predictions, such as Line 560, Line 561, Line 565, Line 566, Line 572, Line 576 - 577, etc.

      In paragraph “Crosstalk among tumor cells, Tregs and neutrophils establishes the immunosuppressive TIME in ADC (from Line 601)”, we have corrected all the affirmative descriptions, such as Line 604, Line 612, Line 614, Line 626, Line 628 - 629, Line 641, Line 654 - 655, etc.

      All the changes have also been listed in Revision Notes in detail.

      (8) The cluster Epi_10_CYSTM1 which is the basis for much of the paper is present in a single individual (with a single cell coming from another person), and heavily unconnected from the rest of the epithelial populations. If so much emphasis is placed on it, the existence of this cluster as a true subset of cells requires validation.

      We appreciate this suggestion. We agree that the majority of Epi_10_CYSTM1 cells are derived from sample S7. The fact that we have detected this cluster in only one patient may be due to sampling differences and the inherent heterogeneity of tumor specimens. However, the relatively high number of cells in this cluster from one stage III patient suggests its presence in ADC patients and highlights its potential as a diagnostic marker for clinical staging. To further investigate whether this cluster is generally present in ADC patients, we have identified and selected candidate genes, such as SLC26A3, ORM1, and ORM2, as representative markers of this cluster, which demonstrated high specificity (as shown in Fig. 3B). We then performed IHC staining on a total of 56 tissue samples, and the results showed positive expression of these markers in the majority of stage IIIC tumor tissues, confirming the existence of this cell cluster (as shown in Supplementary Fig. 3E). In our revised manuscript, we have included an in-depth discussion of this issue in the seventh paragraph of the Discussion section (from Line 801).

      (9) Claims based on survival analysis of TCGA for Epi_10_CYSTM1 are based on a non-significant p-value, though there is a slight trend in that direction.

      Thank you for your insightful comment. The TCGA survival analysis for Epi_10 showed a noticeable trend toward a difference between groups (with a small P value), so we originally presented these data in the hope of strengthening the clinical significance of this cluster. However, because the P value is not significant, the claim was open to dispute, and we have deleted these data from the revised manuscript.

      (10) The claim "The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis." This is incorrect according to the sample distributions which clearly show cells from the patient who has EPI_10_CYSTM1 in multiple other clusters. This is then used as justification for SLC26A3 which appears to be associated with late stage, however, in the images SLC26A3 appears to be broadly expressed in later tumours rather than restricted to a minor subset as it should be if it were actually related to the EPI_10_CYSTM1 cluster.

      We are thankful for this question. The conclusion that “The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis” was indeed stated too strongly given the sample distribution. We apologize for this and have corrected the description to “As one of stage IIIC-specific cell clusters, the cluster of Epi_10_CYSTM1, with its representative marker gene SLC26A3, presents potential diagnostic value to predict lymph node metastasis” in Line 478 - 481.

      However, based on our results, we do think that this cluster is a potential diagnostic marker and that the hypothesis is correct. As for SLC26A3, we have added a new paragraph (Line 801 - 822) to the Discussion section to discuss the rationale for and necessity of selecting this gene as our central focus, and the reasons why SLC26A3 should be the representative of cluster Epi_10_CYSTM1. As you noted, SLC26A3 appears to be broadly expressed in later tumors rather than restricted to a minor subset in the images. We apologize for any misunderstanding caused. When presenting the IHC data, we only showed the strongly positive areas of each slide to emphasize the differences. In our revision, we have included whole-slide scanning images of the IHC samples, clearly showing that SLC26A3 is restricted to a part of the tumors (Supplementary Fig. 9).

      (11) The authors claim that cytotoxic T cells express KRT17, and KRT19. This likely represents a mis-clustering of epithelial cells.

      We apologize for using data without noticing that the T cells were contaminated with a few epithelial cells. We have re-performed quality control to exclude the contamination and re-analyzed all T cell data. In the revised manuscript, we have therefore provided entirely new data for T cells in both Fig. 4 and Supplementary Fig. 4.

      (12) Multiple claims are made for specific activities based on GO term biological process analysis which while not contradictory to the data, certainly are by no means the only explanation for it, nor directly supported.

      Our initial purpose was to use the GO analysis as support for our conclusions. However, we recognize that these are only claims and not evidence, which is the same problem of wording raised in question (7). Therefore, in our revised manuscript, we have deleted the GO data and descriptions in the paragraphs on “T cell (Fig. 4)” (from Line 495) and “B/plasma cell (Fig. 6)” (from Line 579), because the predictions are largely irrelevant to our conclusions.

      However, in the sections of “epithelial cell (Fig.2)” (from Line 352) and “neutrophils (Fig.5)” (from Line 536), we retained the GO data and rewrote the conclusions, because these analyses have provided us with valuable information regarding the role of specific cell clusters in ADC progression. Furthermore, our subsequent analyses, such as CellChat, have further validated the accuracy of the findings from the GO analysis. We do think this logically supports the whole storyline of the study.

      Reviewer #2 (public review):

      (1) I believe that many of the proposed conclusions are over-interpretations or unwarranted generalizations of the single-cell analysis. These conclusions are often based on populations in the scRNA-seq data that are described as enriched or specific to a given group of samples (eg. ADC). This conclusion is based on the percentage of cells in that population belonging to the given group; for example, a cluster of cells that dominantly come from ADC. The data includes multiple samples for each group, but statistical approaches are never used to demonstrate the reproducibility of these claims.

      We apologize that many of the conclusions were written in an overly affirmative way without sufficient supporting evidence. In our revision, we have improved the wording and rewritten all conclusions or descriptions that rest only on bioinformatic predictions. Moreover, we have performed statistical re-analyses on all data and rearranged the related figures.

      For example, in Line 352, we have changed the sub-title “The sub-clusters of epithelial cells exhibit elevated stem-like features to promote the aggressiveness of ADC” to “The sub-clusters of epithelial cells in ADC exhibit elevated stem-like features”. In this paragraph, many overly affirmative descriptions such as “exclusively”, “significant”, “overwhelmingly”, “remarkably” have been deleted. In Line 486 - 493, the conclusion “Moreover, SLC26A3 could be employed as a marker for the Epi_10_CYSTM1 cluster, aiding in the diagnosis of lymph node metastasis to prevent post-surgical upstaging in ADC patients in the future” has been changed to “our results propose that SLC26A3 might be considered as a diagnostic marker to predict lymph node metastasis in ADC patients”. Similar overly affirmative descriptions and conclusions have also been rewritten in the other paragraphs, as detailed in our reply to question (7) above.

      (2) This leads to problematic conclusions. For example, the "ADC-specific" Epi_10_CYSTM1 cluster, which is a central focus of the paper, only contains cells from one of the 11 ADC samples and represents only a small fraction of the malignant cells from that sample (Sample 7, Figure 2A). Yet, this population is used to derive SLC26A3 as a potential biomarker. SLC26A3 transcripts were only detected in this small population of cells (none of the other ADC samples), which makes me question the specificity of the IHC staining on the validation cohort.

      We are sincerely grateful for this question. It is an important one, as it was also raised by reviewer #1 in question (8) above. In the revised manuscript, we have improved our descriptions and added a detailed explanation of the importance of SLC26A3 to the Discussion section (Line 802 - 823). We agree that the majority of Epi_10_CYSTM1 cells are derived from sample S7. The fact that we detected this cluster in only one patient may be due to sampling differences and the inherent heterogeneity of tumor specimens. However, the relatively high number of cells in this cluster from one stage III patient suggests its presence in ADC and highlights its potential as a diagnostic marker for staging ADC. To further investigate whether this cluster is generally present in ADC patients, we identified and selected candidate genes, such as SLC26A3, ORM1, and ORM2, as representative markers of this cluster, which demonstrated high specificity (as shown in Fig. 3B). We then performed IHC staining on 56 tissue samples, and the results showed positive expression of these markers in the majority of stage III tumor tissues, confirming the existence of this cell cluster (as shown in Supplementary Fig. 3E). In our revised manuscript, we have included an in-depth discussion of this issue in the seventh paragraph of the Discussion section.

      (3) This is compounded by technical aspects of the analysis that hinder interpretation. For example, it is clear that the clustering does not perfectly segregate cell types. In Figures 2B and D, it is evident that C4 and C5 contain mixtures of cell type (eg. half of C4 is EPCAM+/CD3-, the other half EPCAM-/CD3+). These contaminations are carried forward into subclustering and are not addressed. Rather, it is claimed that there is a T cell population that is CD3- and EPCAM+, which does not seem likely.

      Thank you for your insightful comment. This important point was also raised by reviewer #1 above. In the revised manuscript, we have reanalyzed our scRNA-seq data and listed the canonical marker genes for cell type annotation. Most importantly, for T cells and their sub-clustering, we have performed quality control and re-analyzed all T cell data, with contamination excluded. In the revised manuscript, we have added the re-analyzed data for T cells in both Fig. 4 and Supplementary Fig. 4.

      Recommendations for the authors:

      Reviewer #1 (recommendations for the authors):

      The text would substantially benefit from an editorial revision of language usage.

      We are sincerely grateful for this suggestion. In our revision, we have conducted language editing and carefully rewritten our manuscript. The changes have been clearly marked in the tracked version of the revised manuscript.

      Reviewer #2 (recommendations for the authors):

      (1) Use statistical approaches to claim enrichment/specificity of populations to given groups (ADC, HPV, etc). Analysis packages like Milo for differential abundance testing would be very helpful.

      We feel grateful for this suggestion. In our revision, we have performed statistical analyses for all groups of comparison data. Meanwhile, we have rearranged the figures based on these statistical results, for example, Fig. 1F, Fig. 2D, Fig. 4E, Fig. 5D, Fig. 6D, Supplementary Fig. 1A-B, Supplementary Fig. 2A-B, Supplementary Fig. 4A-B, Supplementary Fig. 5A-B, Supplementary Fig. 6A-B.
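
      One simple way to test whether a cluster is enriched in one group, treating each sample rather than each cell as the unit of replication, is a permutation test on per-sample cluster fractions. The sketch below is illustrative only: the response does not state which test was applied, this is not the Milo method suggested by the reviewer, and the function name `permutation_test` and all fractions are hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign samples to the two groups
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-sample fractions of one cell cluster (not study data)
adc_fractions = [0.12, 0.15, 0.10, 0.18, 0.14]
scc_fractions = [0.05, 0.07, 0.04, 0.06, 0.08]

p_value = permutation_test(adc_fractions, scc_fractions)
print(f"p = {p_value:.4f}")
```

      Because the test shuffles sample labels, a cluster dominated by cells from a single sample cannot produce a small p-value on its own, which is the reproducibility concern raised in the public review. Covariates such as HPV status and stage would need a multivariate model instead.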

      (2) In the subclustering, consider a round of quality control to ensure that all cells are of the cell type they are claimed to be. Contaminant clusters/cells could be filtered out or reassigned. This could be supplemented with an automated annotation approach using cell-type references.

      We feel thankful for this suggestion. As a result, we have provided copies of scripts in the supplementary data to ensure the quality control of cell type annotation.

      (3) An explanation for why SLC26A3 is so rare in the scRNA-seq data, but seemingly common in the IHC staining would be helpful. I am concerned about the specificity of the stain.

      We apologize for the inadequate explanation of SLC26A3 and cluster Epi_10_CYSTM1. This is a crucial question, as it was also raised above in question (8) of reviewer #1 and question (2) of reviewer #2 (public review section). In the revised manuscript, we have added an intensive discussion of this question to the seventh paragraph of the Discussion section (Line 801 - 822). Because of the heterogeneity among different individuals, and among different tumor regions even within one sample, Epi_10_CYSTM1 appeared to be derived from only one sample. However, the relatively high number of cells in this cluster from one late-stage (stage IIIC) patient suggests its presence in ADC and highlights its potential as a diagnostic marker for staging ADC. Furthermore, we have identified SLC26A3, ORM1 and ORM2 as specific markers of this cluster and performed IHC staining. The positive expression of these markers indirectly confirms the existence of this cluster (as shown in Fig. 3B).

    1. Author response:

      The following is the authors’ response to the current reviews.

      The authors agree with the reviewers that future studies are needed to dissect the mechanisms of eIF3 binding to 3'UTRs and their impact on translation, and the impact of this binding on cellular fate.


      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study reveals extensive binding of eukaryotic translation initiation factor 3 (eIF3) to the 3' untranslated regions (UTRs) of efficiently translated mRNAs in human pluripotent stem cell-derived neuronal progenitor cells. The authors provide solid evidence to support their conclusions, although this study may be enhanced by addressing potential biases of techniques employed to study eIF3:mRNA binding and providing additional mechanistic detail. This work will be of significant interest to researchers exploring post-transcriptional regulation of gene expression, including cellular, molecular, and developmental biologists, as well as biochemists.

      We thank the reviewers for their positive views of the results we present, along with the constructive feedback regarding the strengths and weaknesses of our manuscript, with which we generally agree. We acknowledge our results will require a deeper exploration of the molecular mechanisms behind eIF3 interactions with 3'-UTR termini and experiments to identify the molecular partners involved. Additionally, given that NPC differentiation toward mature neurons is a process that takes around 3 weeks, we recognize the importance of examining eIF3-mRNA interactions in NPCs that have undergone differentiation over longer periods than the 2-hr time point selected in this study. Finally, considering the molecular complexity of the 13-subunit human eIF3, we agree that a direct comparison between Quick-irCLIP and PAR-CLIP will be highly beneficial and will determine whether different UV crosslinking wavelengths report on different eIF3 molecular interactions. Additional comments are given below to the identified weaknesses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors perform irCLIP of neuronal progenitor cells to profile eIF3-RNA interactions upon short-term neuronal differentiation. The data shows that eIF3 mostly interacts with 3'-UTRs - specifically, the poly-A signal. There appears to be a general correlation between eIF3 binding to 3'-UTRs and ribosome occupancy, which might suggest that eIF3 binding promotes protein

      Strengths:

      The study provides a wealth of new data on eIF3-mRNA interactions and points to the potential new concept that eIF3-mRNA interactions are polyadenylation-dependent and correlate with ribosome occupancy.

      Weaknesses:

      (1) A main limitation is the correlative nature of the study. Whereas the evidence that eIF3 interacts with 3'-UTRs is solid, the biological role of the interactions remains entirely unknown. Similarly, the claim that eIF3 interactions with 3'-UTR termini require polyadenylation but are independent of poly(A) binding proteins lacks support as it solely relies on the absence of observable eIF3 binding to poly-A (-) histone mRNAs and a seeming failure to detect PABP binding to eIF3 by co-immunoprecipitation and Western blotting. In contrast, LC-MS data in Supplementary File 1 show ready co-purification of eIF3 with PABP.

      We agree the molecular mechanisms underlying the crosslinking between eIF3 and the end of mRNA 3’-UTRs remains to be determined. We also agree that the lack of interaction seen between eIF3 and PABP in Westerns, even from HEK293T cells, is a puzzle. The low sequence coverage in the LC-MS data gave us pause about making a strong statement that these represent direct eIF3 interactions, given the similar background levels of some ribosomal proteins.

      (2) Another question concerns the relevance of the cellular model studied. irCLIP is performed on neuronal progenitor cells subjected to neuronal induction for 2 hours. This short-term induction leads to a very modest - perhaps 10% - and very transient 1-hour-long increase in translation, although this is not carefully quantified. The cellular phenotype also does not appear to change and calling the cells treated with differentiation media for 2 hours "differentiated NPCs" seems a bit misleading. Perhaps unsurprisingly, the minor "burst" of translation coincides with minor effects on eIF3-mRNA interactions most of which seem to be driven by mRNA levels. Based on the ~15-fold increase in ID2 mRNA coinciding with a ~5-fold increase in ribosome occupancy (RPF), ID2 TE actually goes down upon neuronal induction.

      We agree that it will be interesting to look at eIF3-mRNA interactions at longer time points after induction of NPC differentiation. However, the pattern of eIF3 crosslinking to the end of 3’-UTRs occurs in both time points reported here, which is likely to be the more general finding in what we present.

      (3) The overlap in eIF3-mRNA interactions identified here and in the authors' previous reports is minimal. Some of the discrepancies may be related to the not well-justified approach for filtering data prior to assessing overlap. Still, the fundamentally different binding patterns - eIF3 mostly interacting with 5'-UTRs in the authors' previous report and other studies versus the strong preference for 3'-UTRs shown here - are striking. In the Discussion, it is speculated that the different methods used - PAR-CLIP versus irCLIP - lead to these fundamental differences. Unfortunately, this is not supported by any data, even though it would be very important for the translation field to learn whether different CLIP methodologies assess very different aspects of eIF3-mRNA interactions.

      We agree the more interesting aspect of what we observe is the difference in location of eIF3 crosslinking, i.e. the end of 3’-UTRs rather than 5’-UTRs or the pan-mRNA pattern we observed in T cells. The reviewer is right that it will be important in the future to compare PAR-CLIP and Quick-irCLIP side-by-side to begin to unravel the differences we observe with the two approaches.

      Reviewer #2 (Public review):

      Summary:

      The paper documents the role of eIF3 in translational control during neural progenitor cell (NPC) differentiation. eIF3 predominantly binds to the 3' UTR termini of mRNAs during NPC differentiation, adjacent to the poly(A) tails, and is associated with efficiently translated mRNAs, indicating a role for eIF3 in promoting translation.

      Strengths:

      The manuscript is strong in addressing molecular mechanisms by using a combination of next-generation sequencing and crosslinking techniques, thus providing a comprehensive dataset that supports the authors' claims. The manuscript is methodologically sound, with clear experimental designs.

      Weaknesses:

      (1) The study could benefit from further exploration into the molecular mechanisms by which eIF3 interacts with 3' UTR termini. While the correlation between eIF3 binding and high translation levels is established, the functionality of these interactions needs validation. The authors should consider including experiments that test whether eIF3 binding sites are necessary for increased translation efficiency using reporter constructs.

      We agree with the reviewer that the molecular mechanism by which eIF3 interacts with the 3’UTR termini remains unclear, along with its biological significance, i.e. how it contributes to translation levels. We think it could be useful to try reporters in, perhaps, HEK293T cells in the future to probe the mechanism in more detail.

      (2) The authors mention that the eIF3 3' UTR termini crosslinking pattern observed in their study was not reported in previous PAR-CLIP studies performed in HEK293T cells (Lee et al., 2015) and Jurkat cells (De Silva et al., 2021). They attribute this difference to the different UV wavelengths used in Quick-irCLIP (254 nm) and PAR-CLIP (365 nm with 4-thiouridine). While the explanation is plausible, it remains a caveat that different UV crosslinking methods may capture different eIF3 modules or binding sites, depending on the chemical propensities of the amino acid-nucleotide crosslinks at each wavelength. Without addressing this caveat in more detail, the authors cannot generalize their findings, and thus, the title of the paper, which suggests a broad role for eIF3, may be misleading. Previous studies have pointed to an enrichment of eIF3 binding at the 5' UTRs, and the divergence in results between studies needs to be more explicitly acknowledged.

      We agree with the reviewer that the two methods of crosslinking will require a more detailed head-to-head comparison in the future. However, we do think the title is justified by the fact that we see crosslinking to the termini of 3’-UTRs across thousands of transcripts in each condition. Furthermore, the 3’-UTR crosslinking is enriched on mRNAs with higher ribosome protected fragment counts (RPF) in differentiated cells, Figure 3F.

      (3) While the manuscript concludes that eIF3's interaction with 3' UTR termini is independent of poly(A)-binding proteins, transient or indirect interactions should be tested using assays such as PLA (Proximity Ligation Assay), which could provide more insights.

      This is a good idea, but would require a substantial effort better suited to a future publication. We think our observations are interesting enough to the field to stimulate future experimentation that we may or may not be most capable of doing in our lab.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript by Mestre-Fos and colleagues, authors have analyzed the involvement of eIF3 binding to mRNA during differentiation of neural progenitor cells (NPC). The authors bring a lot of interesting observations leading to a novel function for eIF3 at the 3'UTR.

      During the translational burst that occurs during NPC differentiation, analysis of eIF3-associated mRNA by Quick-irCLIP reveals the unexpected binding of this initiation factor at the 3'UTR of most mRNA. Further analysis of alternative polyadenylation by APAseq highlights the close proximity of the eIF3-crosslinking position and the poly(A) tail. Furthermore, this interaction is not detected in Poly(A)-less transcripts. Using Riboseq, the authors then attempted to correlate eIF3 binding with the translation efficacy of mRNA, which would suggest a common mechanism of translational control in these cells. These observations indicate that eIF3-binding at the 3'UTR of mRNA, near the poly(A) tail, may participate to the closed-loop model of mRNA translation, bridging 5' and 3', and allowing ribosomes recycling. However, authors failed to detect interactions of eIF3, with either PABP or Paip1 or 40S subunit proteins, which is quite unexpected.

      Strength:

      The well-written manuscript presents an attractive concept regarding the mechanism of eIF3 function at the 3'UTR. Most mRNA in NPC seems to have eIF3 binding at the 3'UTR and only a few at the 5'end where it's commonly thought to bind. In a previous study from the Cate lab, eIF3 was reported to bind to a small region of the 3'UTR of the TCRA and TCRB mRNA, which was responsible for their specific translational stimulation, during T cell activation. Surprisingly in this study, the eIF3 association with mRNA occurs near polyadenylation signals in NPC, independently of cell differentiation status. This compelling evidence suggests a general mechanism of translation control by eIF3 in NPC. This observation brings back the old concept of mRNA circularization with new arguments, independent of PABP and eIF4G interaction. Finally, the discussion adequately describes the potential technical limitations of the present study compared to previous ones by the same group, due to the use of Quick-irCLIP as opposed to the PAR-CLIP/thiouridine.  

      Weaknesses:

      (1) These data were obtained from an unusual cell type, limiting the generalizability of the model.

      We agree that unraveling the mechanism employed by eIF3 at the mRNA 3’-UTR termini might be better studied in a stable cell line rather than in primary cells.

      (2) This study lacks a clear explanation for the increased translation associated with NPC differentiation, as eIF3 binding is observed in both differentiated and undifferentiated NPC. For example, I find a kind of inconsistency between changes in Riboseq density (Figure 3B) and changes in protein synthesis (Figure 1D). Thus, the title overstates a modest correlation between eIF3 binding and important changes in protein synthesis.

      We thank the reviewer for this question. Riboseq data and RNASeq data are not on absolute scales when comparing across cell conditions. They are normalized internally, so increases in, for example, RPF in Figure 3B are relative to the bulk RPF in a given condition. By contrast, the changes in protein synthesis measured in Figure 1D are closer to an absolute measure of protein synthesis.
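
      The difference between an internally normalized measure and an absolute one can be illustrated with a toy calculation. The following is a minimal sketch, assuming hypothetical read counts, placeholder gene names, and a simple counts-per-million scaling (an assumption made for illustration, not the normalization used in the manuscript). A uniform 1.5-fold rise in all ribosome footprints leaves the normalized values unchanged, so a global burst of translation would not be visible in such data.

```python
def counts_per_million(counts):
    """Scale raw counts so that they sum to one million (internal normalization)."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}

# Hypothetical raw RPF counts; gene names are placeholders.
undifferentiated = {"geneA": 200, "geneB": 300, "geneC": 500}

# Assume every gene's footprints rise 1.5-fold (a global translation burst).
differentiated = {gene: n * 1.5 for gene, n in undifferentiated.items()}

cpm_undiff = counts_per_million(undifferentiated)
cpm_diff = counts_per_million(differentiated)

for gene in undifferentiated:
    # The normalized values match, so the global increase is invisible here.
    print(gene, round(cpm_undiff[gene]), round(cpm_diff[gene]))
```

      An assay that reports total incorporation, such as the puromycin assay mentioned in these responses, is not rescaled in this way and can therefore register the global change.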

      (3) This is illustrated by the candidate selection that supports this demonstration. Looking at Figure 3B, ID2, and SNAT2 mRNA are not part of the High TE transcripts (in red). In contrast, the increase in mRNA abundance could explain a proportionally increased association with eIF3 as well as with ribosomes. The example of increased protein abundance of these best candidates is overall weak and uncertain.

      We agree that using TE as the criterion for defining increased eIF3 association would not be correct. By “highly translated” we only mean to convey the extent of protein synthesis, i.e. increases in ribosome protected fragments (RPF), rather than the translational efficiency.
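
      The distinction between RPF and TE can be made concrete with the approximate ID2 values quoted in the public review (a ~15-fold increase in mRNA and a ~5-fold increase in RPF). The sketch below assumes the usual definition of TE as RPF divided by mRNA abundance; the function name is hypothetical.

```python
import math

def te_fold_change(rpf_fold, mrna_fold):
    """With TE defined as RPF / mRNA, the TE fold change is the ratio of the two fold changes."""
    return rpf_fold / mrna_fold

# Approximate values quoted for ID2 in the review: ~5-fold RPF, ~15-fold mRNA.
fold = te_fold_change(rpf_fold=5.0, mrna_fold=15.0)
print(f"TE fold change: {fold:.2f} (log2 = {math.log2(fold):.2f})")
```

      Under these assumptions the transcript carries more ribosome footprints overall while its TE decreases, which is why the two measures should not be used interchangeably.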

      (4) Despite several attempts (chemical and UV cross-linking) to identify eIF3 partners in NPC such as PABP, PAIP1, or proteins from the 40S, the authors could not provide any evidence for such a mechanism consistent with the closed-loop model. Overall, this rather descriptive study lacks mechanistic insight (eIF3 binding partners).

      We agree that it will be important to identify the molecular mechanism used by eIF3 to engage the termini of mRNA 3’-UTRs. Nevertheless, the identification of eIF3 crosslinking to that location in mRNAs is new, and we think will stimulate new experiments in the field.

      (5) Finally, the authors suspect a potential impact of technical improvement provided by Quick-irCLIP, that could have been addressed rather than discussed.

      We agree a side-by-side comparison of eIF3 crosslinks captured by PAR-CLIP versus Quick-irCLIP will be an important experiment to do. However, NPCs or other primary cells may not be the best system for the comparison. We think using an established cell line might be more informative, to control for effects such as 4-thiouridine toxicity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The Western blot signals for SLC38A2 and ID2 are close to the membrane background and little convincing. Size markers are missing.

      We agree these antibodies are not great. They are the best we could find, unfortunately. We have included originals of all western blots and gels as supplementary information. It’s important to note that the Riboseq data for ID2 and SLC38A2 are consistent with the western blots. See Figure 3C and Figure 3–figure supplement 3B.

      (2) Figure 1 - Figure Supplement 1 appears to present data from a single experiment. This is far less than ideal considering the minor differences measured.

      Thanks for the comment. This is a representative experiment showing the early time course. We have added a second experiment with two different treatments that show the same pattern in the puromycin assay, in Figure 1–figure supplement 1.

      (3) Figure 3F: One wonders what this would look like if TE was plotted instead of RPF. Figure 3 - Figure Supplement 4 seems to show something along those lines. However, the data are not mentioned in the main results section and are quite unclear. Why are data separated into TE high and low? Doesn't TE high in differentiated cells equal TE low in undifferentiated cells?

      This is an interesting question. Note that in Figure 3B, n=6300 genes show no change in TE upon differentiation, compared to a total of n=2127 that show a change in TE, with most of those changes not very large. We have now replotted Figure 3F comparing irCLIP read counts in 3’-UTRs to RPF read counts, which shows a significant positive correlation, regardless of whether we look at undifferentiated or differentiated NPCs (See Figure 3F and a new Figure 3–figure supplement 4A). We also compare irCLIP reads in 3’-UTRs to TE values, which show no correlation (See Figure 3G and Figure 3–figure supplement 4B).
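
      A per-gene comparison of this kind can be sketched as a rank correlation. The response does not state which correlation statistic was used, so the Spearman coefficient below is an assumption, and the read counts are hypothetical values chosen only to make the example run.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical per-gene read counts (one entry per gene, same gene order).
irclip_3utr_reads = [12, 40, 7, 95, 60, 23]
rpf_reads = [150, 420, 90, 800, 900, 200]

rho = spearman(irclip_3utr_reads, rpf_reads)
print(f"Spearman rho = {rho:.3f}")
```

      The same function could be applied to irCLIP reads versus TE values, where the response reports no correlation.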

      Figure 3-figure supplement 4 was actually a response to a previous round of review (at PLOS Biology) to a rather technical question from a reviewer. We think this figure and associated text should be removed. Instead, we now include supplementary tables with the processed RPF and TE values, for reference (Supplemental files 4-6). We omitted these in the original submission when they should have been included. We also abandoned comparing undifferentiated and differentiated NPCs, and instead look directly at irCLIP reads vs. RPFs or TE, regardless of NPC state, as noted above (Figure 3F, G, and Figure 3–figure supplement 4).

      (4) Figure 3C: The data should be plotted on the same y-axis scale. This would make a visual assessment of the differences in mRNA and RPF levels more intuitive.

      Thanks for this suggestion. We have rescaled the plots as requested.

      Reviewer #2 (Recommendations for the authors):

      (1) The quality of the Western blots in several figures is quite poor. Notably, Figure 1C seems to be a composite gel, as each blot appears to come from a different gel. Additionally, in Supplementary Figure 1A, there is only a single data point, yet the authors indicate that this image is representative of multiple assays. The lack of error bars in this figure raises a question vis-a-vis the reproducibility of the experiments.

      Thanks for the comments. We now include all the original gels as supplementary information. As noted above, the antibodies for ID2 and SLC38A2 are not great, we agree. And as we noted above, the Riboseq data for ID2 and SLC38A2 are consistent with the western blots.

      (2) For the top 500 targets of undifferentiated and differentiated NPCs in the Quick-irCLIP assay, the manuscript does not clarify how many targets are common and how many are unique to each condition. This information is important for understanding the extent of overlap and differentiation-specific interactions of eIF3 with mRNAs. Providing this data would strengthen the interpretation of the results.

      There are 449 of the top 500 hits in common between undifferentiated and differentiated NPCs. We have now added this information to the text, to add clarity. 

      (3) The manuscript does not provide detailed percentages or numbers regarding the overlap between iCLIP and APA-Seq peaks. Clarifying this overlap, particularly in terms of how many of the APA sites are also targets of eIF3, would bolster the understanding of how these two datasets converge to support the authors' conclusions.

      This is a difficult calculation to make, because APA-Seq reads are generally much longer than the Quick-irCLIP reads. This is why we focused instead on quantifying the percentage of Quick-irCLIP peaks (which are narrower) that overlap with predicted polyadenylation sequences, in Figure 2-figure supplement 1.

      Reviewer #3 (Recommendations for the authors):

      (1) Perform Quick-irCLIP in HEK293 cells to infer technical limitations and/or to generalize the model. The authors will then compare again eIF3 binding site in Jurkat, HEK293, and NPC.

      This is an experiment we plan to do for a future publication, given that we would want to repeat both Quick-irCLIP and PAR-CLIP at the same time.

      (2) Select mRNA candidates with high or low TE changes and analyze eIF3 binding and RPF density and protein abundance along NPC differentiation to support the role of eIF3 binding in stimulating translation.

      We agree looking at time courses in more depth would be interesting. However, this would require substantial experimentation, which is better suited to a future study. Furthermore, now that we have moved away from comparing undifferentiated NPCs and differentiated NPCs when examining TE and RPF values (Figure 3 and Figure 3–figure supplement 4), we think the results now support a more general mechanism of translation reflected in the irCLIP 3’-UTR vs. RPF correlation, independent of NPC state.

      (3) Analyze the interaction of eIF3 with eIF4G and other known partners. This will really provide an improvement to the manuscript. The lack of interaction between eIF3 and the 40S is quite surprising.

      We agree more work needs to be done on the mechanistic side. These are experiments we think would be best to carry out in a stable cell line in the future, rather than primary cells.

      (4) Perform Oligo-dT pulldown (or cap column if possible) and analyze the relative association of PABP, eIF3, and eIF4F on mRNA in NPC versus HEK293. This will clarify whether this mechanism of mRNA translation is specific to NPC or not.

      Thanks for this suggestion. We are uncertain how it would be possible to deconvolute all the possible ways to interpret results from such an experiment. We agree thinking about ways to study the mechanism will keep us occupied for a while.

      (5) Citations in the text indicate the first author, whereas the references are numbered! 

      Our apologies for this oversight. This was a carryover from previous formatting, and has been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) In my opinion, the major weakness is the selection of IVs, the same IVs should be used for each exposure, especially when the outcomes (IA, SAH, and uIA) are closely related. The removal of IVs was inconsistent, for example, why was LPA rs10455872 removed for SAH but not for uIA? (significantly more IVs were used for uIA). The authors should provide more details for the justification of the removal of IVs other than only indicating "confounder" in supplementary tables. The authors should also perform additional analyses including all IVs and IVs from other PUFA GWAS.

      We apologize for our negligence. We reconducted the two-sample MR analysis after removing rs10455872 from the uIA analysis, which yielded unaltered ORs and 95% confidence intervals. The P-value again was not statistically significant. These results demonstrate the robustness of our MR analyses and indicate that this SNP does not influence the overall results. (See Figure 4)

      For SNP selection, we adhered rigorously to the established Mendelian randomization process for screening instrumental variables. "Confounder" means a known factor that is explicitly associated with the outcome variable. After removing such confounding SNPs, we repeated the heterogeneity and pleiotropy analyses several times using RadialMR, MR-PRESSO, IVW-radial and Egger-radial, removing the corresponding outlier SNPs at each iteration until all instances of pleiotropy and heterogeneity had been eliminated. Therefore, the final set of SNPs is not identical for each group.

      (2) In addition, it seems that the SNPs in the FADS locus were driving the MR association, while FADS is a very pleiotropic locus associated with many lipid traits, removing FADS could attenuate the MR effect. The authors should perform a sensitivity analysis to remove this locus.

      Thanks for the reviewer’s suggestion. In our revised manuscript, we reconducted the MR analysis of the positive results after removing FADS2 and the SNPs within 500 kb of the FADS2 locus. This analysis demonstrated that there was no significant causal pathogenic association between PUFA and IA or aSAH. This result confirmed that the SNP rs174564 was a significant factor driving the causal association between PUFAs and CA. (See page 6, lines 155-157 and Figure 8)
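
      The exclusion step described above amounts to a positional filter on the instrument list. The following is a minimal sketch, assuming each SNP record carries a chromosome and a base-pair position. The locus coordinates, rsIDs, and field names are hypothetical placeholders rather than real genome positions.

```python
WINDOW_BP = 500_000  # 500 kb on each side of the locus, as stated in the response

def outside_window(snp, locus, window=WINDOW_BP):
    """Return True if the SNP lies outside locus +/- window, or on another chromosome."""
    if snp["chrom"] != locus["chrom"]:
        return True
    return snp["pos"] < locus["start"] - window or snp["pos"] > locus["end"] + window

# Hypothetical coordinates chosen for illustration only.
locus = {"chrom": "11", "start": 2_000_000, "end": 2_040_000}
instruments = [
    {"rsid": "rs_demo_a", "chrom": "11", "pos": 2_010_000},  # inside the gene
    {"rsid": "rs_demo_b", "chrom": "11", "pos": 2_400_000},  # within 500 kb of the gene
    {"rsid": "rs_demo_c", "chrom": "11", "pos": 3_000_000},  # beyond the window
    {"rsid": "rs_demo_d", "chrom": "1", "pos": 2_010_000},   # different chromosome
]

# Keep only the instruments that fall outside the exclusion window.
kept = [snp["rsid"] for snp in instruments if outside_window(snp, locus)]
print(kept)
```

      The MR analysis would then be rerun on the retained instruments only.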

      (3) Instead of removing multiple "confounder" IVs which I think may bias the MR results due to very closely related lipid traits, the authors should perform multivariable MR to identify independent effects of PUFAs to IA, conditioning on other PUFAs and/or other lipids.

      Thanks for the reviewer’s suggestion. In our revised manuscript, we employed MVMR, adjusting for HDL cholesterol, LDL cholesterol, total cholesterol and triglycerides, to remove bias from closely related lipid traits. The MVMR analysis reinforces the robustness of our conclusions. (See page 6, lines 151-153 and Figures 5-7)

      (4) Colocalization was not well described, the authors should include the colocalization results for each locus in a supplementary table. They also mentioned "a large PP for H4 (PP.H4 above 0.75) strongly supports shared causal variants affecting both gene expression and phenotype". The authors should make sure that the colocalization was performed using the expression data of each gene or using the GWAS summary of each PUFA locus.

      We apologize for our negligence. We have added the detailed COLOC results for each locus in a supplementary table. (See supplementary table 6)

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) I suggest the authors consult Borges et al., 2022 (doi: 10.1186/s12916-022-02399-w) for PUFA IV selection, and perform sensitivity analysis based on Borges et al., 2022 IVs and another PUFA GWAS (such as J Kettunen et al., 2016, doi: 10.1038/ncomms11122).

      Thanks for the reviewer’s suggestion. In order to provide further evidence of the robustness of our results, we conducted MVMR and a sensitivity analysis after excluding SNPs within 500 kb of the FADS2 locus, as recommended by Borges et al. (2022). (See page 6, lines 151-157 and Figures 5-8)

      Regarding the article by J. Kettunen et al. (2016), we found that the dataset reported in that article was insufficient in sample size and lacked the statistical power required for validation purposes.

      (2) The authors justified that colocalization is to determine if "PUFAs are mediators in the hereditary causative route of cerebral aneurysm", which I don't think is the case.

      Colocalization is to determine whether an MR estimate is not confounded by LD.

      We apologize for our incorrect description. We have carefully modified it in our revised manuscript, as follows: “There is consistent evidence that PUFAs have a beneficial causal effect on cerebral aneurysm. In order to determine an MR estimate is not confounded by LD, we used COLOC to identify shared causal SNP between PUFAs and cerebral aneurysms”. (See pages 7-8, lines 215-217)

      (3) Supplementary tables 2-4 were a bit confusing to me, I suggest the authors provide one supplementary table for each exposure.

      Thanks for the reviewer’s suggestion. Supplementary tables 2_1-2_5 show the exposure data for the five PUFAs associated with IA, supplementary tables 3_1-3_5 show the exposure data for the five PUFAs associated with aSAH, and supplementary tables 4_1-4_5 show the exposure data for the five PUFAs associated with uIA. Each exposure is represented by a distinct table.

      (4) Figure 1 legend: I can't find multivariable MR in the figure/method.

      We apologize for our negligence. In our revised manuscript, we have added the MVMR methodology. We have also modified Figure 1 and its legend. (See Figure 1, Figure 1 legend and page 6, lines 151-153)

      (5) LOO analysis was mentioned in methods and results but I could not find the results for LOO.

      We apologize for our negligence. In our revised manuscript, we have described the results of the LOO analysis, as follows: “The leave-one-out plot demonstrates that there is a potentially influential SNP (rs174564) driving the causal link between PUFA and cerebral aneurysm.” (See page 7, lines 209-210)
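
      The logic of a leave-one-out analysis can be sketched with a basic inverse-variance weighted (IVW) estimator built from per-SNP Wald ratios with first-order weights. This is a simplified illustration under stated assumptions, not the software used in the manuscript. The summary statistics and rsIDs below are hypothetical placeholders.

```python
def ivw_estimate(snps):
    """IVW estimate from per-SNP Wald ratios.

    Each value is (beta_exposure, beta_outcome, se_outcome). The Wald ratio is
    beta_outcome / beta_exposure and its first-order weight is
    beta_exposure**2 / se_outcome**2, so the weighted mean simplifies to the
    ratio of sums computed below.
    """
    numerator = sum(bx * by / se ** 2 for bx, by, se in snps.values())
    denominator = sum(bx ** 2 / se ** 2 for bx, by, se in snps.values())
    return numerator / denominator

def leave_one_out(snps):
    """Recompute the IVW estimate with each SNP removed in turn."""
    return {
        left_out: ivw_estimate({k: v for k, v in snps.items() if k != left_out})
        for left_out in snps
    }

# Hypothetical summary statistics: (beta_exposure, beta_outcome, se_outcome).
snps = {
    "rs_demo_1": (0.10, 0.010, 0.02),
    "rs_demo_2": (0.12, 0.012, 0.02),
    "rs_demo_3": (0.08, 0.008, 0.02),
    "rs_demo_driver": (0.50, -0.200, 0.02),  # one strong, discordant instrument
}

full = ivw_estimate(snps)
loo = leave_one_out(snps)

print(f"all SNPs: {full:.3f}")
for rsid, estimate in loo.items():
    print(f"without {rsid}: {estimate:.3f}")
```

      In this toy data set the estimate changes sign only when the strong instrument is dropped, which is the pattern a leave-one-out plot is meant to reveal.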

      (6) Finally, the authors should proofread their manuscript as many sentences are difficult to read, such as:

      Line 183: "...MR methods revealed consistency", "However, there was no any causal relationship..."

      Line 200: "For achieve that..."

      We apologize for our incorrect description. We have modified these descriptions in our revised manuscript, as follows: “The results demonstrated consistency in the outcomes and directionality of the various MR methods employed” and “In order to determine an MR estimate is not confounded by LD, we used COLOC to identify shared causal SNP between PUFAs and cerebral aneurysms”. (See page 7, lines 187-188 and lines 215-217)

      Reviewer #2 (Recommendations For The Authors):

      (1) Are there any previous epidemiological studies on the association between PUFA and cerebral aneurysm? It will be helpful to introduce this background.

      Thanks for the reviewer’s suggestion. Epidemiological findings on PUFAs and aneurysms at other sites, such as the abdominal aorta, are described in the Introduction section. Although there is a paucity of large-scale multicenter clinical epidemiological studies examining the relationship between PUFAs and cerebral aneurysms, we are endeavoring to infer a prior association between PUFAs and cerebral aneurysms with the aid of Mendelian randomization analysis.

      (2) The authors performed a leave-one-out analysis but did not explain much about the results. The leave-one-out analysis seems to provide some evidence that some SNP is driving the results, like rs174564 in Supplementary Figure 5-1.

      We apologize for our negligence. In our revised manuscript, we have described the results of the leave-one-out analysis, as follows: “The leave-one-out plot demonstrates that there is a potentially influential SNP (rs174564) driving the causal link between PUFA and cerebral aneurysm”. (See page 7, lines 209-214)

      (3) In the discussion (line 211), the authors mentioned omega-6 fatty acids increased the risk of IA and aSAH, omega-3 fatty acids decreased the risk for IA and aSAH, but omega-6 by omega-3 decreased the risk of IA and aSAH. This seems to be different from the figures.

      We apologize for our incorrect description. We have modified this description in our revised manuscript, as follows: “We demonstrated that the omega-3 fatty acids, DHA and, omega-3-pct causally decreased the risk for IA and aSAH. And omega-6 by omega-3 causally increased the risk of IA and aSAH”. (See page 8, lines 228-230)

      Minor:

      (4) Some grammar errors need to be checked, such as:

      In line 200, "For achieve that, we tested for shared causative SNPs between PUFAs and cerebral aneurysm using COLOC".

      In line 123, "Fourth, to eliminate unclear, palindromic and associated with known confounding factors (body mass index (McDowell et al., 2018), blood pressure (Sun et al., 2022), type 2 diabetes (Tian et al., 2022), high-density lipoprotein (Huang et al., 2018)) SNPs."

      We apologize for our incorrect description. We have modified these descriptions in our revised manuscript, as follows: “Fourth, remove SNPs that are obscure, palindromic, and linked to recognized confounding variables (body mass index (McDowell et al., 2018), blood pressure (Sun et al., 2022), type 2 diabetes (Tian et al., 2022), high-density lipoprotein (Huang et al., 2018))” and “In order to determine an MR estimate is not confounded by LD, we used COLOC to identify shared causal SNP between PUFAs and cerebral aneurysms”. (See page 5, lines 124-127 and page 7, lines 215-217)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The findings of Ziolkowska and colleagues show that a specific projection from the nucleus reuniens of the thalamus (RE) to dorsal hippocampal CA1 neurons plays an important role in fear extinction learning in male and female mice. In and of itself, this is not a particularly new finding, although the authors' identification of structural alterations from within dorsal CA1 stratum lacunosum moleculare (SLM) as a candidate mechanism for the learning-related plasticity is potentially novel and exciting. The authors use a range of anatomical and functional approaches to demonstrate structural synaptic changes in dorsal CA1 that parallel the necessary role of RE inputs in modulating extinction learning. Yet, the significance of these findings is substantially limited by several technical shortcomings in the experimental design, and the authors' central interpretation. Otherwise, there remain several strengths in the design and interpretation that offset some of these concerns.

      Given that much is already known about the role of RE and hippocampus in modulating fear learning and extinction, it remains unclear whether addressing these concerns would substantially increase the impact of this study beyond the specific area of speciality. Below, several major weaknesses will be highlighted, followed by several miscellaneous comments.

      Methodological:

      (1) One major methodological weakness in the experimental design involves the widespread misapplication of Ns used for the statistical analyses. Much of the anatomical analyses of structural synaptic changes in the RE-CA1 pathway use N = number of axons (Figs. 1, 2), N = number of dendrites (Figs. 3, 4), and N = number of sections (Fig. 7; note that there are 7 figures in total). In every instance, N = animal number should be used. It is unclear which of these results would remain significant if N = animal number were used in each or how many more animals would be required. This is problematic since these data comprise the main evidence for the authors' central conclusion that specific structural synaptic changes are associated with fear extinction learning.

      We agree with the reviewer that N = animal number is the preferred way to present data in most of our experiments. However, in some experimental groups we observed a very low number of entries. For example, in the 5US group we found RE+/+ spines in only 3 out of 6 analyzed animals. We believe that this observation is not due to technical problems, as the mCherry virus transduction required to find RE+/+ spines is similar in all experimental groups and we analyzed similar volumes of tissue. While this result still allows the calculation of the density of RE+/+ spines per animal, it generates no entries for spine area and PSD95 mean gray value in half of the animals if N = animal number. Hence, we decided to use N = animals to calculate spine and bouton densities, and N = dendritic spines/boutons to calculate other spine/bouton parameters.
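      The trade-off described above can be illustrated with a minimal sketch. The animal identifiers and spine areas below are hypothetical values chosen only for illustration, not data from the study, and `per_animal_means` is an illustrative helper rather than part of the authors' analysis code. It shows how collapsing spine-level measurements to one value per animal leaves only as many entries as there are animals with at least one detected spine.

```python
from statistics import mean


def per_animal_means(records):
    """Collapse spine-level measurements to one mean per animal.

    records: iterable of (animal_id, value) pairs, one pair per spine.
    Animals that contributed no spines do not appear in the result.
    """
    by_animal = {}
    for animal_id, value in records:
        by_animal.setdefault(animal_id, []).append(value)
    return {animal_id: mean(values) for animal_id, values in by_animal.items()}


# Hypothetical spine areas (arbitrary units) from a group of six animals,
# three of which contributed no RE+/+ spines at all.
spine_areas = [
    ("m1", 0.20), ("m1", 0.30),
    ("m2", 0.25),
    ("m3", 0.40), ("m3", 0.20), ("m3", 0.30),
]
animal_means = per_animal_means(spine_areas)
print(len(spine_areas), "spine-level entries ->", len(animal_means), "animal-level entries")
```

      With N = spines the group has six entries; with N = animals it has three, which is the reduction in sample size that motivated the mixed approach.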

      (2) There is a lack of specific information regarding what constitutes learning with respect to behavioral freezing. It is never clearly stated what specific intervals are used over which freezing is measured during acquisition, extinction, and in extinction retrieval tests. Additionally, assessment of freezing during retrieval at 5- and 30-min time points doesn't lay to rest the possibility that there were differences in the decay rate over the 30-min period (also see below).

      We added a detailed description of how learning was assessed.

      ln 125-134: “For assessment of learning we used percent of time spent by animals freezing (% freezing). Freezing behavior was defined as complete lack of movement, except respiration. To assess within-session learning (working memory) we compared pre- and post-US freezing frequency (the first 148 sec vs last 30 sec) during the CFC session (day 1). To assess formation of long-term contextual fear memory, we compared pre-US freezing (day 1) and the first 5 minutes of the Extinction session (day 2). To assess within session contextual fear extinction we ran 2-way ANOVA to assess the effect of time and manipulation on freezing frequency. Freezing data were analyzed in 5-minute bins. To assess formation of long-term contextual fear extinction memory we compared the first 5 minutes of the Extinction session (day 2) and Test session (day 3).”

      As suggested by the reviewer, we also added data for all six 5-minute bins of the Extinction sessions.
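      The binning described above can be sketched as follows. This is a minimal illustration that assumes freezing has already been scored as one boolean per sample; the function name, the sampling rate, and the example session are assumptions made for the sketch and do not reproduce the scoring software's output format.

```python
def percent_freezing_by_bin(frozen, samples_per_second, bin_seconds=300):
    """Return the % of samples scored as freezing in each consecutive time bin.

    frozen: sequence of booleans, one per sample (True = freezing).
    A trailing partial bin is dropped.
    """
    bin_size = int(samples_per_second * bin_seconds)
    n_bins = len(frozen) // bin_size
    result = []
    for i in range(n_bins):
        chunk = frozen[i * bin_size:(i + 1) * bin_size]
        result.append(100.0 * sum(chunk) / bin_size)
    return result


# Hypothetical 30-minute session sampled once per second: freezing throughout
# the first 5 minutes, half of the time in the second bin, none afterwards.
session = [True] * 300 + [True, False] * 150 + [False] * 1200
bins = percent_freezing_by_bin(session, samples_per_second=1)
print(bins)  # [100.0, 50.0, 0.0, 0.0, 0.0, 0.0]
```

      The first element of such a list corresponds to the "first 5 minutes" used for the between-session comparisons, and the full list of six bins corresponds to the within-session analysis.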

      (3) A minor-to-moderate methodological weakness concerns the authors' decision to utilize saline injected groups as controls for the chemogenetics experiments (Figs. 5, 6). The correct design is to have a CNO-only group with the same viral procedure sans hM4Di. This concern is partly mitigated by the inclusion of a CNO vs. saline injection control experiment (Fig. 6).

      Figure 5 does not describe a chemogenetic experiment.

      We added new groups with control virus (CNO vs saline) to Figure 6 (now Fig. 6D and H).

      The chemogenetic experiment shown in Figure 7 has all 4 experimental groups (Control vs hM4Di and saline vs CNO).

      (4) In the electron microscopic analyses of dendritic spines (Fig. 5), comparison of only the fear acquisition versus extinction training, and the lack of inclusion of a naïve control group, makes it difficult to understand how these structural synaptic changes are occurring relative to baseline. It is noteworthy that the authors utilize the tripartite design in other anatomical analyses (Fig. 2-4).

      We added data for the Naive mice to Figure 5.

      (5) Interpretation:

      The main interpretive weakness in the study is the authors' claim that their data shows a role for the RE-CA1 pathway in memory consolidation (i.e., see Abstract). This claim is based on the premise that, although RE-CA1 pathway inactivation with CNO treatment 30 min prior to contextual fear extinction did not affect freezing at 5- and 30-min time points relative to saline controls, these rats showed greater freezing when tested on extinction retrieval 24 h thereafter. First, the data do not rule out possible differences in the decay rate of freezing during extinction training due to CNO administration. Next, the fact that CNO is given prior to training still leaves open the possibility that acquisition was affected, even if there were not any frank differences in freezing. Support for this latter possibility derives from the fact that mice tested for extinction retrieval as early as 5 min after extinction training (Fig. 6C) showed the same impairments as mice tested 24 h later (Figs. 6A). Further, all the structural synaptic changes argued to underlie consolidation were based on analysis at a time point immediately following extinction training, which is too early to allow for any long-term changes that would underlie memory consolidation, but instead would confer changes associated with the extinction training event.

      We do agree with the reviewer that our data do not allow us to conclude whether RE-CA1 pathway is involved in acquisition or consolidation of CFE memory. Therefore, we avoid those terms in the manuscript. We just conclude that RE→CA1 participates in the CFE.

      Reviewer #2 (Public review):

      Summary:

      Ziółkowska et al. characterize the synaptic mechanisms at the basis of the RE-dCA1 contribution to the consolidation of fear memory extinction. In particular, they describe a layer-specific modulation of RE-dCA1 excitatory synapses associated with contextual fear extinction, which is impaired by transient chemogenetic inhibition of this pathway. These results indicate that RE activity-mediated modulation of synaptic morphology contributes to the consolidation of contextual fear extinction.

      Strengths:

      The manuscript is well conceived, the statistical analysis is solid and methodology appropriate. The strength of this work is that it nicely builds up on existing literature and provides new molecular insight on a thalamo-hippocampal circuit previously known for its role in fear extinction. In addition, the quantification of pre- and post-synapses is particularly thorough.

      Weaknesses:

      The findings in this paper are well supported by the data, but a more detailed description of the methods is needed.

      (1) In the paragraph Analysis of dCA1 synapses after contextual fear extinction (CFE), more experimental and methodological data should be given in the text:

      - how was PSD95 used for the analysis, what was the difference between RE. Even if Thy1-GFP mice were used in Fig.2, it appears they were not used for bouton size analysis. To improve clarity, I suggest moving panel 2C to Figure 3. It is not clear whether all RE axons were indiscriminately analysed in Fig. 2 or if only the ones displaying colocalization with both PSD95 and GFP were analysed. If GFP was not taken into account here, analysed boutons could reflect synapses onto inhibitory neurons and this potential scenario should be discussed.

      PSD-95 immunostaining in close apposition to boutons was used to identify RE boutons innervating CA1 (Fig 1 and 2). In these cases the PSD-95 signal was not quantified. PSD-95 in close apposition to dendritic spines was used as a proxy of PSDs in CA1 (Figure 3, 4 and 7). In these cases we assessed the integrated mean gray value of the PSD-95 signal per dendritic spine (Figure 3, 4) or per ROI (Figure 7). This is explained in detail in the section Confocal microscopy and image quantification (ln 149-172).

      GFP signal was not taken into account during boutons analysis. This is explained in the materials and methods section Confocal microscopy and image quantification (ln 149-172).

      We indicate that PSD-95 is a marker of excitatory synapses located both on excitatory and inhibitory neurons.

      Ln 258: RE boutons were identified in SO and SLM as axonal thickenings in close apposition to PSD-95-positive puncta (a synaptic scaffold used as a marker of excitatory synapses located both on excitatory and inhibitory neurons (Kornau et al., 1995; El-Husseini et al., 2000; Chen et al., 2011; Dharmasri et al., 2024)).

      We also cite literature demonstrating that RE projects to the hippocampal formation and forms asymmetric synapses with dendritic spines and dendrites, suggesting innervation of excitatory synapses on both excitatory and aspiny inhibitory neurons (ln 673).

      As advised by the reviewer the Figure 2C panel was moved to Figure 3 (now it is Fig 3A).

      (2) in the methods: The volume of intra-hippocampal CNO injections should be indicated. The concentration of 3 uM seems pretty low in comparison with previous studies. CNO source is missing.

      This section has been rewritten to be clearer. The concentration of CNO was chosen based on a previous study (Stachniak et al., 2014).

      ln 103: “Cannula placement. Mice were anesthetized by inhalation of 3–5% isoflurane (IsoFlo; Abbott Animal Health) in oxygen and positioned in a stereotaxic frame (51503, Stoelting, Wood Dale, IL, USA). Two holes were drilled in the skull, and a double guide cannulae (2 mm apart and 2 mm long; 26GA, Plastics One) was lowered into the holes such that the cannula tip was located over dorsal CA1 area (2 mm posterior to bregma, ±1 mm lateral, and −1.3 mm vertical). Cannulae were kept patent by using 33-gauge internal dummy cannulae (Plastics One). The animals were used in contextual fear conditioning 21 days after the cannulation. Animals received bilateral CNO (3 μM, 0.2 μl per side for 1 min; Tocris Bioscience, Cat. No. 4936) (Stachniak et al., 2014) or saline injections (0.2 μl per side) 30 minutes before Extinction session via intrahippocampal injection cannulae (33-gauge). After the infusion, the cannula was left in place for 30 seconds. The cannula placement was verified by histology, and only data from animals with correct cannula implants were included in statistical analyses.”

      (3) More details of what software/algorithm was used to score freezing should be included.

      Freezing was automatically scored with VideoFreeze™ Software (Med Associates Inc.).

      (4) Antibody dilutions for IHC should be indicated. Secondary antibody incubation time should be indicated.

      The missing information is added.

      ln 144: “Next, sections were incubated in 4°C overnight with primary antibodies directed against PSD-95 (1:500, Millipore, MAB 1598), washed three times in 0.3% Triton X-100 in PBS and incubated in room temperature for 90 minutes with a secondary antibody bound with Alexa Fluor 647 (1:500, Invitrogen, A31571).”

      (5) No statement about code and data availability is present.

      The statements are added.

      ln 785: “Raw data and the code used for analysis of confocal data is available at OSF (https://osf.io/bnkpx/).”

      Reviewer #3 (Public review):

      Summary:

      This paper examined the role of nucleus reuniens (RE) projections to dorsal CA1 neurons in context fear extinction learning. First, they show that RE neurons send excitatory projections to the stratum oriens (SO) and the stratum lacunosum moleculare (SLM), but not the stratum radiatum (SR). After mice undergo context fear conditioning, the synaptic connections between RE and dCA1 neurons in the SLM (but not the SO) are weakened (reduced bouton and spine density). This weakening is reversed by extinction learning, which leads to enhanced synaptic connectivity between RE inputs and dendrites in the SLM. Control experiments demonstrate that the observed changes are due to extinction and not caused by simple exposure to the context. Extinction learning also induced increases in the size (volume and surface area) of the post-synaptic density (PSD) in SLM. To establish the functional role of RE inputs to dCA1, the researchers used an inhibitory DREADD to silence this pathway during extinction learning. They observe that extinction memory (measured 2 hours or 24 hours later) is impaired by this inhibition. Control experiments show that the extinction memory deficit is not simply due to increased freezing caused by inactivation of the pathway or injections of CNO. Inhibiting the RE projection during extinction learning also reduced PSD-95 protein levels in the spines of dCA1 neurons.

      Strengths:

      Based on their results, the authors conclude that, "the RE→SLM pathway participates in the updating of fearful context value by actively regulating CFE-induced molecular and structural synaptic plasticity in the SLM.". I believe the data are generally consistent with this hypothesis, although there is an important control condition missing from the behavioral experiments.

      Weaknesses:

      (1) A defining feature of extinction learning is that it is context specific (Bouton, 2004). It is expressed where it was learned, but not in other environments. Similarly, it has been shown that internal contexts (or states) also modulate the expression of extinction (Bouton, 1990). For example, if a drug is administered during extinction learning, it can induce a specific internal state. If this state is not present during subsequent testing, the expression of extinction is impaired just as it is when the physical context is altered (Bouton, 2004). It is possible that something similar is happening in Figure 6. In these experiments, CNO is administered to inactivate the RE-dCA1 projection during extinction learning. The authors observe that this manipulation impairs the expression of extinction the next day (or 2-hours later). However, the drug is not given again during the test. Therefore, it is possible that CNO (and/or inactivation of the RE-dCA1 pathway) induces a state change during extinction that is not present during subsequent testing. Based on the literature cited above, this would be expected to disrupt fear extinction as the authors observed. To determine if this alternative explanation is correct, the researchers need to add groups that receive CNO during extinction training and subsequent extinction testing. If the deficits in extinction expression reported in Figure 6 result from a state change, then these groups should not exhibit an impairment. In contrast, if the authors' account is correct, then the expression of extinction should still be disrupted in mice that receive CNO during training and testing.

      We do agree with the reviewer that such an experiment would be interesting. However, it could be also confusing as we could not distinguish whether the possible behavioral effects are related to the state-dependent aspects of CFE or impaired recall of CFE. Importantly, previous studies showed that RE is crucial for extinction recall (Totty et al., 2023). We also show that CFE memory is impaired not only when the animals recall CFE without CNO (day 3) but also with CNO (day 4) (Figure 6C). Moreover, we do not see the effects of CNO on CFE in the control groups (Figure 6D and H). So we believe that it is unlikely that CNO results in state-dependent CFE.

      (2) In their analysis of dCA1 synapses after contextual fear extinction (CFE) (Figure 4), the authors should have compared Ctx and Ctx-Ctx animals against naïve animals (as they did in Figure 3) when comparing 5US and Ext with naïve animals. Otherwise, the authors cannot make the following conclusion; "since changes of SLM synapses were not observed in the animals exposed to the familiar context that was not associated with the USs, our data support the role of the described structural plasticity at the RE→SLM synapses in CFE, rather than in processing contextual information in general.".

      We assume that the key experimental groups for drawing conclusions about synaptic plasticity related to a particular behavior are the groups that differ by just one factor/experience. For CFE, these are mice sacrificed immediately before and after the CFE session (Figures 2 & 3); to draw conclusions about the effects of re-exposure to the neutral context, mice sacrificed before and after the second exposure to the neutral context are needed (Figure 4). The naive group, as it differs by at least two manipulations from the Ext and Ctx-Ctx groups, is interesting but not crucial in either case. This group would be necessary if we focused on the memories of FC or novel context. However, these topics are not the main focus of the current manuscript. Still, the naive group is shown in Figures 2 & 3 to check whether CFE brings spine parameters to the levels observed in mice with low freezing.

      We have re-written the cited paragraph to be more precise in our conclusions.

      "Overall, our data demonstrate that synapses in all dCA1 strata undergo structural or molecular changes relevant to CFC and/or CFE. However, only in SLM CFE-induced synaptic changes are likely to be directly regulated by RE inputs as they appear on RE+ dendrites and spines. Since such changes of SLM synapses were not observed in the animals re-exposed to the neutral context, our data support the role of the described structural plasticity at the RE→SLM synapses in CFE, rather than in processing contextual information in general."

      (3) In the materials and methods section, the description of cannula placements is confusing and needs to be rewritten.

      This section has been rewritten.

      ln 103: “Cannula placement. Mice were anesthetized by inhalation of 3–5% isoflurane (IsoFlo; Abbott Animal Health) in oxygen and positioned in a stereotaxic frame (51503, Stoelting, Wood Dale, IL, USA). Two holes were drilled in the skull, and a double guide cannulae (2 mm apart and 2 mm long; 26GA, Plastics One) was lowered into the holes such that the cannula tip was located over dorsal CA1 area (2 mm posterior to bregma, ±1 mm lateral, and −1.3 mm vertical). Cannulae were kept patent by using 33-gauge internal dummy cannulae (Plastics One). The animals were used in contextual fear conditioning 21 days after the cannulation. Animals received bilateral CNO (3 μM, 0.2 μl per side for 1 min; Tocris Bioscience, Cat. No. 4936) (Stachniak et al., 2014) or saline injections (0.2 μl per side) 30 minutes before Extinction session via intrahippocampal injection cannulae (33-gauge). After the infusion, the cannula was left in place for 30 seconds. The cannula placement was verified by histology, and only data from animals with correct cannula implants were included in statistical analyses.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Other/ Minor:

      In the beginning of the second paragraph on p. 21 of the Results section, it states that "RE-dCA1 has no effect on working memory," although it was not clear what data the authors were referring to support this conclusion.

      We refer there to the changes of freezing behavior within the CFE session. This is explained now.

      Reviewer #2 (Recommendations for the authors):

      No statement about code and data availability is present.

      The statements are added.

      ln 785: “Raw data and the code used for analysis of confocal data is available at OSF (https://osf.io/bnkpx/).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      The authors are trying to develop a microscopy system that generates data output exceeding the previous systems based on huge objectives. 

      Strengths: 

      They have accomplished building such a system, with a field of view of 1.5x1.0 cm2 and a resolution of up to 1.2 um. They have also demonstrated their system performance on samples such as organoids, brain sections, and embryos. 

      Weaknesses: 

      To be used as a volumetric imaging technique, the authors only showcase the implementation of multi-focal confocal sectioning. On the other hand, most of the real biological samples were acquired under wide-field illumination, and processed with so-called computational sectioning. Despite the claim that it improves the contrast, sometimes I felt that the images were oversharpened and the quantitative nature of these fluorescence images may be perturbed. 

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript introduced a volumetric trans-scale imaging system with an ultra-large field-of-view (FOV) that enables simultaneous observation of millions of cellular dynamics in centimeter-wide 3D tissues and embryos. In terms of technique, this paper is just a minor improvement of the authors' previous work, which is a fluorescence imaging system working at visible wavelength region (https://www.nature.com/articles/s41598-021-95930-7). 

      Strengths: 

      In this study, the authors enhanced the system's resolution and sensitivity by increasing the numerical aperture (NA) of the lens. Furthermore, they achieved volumetric imaging by integrating optical sectioning and computational sectioning. This study encompasses a broad range of biological applications, including imaging and analysis of organoids, mouse brains, and quail embryos, respectively. Overall, this method is useful and versatile. 

      Weaknesses: 

      The unique application that only can be done by this high-throughput system remains vague. Meanwhile, there are also several outstanding issues in this paper, such as the lack of technical advances, unclear method details, and nonstandardized figures. 

      Here, we address the first part of the Weaknesses concerning the unique application, and will respond to the latter part in the Reply to the Recommendations.

      We are developing a 'large field of view with cellular resolution' imaging technique, aiming to apply it to the observation of multicellular systems consisting of a large number of cells. Our proposed optical system has achieved optical performance that enables simultaneous observation of more than one million cells in a single field of view. In this paper, we have succeeded in adding three-dimensional imaging capability while maintaining the size of this two-dimensional field of view. By simultaneously observing the dynamics of a large number of cells, we can reveal spatio-temporal sequences in state transitions (pattern formation, pathogenesis, embryogenesis, etc.) in multicellular systems and discover cells that serve as a starting point. These points were mentioned in the 1st and 2nd paragraphs of the Introduction section (Line 48-, 58-) and discussed in the 4th paragraph of the Discussion section (Line 398-) of the main text. While our previous work on two-dimensional specimens has shown the validity of this approach, the present work demonstrates that temporal changes of multicellular systems in three-dimensional specimens can be observed at the single-cell level.

      Ideally, we aim to achieve the same level of depth observation capability as the FOV size in the lateral direction. However, at present, the penetration depth for living specimens is limited to a few hundred micrometers due to non-transparency, while the lateral FOV size exceeds 1 cm. The current optical performance is well-suited for systems where development occurs within a thin volume but a large area, such as the quail embryo presented in this paper (Fig. 6 in the revised manuscript). In addition to quail embryos, this technique can also be applied to the developmental systems of highly transparent model organisms, such as zebrafish. Furthermore, chemically cleared specimens, even those thicker than 1.5 mm, can be observed, as shown in this paper (Fig. 5 in the revised manuscript). In addition to organs other than the brain, it could also be applied to imaging entire living organisms. However, for observation depths up to 10 mm, such as in the whole mouse brain, a mechanism to compensate for spherical aberration is required, which we consider the next step in our technological development.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors): 

      (1) I suggest that authors shall re-examine the quantitative nature of their image processing algorithm. Also, I wonder whether there are parameters that could be adjusted, as images in Figure 3D and 4E seem to be oversharpened with potential loss of information. 

      As the reviewer pointed out, we recognized that there was an insufficient explanation of the image processing.

      Therefore, descriptions on the quantitative nature and parameter adjustments have been added to the text (Materials and Methods, Line 552) and the Supplementary File (Fig. S4-5, Note 2), and these have been referenced in the main text. A summary is given below.

      The adjustable parameters in our method include the cutoff frequency of the smoothing filter used in the background light estimation. If the cutoff frequency is too high, the focal plane component will be included in the “background”; if it is too low, background light will remain in the focal plane. The cutoff frequency needs to be optimized within this range. In this optimization, neither the size of the cell itself nor the performance of the optical system was considered; instead, we utilized the concept of independent component analysis (ICA). This approach is taken because the size and structure of cells vary from sample to sample, and the optical properties also vary with wavelength and location, making it impractical to consider each factor for every case. ICA employs a blind separation method, which is based on the principle that individual signals deviate from the normal (Gaussian) distribution, while the superimposition of signals tends to bring the distribution closer to the Gaussian distribution. Several indices have been proposed to quantify the non-Gaussian nature of the distribution, including kurtosis, skewness, negentropy, and mutual information. Among these measures, we empirically found skewness to be the most suitable and robust, and therefore adopted it for our algorithm. The optimal parameters were selected using a subset of the data, and the determined values were then applied to the entire dataset.
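      The selection principle described above can be sketched in a simplified form. The code below is an illustrative stand-in, not the algorithm used in the manuscript: it works on a 1D signal, replaces the iterative background estimation with a single moving-average pass, and assumes that the preferred setting is the one giving the most positively skewed residual. All function names and the test signal are hypothetical.

```python
from statistics import mean


def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def moving_average(signal, half_width):
    """Simple low-pass filter; a wider window corresponds to a lower cutoff frequency."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def select_half_width(signal, candidates):
    """Return the candidate window whose background-subtracted residual is most skewed."""
    def score(half_width):
        background = moving_average(signal, half_width)
        residual = [s - b for s, b in zip(signal, background)]
        return skewness(residual)
    return max(candidates, key=score)


# Hypothetical test signal: a slowly varying background plus sparse bright peaks.
signal = [0.01 * i for i in range(200)]
for position in (30, 90, 150):
    signal[position] += 5.0

candidates = [1, 5, 20]
best = select_half_width(signal, candidates)
print("selected half-width:", best)
```

      In this simplified picture, sparse in-focus structures give the residual a long positive tail, whereas leftover smooth background or over-subtraction makes its distribution more symmetric, which is why a non-Gaussianity index can serve as the selection criterion.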

      Regarding the oversharpening, we believe that it rarely occurs in the image data shown in the manuscript. In a case where low-frequency structures and high-frequency structures are mixed in the focal plane, an oversharpening-like effect can occur because of the disappearance of low-frequency structures, which is discussed in the Supplementary File (Note 2, Fig. S5D). However, in the case of a sample with nearly uniform spatial frequency, such as the nuclei observed in this study, oversharpening is unlikely to occur when appropriate parameters are set as described above. If some images appear oversharpened in the figures, this is due to the contrast of the image.

      (2) On the other hand, I am curious how a wide-field fluorescence system may reliably extract information from a densely labeled sample within axial volume of 200 um, as they showed in the mouse brain in Figure 4. Thus I am skeptical regarding the fidelity and completeness of the signals and cells recorded there. It would be ideal if the authors could benchmark their system performance with a two-photon microscope system, which serves as the ground truth.

      The reviewer's suggestion is reasonable; however, we are unfortunately unable to observe the same sample using a two-photon microscope. Instead, we will explain these differences from a theoretical perspective. Two-photon microscopes used for brain imaging typically employ objective lenses with a numerical aperture (NA) of at least 0.5, allowing for 3D imaging with depth resolution ranging from several micrometers down to sub-micrometer levels. In contrast, our method uses a lens system with an NA of 0.25, and the optical configuration (focusing NA, pinhole size) is not optimized for resolution (Note 2 in Supplementary File); thus, the longitudinal resolution (FWHM) is about 14 microns (Fig. 3E in the revised manuscript). This difference is significant in brain imaging, where our method may not fully separate all cells in close proximity along the depth axis, as shown in the bottom panels (xz-plane) of Fig. 5F of the revised manuscript. Nevertheless, we believe that cell nuclei can be accurately detected in this 3D image using appropriate cell detection methods based on deep learning. To support this claim, we conducted cell detection using the state-of-the-art cell detection platform ELEPHANT and incorporated the results into Fig. 5 (Fig. 5G-I). This figure demonstrates that even with the current spatial resolution, accurate detection of cell nuclei is achievable.

      We accordingly added one paragraph (Line 285) in the main text to explain the cell detection method and discuss the results. We also added one section into Materials and Methods for more detail of the cell detection (Line 650).

      In conjunction with the revision, the developer of ELEPHANT (K. Sugawara) has been included as a co-author.

      Reviewer #2 (Recommendations For The Authors): 

      In my opinion, the following concerns need to be addressed. 

      Major comments: 

      (1) The proposed system's crucial element involves the development of a giant lens system with a numerical aperture (NA) of 0.25. However, a comprehensive introduction and explanation of this significant giant lens system are missing from the manuscript. I strongly suggest that the authors supplement the relevant content to provide a clearer understanding of this integral component. 

      A detailed description of the giant lens system has been added to the main text (Optical Configuration and Performance, Line 83) and the Materials and Methods section (Wide-field imaging system (AMATERAS-2w) configuration, Line 446). A diagram of the lens configuration has also been included in Fig. 1A. In conjunction with these additions, two engineers from SIGMAKOKI CO. LTD., who made significant contributions to the design and manufacturing of the lens system, have been included as co-authors.

      (2) The manuscript introduces a computational sectioning technique, based on iteratively filtering technology. However, the accuracy of this algorithm is not sufficiently validated. 

      It is challenging to assess the accuracy of the processing results against the ground truth, because the ground truth is unknown for all of the experiments. Instead, in the Supplementary File (Note 2, Figures S4-5), we show how the processing results for the measured and simulated data vary with the parameter (cutoff frequency), illustrating the characteristics of our method. The results suggest that by optimally pre-selecting the parameter, it is possible to successfully separate the in-focus and out-of-focus components. This discussion is related to our response to the first recommendation made by Reviewer #1. Please review our response to Reviewer #1 regarding parameter optimization and oversharpening. In addition, we discuss below the conditions that must be met for the calculation to be performed correctly (also included in Note 2, Limitation of the computational sectioning).

      To apply this method, certain requirements must be met regarding cutoff spatial frequency and intensity. Regarding cutoff spatial frequency, the algorithm utilizes a low-pass filter with a single cutoff frequency, which can make it challenging to accurately extract structures in the focal plane when structures of varying sizes and shapes are mixed within the sample. This is illustrated by the simulation shown in Fig. S5 and described in Note 2. Conversely, regarding intensity, if the structure’s intensity in the focal plane is weak compared to the Gaussian fluctuations in the background intensity, it becomes difficult to extract the structure. However, intensity fluctuations can be reduced by applying a 3x3 moving average filter to the entire image as a pre-processing step before applying the baseline estimation algorithm. 
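
As an illustration of the pre-processing step mentioned above, the following is a minimal sketch of a 3x3 moving average filter in plain Python. The function name and the handling of border pixels (averaging over the neighbors that exist) are assumptions made for this example; the text does not state how image borders are treated, and the baseline estimation algorithm itself is not reproduced here.

```python
def moving_average_3x3(image):
    """Return a 3x3 moving-average (box) filtered copy of a 2D list of numbers.

    Assumption: border pixels are averaged over the neighbors that exist,
    because the border handling is not specified in the text.
    """
    rows, cols = len(image), len(image[0])
    smoothed = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += image[rr][cc]
                        count += 1
            smoothed[r][c] = total / count
    return smoothed
```

Averaging nine independent, equally noisy pixels reduces the standard deviation of the noise by roughly a factor of three, at the cost of blurring structures smaller than the filter window.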

      In the experimental data presented in this paper (Figs. 4-6 in the revised manuscript), the spatial frequency issue was not significant because the target structures, which are stained nuclei, appear to be of nearly uniform size in the focal plane. The second issue, related to intensity, is also addressed in Fig. 4, as the signal intensity from the focal plane is sufficient to overcome background light in almost all regions. In the mouse brain example, the use of confocal imaging suppresses background light, allowing the structures in the focal plane to be accurately extracted.

      (3) I didn't see a detailed description of the confocal imaging in the manuscript. If it adheres to conventional confocal technology, then the question arises: what truly constitutes the novel aspect of this technique? 

      The principle of confocal imaging and optics is based on the use of a pinhole array, a system also employed commercially by CrestOptics (X-Light, Italy). Prior to the 1990s, when the configuration utilizing Yokogawa Electric's pinhole array and microlens array pairs became popular, pinhole array-only setups were the norm, and are now considered somewhat traditional. We do not claim novelty in the optical configuration itself, but rather in the design of a confocal optical system tailored for our original large-field (low-magnification) imaging system with a relatively high NA. The pinhole array disk we designed features significantly smaller pinholes and correspondingly tighter pinhole spacing than those used for high-magnification observation purposes. We believe that this unique size and arrangement provides sufficient novelty.

      We have revised the manuscript to clearly emphasize what we believe constitutes the novelty of this technique (paragraphs starting from Line 166 and Line 183). We have also added a discussion on our confocal optical configuration and its spatial resolution in the Supplementary File (Note 1, Fig. S2-3).

      (4) Light-sheet and light-field microscopy, as two emerging 3D microscopy techniques which has theoretically higher throughput than confocal, are not sufficiently introduced in this manuscript. 

      In the previous version, we briefly mentioned light-sheet and light-field microscopy, but we recognized that more detailed explanations were necessary and should be included in the manuscript. We have added several sentences to the main text (Line 159-165). A summary is provided below. 

      Light-sheet microscopy requires the illumination light to propagate over long distances within the specimen, and many applications necessitate the use of transparency-enhanced tissue. Even when the sample is highly transparent, no existing technique can form thin optical sections as long as 1 cm. Therefore, light-sheet microscopy is not an effective method for the thin, wide, three-dimensional objects that are the focus of this project. Regarding light-field microscopy, it features a trade-off where the number of pixels in the two-dimensional plane is reduced in exchange for the ability to record three-dimensional fluorescence distribution information in a single shot. In our imaging system, the pixel spacing is set to be comparable to the Nyquist Frequency to observe as many cells as possible, meaning that no more additional pixels can be sacrificed. Therefore, the light-field microscopy technique is not suitable for our imaging system.

      (5) The fluorescence images of cardiomyocytes derived from human induced pluripotent stem cells (hiPSCs) stained with Rhodamine phalloidin, as presented in Figure 1(E), exhibit suboptimal quality. This may hinder the effective use of the image for biological research. It is imperative that the authors address and explain this aspect, shedding light on the limitations and potential implications of the research findings. 

      We acknowledge the reviewer’s concern regarding the suboptimal quality of the fluorescence image. Upon further examination, we recognized that the resolution and clarity of the image could potentially limit its utility for detailed biological analysis. To address this, we have re-examined the image size and quality to enhance its presentation in Fig. 2C-E in the revised manuscript, which allows for finer structures to be recognized within the large image size.

      Regarding the effective use of the image for biological research, the results shown in the images indicated the capability of observing subcellular structures, such as myofibrils, in cell sheets with a large area, such as myocardial sheets. This would enable us to simultaneously investigate micro-level structures (orientation and density of myofibrils) and macro-level multicellular dynamics (performance of myocardial sheet). We added the above explanation in the manuscript (Line 146). We hope this revision clarifies the quality and utility of the presented image.

      (6) The imaging quality difference between the two techniques shown in Figure 1F, G is relatively small, and the signal distribution difference shown in Figure H is significant, unlike the effects expected from an improvement in resolution. 

      We acknowledge the reviewer's concern regarding the minimal apparent difference in imaging quality between the two images. Upon re-evaluation, we recognized that the original presentation may not have clearly demonstrated the improvements intended by the different techniques. Figure 1H, which showed the line profile of Figs. 1F and G, may have been impacted by the resolution and compression settings of the image file, leading to a less pronounced distinction between the two techniques. To address this, we have enlarged Figs. 1F and 1G (renumbered as Fig. 2D and 2E in the revised manuscript) and carefully reviewed the resolution and compression ratio to ensure that the differences are more clearly visible.

      (7) The chart in Figure 2(C) lacks axis titles and numerical labels, making it challenging for readers to comprehend. To enhance reader convenience, it is recommended that the authors incorporate axis titles and numerical labels, providing a clearer context for interpreting the chart. 

      We appreciate the reviewer’s observation regarding the lack of axis titles and numerical labels in the figure. The vertical axis represents fluorescence intensity, which we initially omitted, assuming it was self-evident. However, as the reviewer correctly pointed out, it is crucial to ensure that figures are clear and accessible to readers from diverse backgrounds. In response, we have added the vertical axis title to Fig. 2C (renumbered as Fig. 3C in the revised manuscript) to enhance clarity, while the numerical labels remain omitted as the unit is arbitrary (a.u.). We have also reviewed all other figures in the manuscript to ensure that no similar errors are present.

      (8) In Figures 2(D) and (E), where the authors present the point spread function for quantifying the lateral and axial resolution of the system, I would recommend increasing the number of fluorescent microspheres to more than 10 for statistical averaging. This adjustment would strengthen the persuasiveness of the data and contribute to a more robust analysis. 

      We appreciate the reviewer’s recommendation to increase the number of fluorescent microspheres for statistical averaging in Figs. 2D and E (renumbered as Fig. 3D-E in the revised manuscript). In response, we have revised the graphs to present the point spread function with the statistical mean and standard deviation (SD) of fluorescent images obtained from a large sample size (N = 100), and accordingly revised the main text to mention the statistics (Line 118, Line 132). We also recognized that a similar adjustment was necessary for Figs 1C and D (renumbered as Fig. 2A-B in the revised manuscript), and have accordingly made the same modifications to those figures as well. We believe these changes enhance the robustness and persuasiveness of our data.
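
For readers who want to reproduce this kind of summary, the following is a minimal sketch of estimating the full width at half maximum (FWHM) of a sampled intensity profile by linear interpolation. It assumes a single peak on a zero baseline; the function name is hypothetical, and this is not necessarily the procedure used for the figures.

```python
def fwhm(xs, ys):
    """Full width at half maximum of a single-peaked profile.

    Assumes the baseline is zero and that the profile falls to half of its
    maximum on both sides of the peak within the sampled range.
    """
    peak = max(range(len(ys)), key=lambda i: ys[i])
    half = ys[peak] / 2.0

    def crossing(step):
        i = peak
        # Walk away from the peak while the next sample is still above half max.
        while 0 <= i + step < len(ys) and ys[i + step] > half:
            i += step
        j = i + step
        if not 0 <= j < len(ys):
            raise ValueError("profile does not fall to half maximum")
        # Linear interpolation between sample i (above half) and j (at or below).
        frac = (ys[i] - half) / (ys[i] - ys[j])
        return xs[i] + frac * (xs[j] - xs[i])

    return crossing(+1) - crossing(-1)
```

Applying `fwhm` to each bead profile and passing the resulting list to `statistics.mean` and `statistics.stdev` gives a mean and SD of the kind described above.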

      (9) Figure 4(C) visually represents the characteristic 3D structures of several regions. However, discerning the 3D structural information in the images poses a challenge. To address this issue, I recommend that the authors optimize the 3D visualization to improve clarity and facilitate a more effective interpretation of the depicted structures. 

      We appreciate the reviewer’s suggestion regarding the challenges in discerning the 3D structural information in Fig. 4C. To address this, we have added representative images from the xy-plane and xz-plane of the cortex, medial habenula, and choroid plexus (Fig. 5G-I) in the revised manuscript. These additions provide a clearer visualization of the 3D distribution in each region, making it easier for readers to interpret the structures. Additionally, we have overlaid the results of deep-learning based cell detection on these images, further enhancing the visibility of the cells. This adjustment also aligns with our response to Reviewer #1's second comment.

      Minor comments: 

      (1) The labelling of ROI is missing in Figure 1(e). 

      We appreciate the reviewer’s observation regarding the missing labeling of the ROI in Fig. 1E. Upon review, we confirmed that the ROI was indeed labeled with a white square in the previous manuscript; however, it was difficult to discern due to its small size and the black-and-white contrast. To improve visibility, we have recolored the square in magenta, ensuring that it stands out more clearly in the figure (Fig. 2C in the revised manuscript).

      (2) The subfigure order and labeling in Fig. 1 and Fig. 2 are not consistent.

      We appreciate the reviewer’s attention to the subfigure order and labeling in Fig. 1 and 2 (Fig. 1-3 in the revised manuscript). To accommodate subfigures of varying sizes without leaving gaps, we arranged the subfigures in a non-sequential order. However, we have ensured that the text refers to the figures in the correct order. We acknowledge the importance of consistency and will work with the editorial team to explore the best way to present the figures while maintaining clarity and alignment with standard practices.

      (3) Figure 1B reappears in Figure 2.  

      We appreciate the reviewer’s observation regarding the repetition of Figure 1B in Figure 2. While the central part of the optical system (custom lens system) is common to both figures, the illumination system, pinhole array disk, and detection optics for the confocal set up differ. To provide a complete understanding of the optical system, we opted to include the full diagram in Fig. 2B (renumbered as Fig. 3B in the revised manuscript). We considered highlighting only the different components, but we felt that doing so might complicate the reader’s comprehension of the overall system. Therefore, we chose to include the common elements twice to ensure clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Strengths:

      The three experiments are well designed and the various conditions are well controlled. The rationale of the study is clear, and the manuscript is pleasant to read. The analysis choices are easy to follow, and mostly appropriate.

      We are grateful for the reviewer’s thoughtful comments.

      Weaknesses:

      I only have one potential worry. The analysis for gait tracking (1 Hz) in Experiment 2 (Figures 3a/b) starts by computing a congruency effect (A/V stimulation congruent (same frequency) versus A/V incongruent (V at 1 Hz, A at either 0.6 or 1.4 Hz), separately for the Upright and Inverted conditions. Then, this congruency effect is contrasted between Upright and Inverted, in essence computing an interaction score (Congruent/Incongruent X Upright/Inverted). Then, the channels in which this interaction score is significant (by cluster-based permutation test; Figure 3a) are subselected for further analysis. This further analysis is shown in Figure 3b and described in lines 195-202. Critically, the further analysis exactly mirrors the selection criteria, i.e. it is aimed at testing the effect of Congruent/Incongruent and Upright/Inverted. This is colloquially known as "double dipping", the same contrast is used for selection (of channels, in this case) as for later statistical testing. This should be avoided, since in this case even random noise might result in a significant effect. To strengthen the evidence, either the authors could use a selection contrast that is orthogonal to the subsequent statistical test, or they could skip either the preselection step or the subsequent test. (It could be argued that the test in Figure 3b and related text is not needed to make the point - that same point is already made by the cluster-based permutation test.)

      Thanks for the helpful suggestions. In Experiment 2, to investigate whether the multisensory integration effect was specialized for biological motion perception, we contrasted the congruency effect between the upright and inverted conditions to search for clusters showing a significant interaction effect. We performed further analyses based on neural responses from this cluster to examine whether the congruency effect was significant in the upright and the inverted conditions, respectively, following the logic of post hoc comparisons after identifying an interaction effect. However, we agree with the reviewer that comparing the congruency effects between the upright and inverted conditions again based on data from this cluster was redundant and resulted in double dipping. Therefore, we have removed this comparison from the main text and optimized the presentation of our results in the revised Fig. 3.
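
The circularity concern can be illustrated with a small simulation on pure noise. The sketch below is a simplified stand-in, not the analysis used in the study: it replaces the cluster-based permutation test with a hypothetical "pick the five channels with the largest mean effect" rule, and all function names and parameter values are assumptions chosen for illustration. The critical value 2.069 is the two-sided 5% threshold of the t distribution with 23 degrees of freedom.

```python
import random
import statistics


def one_sample_t(values):
    """t statistic for testing mean(values) against zero."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / n ** 0.5)


def noise_dataset(rng, n_subjects, n_channels):
    """Pure-noise effect scores: one value per subject and channel."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_channels)]
            for _ in range(n_subjects)]


def top_channels(data, k):
    """Indices of the k channels with the largest group-mean score."""
    n_channels = len(data[0])
    means = [statistics.mean(subj[ch] for subj in data)
             for ch in range(n_channels)]
    return sorted(range(n_channels), key=lambda ch: means[ch], reverse=True)[:k]


def roi_t(data, channels):
    """t statistic of the per-subject score averaged over the chosen channels."""
    per_subject = [statistics.mean(subj[ch] for ch in channels) for subj in data]
    return one_sample_t(per_subject)


def false_positive_rates(n_sims=100, n_subjects=24, n_channels=64, k=5,
                         t_crit=2.069, seed=0):
    """Return (circular, independent) false-positive rates on pure noise."""
    rng = random.Random(seed)
    circular = independent = 0
    for _ in range(n_sims):
        test_data = noise_dataset(rng, n_subjects, n_channels)
        other_data = noise_dataset(rng, n_subjects, n_channels)
        # Circular: channels are selected and tested on the same data.
        if abs(roi_t(test_data, top_channels(test_data, k))) > t_crit:
            circular += 1
        # Independent: channels are selected on separate data.
        if abs(roi_t(test_data, top_channels(other_data, k))) > t_crit:
            independent += 1
    return circular / n_sims, independent / n_sims
```

Under these assumptions, the circular analysis flags a "significant" effect in most simulated noise datasets, whereas selection on independent data keeps the false-positive rate near the nominal level.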

      Related to the above: the test for the three-way interaction (lines 211-216) is reported as "marginally significant", with a p-value of 0.087. This is not very strong evidence.

      As shown in Fig. 3b & e, the magnitude of amplitude differs between the gait-cycle frequency (mean = 0.008, SD = 0.038) and the step-cycle frequency (mean = 0.052; SD = 0.056), which might influence the statistical results of the interaction effect. To reduce such influence, we converted the amplitude data at each frequency condition into Z-scores, separately. The repeated-measures ANOVA on these normalized amplitude data revealed a significant three-way interaction (F(1,23) = 7.501, p = 0.012, partial η² = 0.246). We have updated the results in the revised manuscript (lines 218-225).
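
A minimal sketch of this normalization, together with a consistency check of the reported effect size, is given below. The function names and data layout are hypothetical, and pooling all values within a frequency before z-scoring is an assumption, since the exact grouping is not spelled out here. The effect-size helper uses the standard relation between F and partial eta squared, F·df1 / (F·df1 + df2).

```python
import statistics


def zscore(values):
    """Standardize values to zero mean and unit (sample) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def zscore_by_frequency(amplitudes):
    """Z-score amplitudes separately within each frequency.

    `amplitudes` maps a frequency label to a list of amplitude values
    (assumed here to pool all participants and conditions at that frequency).
    """
    return {freq: zscore(vals) for freq, vals in amplitudes.items()}


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)
```

With F(1,23) = 7.501, `partial_eta_squared(7.501, 1, 23)` evaluates to about 0.246, which agrees with the value reported above.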

      Reviewer #1 (Recommendations For The Authors):

      -  Which variable caused one data point to be classified as outlier? (line 221).

      The outlier is a participant whose audiovisual congruency effect (Upright – Inverted) in neural responses at the frequency of interest exceeds 3 SD from the group mean. It is marked by a red diamond in Author response image 1. Before removing the data, the correlation between the AQ score and the congruency effect is r = -0.396, p = 0.055. For comparison, the results after removing the outlier are shown in Fig. 3c of the revised manuscript. We have added more information about the variable causing the outlier in the revised manuscript (lines 231-232).

      Author response image 1.

      The correlation between AQ score and congruency effect
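
The 3 SD criterion described above can be sketched as follows. The function name is hypothetical, and computing the mean and SD from the full sample, including the candidate outlier, is an assumption of this example.

```python
import statistics


def outlier_indices(values, n_sd=3.0):
    """Indices of values lying more than n_sd sample SDs from the group mean.

    Assumption: the mean and SD are computed over all values, including
    any potential outlier.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > n_sd * sd]
```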

      -  The authors cite Maris & Oostenveld (2007) in line 415 as the main reference for the FieldTrip toolbox, but the correct reference here is different, see https://www.fieldtriptoolbox.org/faq/how_should_i_refer_to_fieldtrip_in_my_publication/

      Thank you for pointing out this issue. Citation corrected.

      -  The authors could consider giving some more background on the additive vs superadditive distinction in the Introduction, which may increase the impact; as it stands the reader might not know why this is particularly interesting. Summarize some of the takeaways of the Stevenson et al. (2014) review in this respect.

      Thanks for the suggestion and we have added the following relevant information in the Introduction (lines 80-90):

      “Moreover, we adopted an additive model to classify multisensory integration based on the AV vs A+V comparison. This model assumes independence between inputs from each sensory modality and distinguishes among sub-additive (AV < A+V), additive (AV = A+V), and super-additive (AV > A+V) response modes (see a review by Stevenson et al., 2014). The additive mode represents a linear combination between two modalities. In contrast, the super-additive and sub-additive modes indicate non-linear interaction processing, either with potentiated neural activation to facilitate the perception or detection of near-threshold signals (super-additive) or a deactivation mechanism to minimize the processing of redundant information cross-modally (sub-additive) (Laurienti et al., 2005; Metzger et al., 2020; Stanford et al., 2005; Wright et al., 2003).”
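
The three response modes in the quoted passage can be written as a simple comparison. The sketch below is illustrative only: the function name and the `tolerance` parameter are assumptions, and in practice the AV versus A+V difference is assessed with a statistical test rather than a fixed threshold.

```python
def integration_mode(av, a, v, tolerance=0.0):
    """Classify a response as sub-additive, additive, or super-additive.

    `av` is the response to the audiovisual condition; `a` and `v` are the
    responses to the auditory-only and visual-only conditions. `tolerance`
    is a hypothetical margin within which AV and A+V count as equal.
    """
    difference = av - (a + v)
    if difference > tolerance:
        return "super-additive"
    if difference < -tolerance:
        return "sub-additive"
    return "additive"
```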

      Reviewer #2 (Public Review):

      Strengths:

      The manuscript is well-written, with a concise and clear writing style. The visual presentation is largely clear. The study involves multiple experiments with different participant groups. Each experiment involves specific considered changes to the experimental paradigm that both replicate the previous experiment's finding yet extend it in a relevant manner.

      We thank the reviewer for the valuable feedback.

      Weaknesses:

      The manuscript interprets the neural findings using mechanistic and cognitive claims that are not justified by the presented analyses and results.

      First, entrainment and cortical tracking are both invoked in this manuscript, sometimes interchangeably so, but it is becoming the standard of the field to recognize their separate evidential requirements. Namely, step and gait cycles are striking perceptual or cognitive events that are expected to produce event-related potentials (ERPs). The regular presentation of these events in the paradigm will naturally evoke a series of ERPs that leave a trace in the power spectrum at stimulation rates even if no oscillations are at play. Thus, the findings should not be interpreted from an entrainment framework except if it is contextualized as speculation, or if additional analyses or experiments are carried out to support the assumption that oscillations are present. Even if oscillations are shown to be present, it is then a further question whether the oscillations are causally relevant toward the integration of biological motion and for the orchestration of cognitive processes.

      Second, if only a cortical tracking account is adopted, it is not clear why the demonstration of supra-additivity in spectral amplitude is cognitively or behaviorally relevant. Namely, the fact that frequency-specific neural responses to the [audio & visual] condition are stronger than those to [audio] and [visual] combined does not mean this has implications for behavioral performance. While the correlation to autism traits could suggest some relation to behavior and is interesting in its own right, this correlation is a highly indirect way of assessing behavioral relevance. It would be helpful to test the relevance of supra-additive cortical tracking on a behavioral task directly related to the processing of biological motion to justify the claim that inputs are being integrated with the service of behavior. Under either framework, cortical tracking or entrainment, the causal relevance of neural findings toward cognition is lacking.

      Overall, I believe this study finds neural correlates of biological motion, and it is possible that such neural correlates relate to behaviorally relevant neural mechanisms, but based on the current task and associated analyses this has not been shown.

      Thanks for raising the important concerns regarding the interpretation of our results within the entrainment or the cortical tracking framework. A strict neural entrainment account emphasizes the alignment of endogenous neural oscillations with external rhythms, rather than a mere regular repetition of stimulus-evoked responses. However, it is challenging to fully dissociate these components, given that rhythmic stimulation can shape intrinsic neural oscillations, resulting in an intricate interplay between endogenous neural oscillations and stimulus-evoked responses (Duecker et al., 2024; Herrmann et al., 2016; Hosseinian et al., 2021). Therefore, some studies, including the current one, use the term “entrainment” to refer to the alignment of brain activity to rhythmic stimulation in a broader context, without isolating the intrinsic oscillations and evoked responses (e.g., Ding et al., 2016; Nozaradan et al., 2012; Obleser & Kayser, 2019). Nevertheless, we agree with the reviewer that since the current results did not examine or provide direct evidence for endogenous oscillations, it is better to contextualize the oscillation view as speculation. Hence, we have replaced most of the expressions about “entrainment” with a more general term “tracking” in the revised manuscript (as well as in the title of the manuscript). We only briefly mentioned the entrainment account in the Discussion to facilitate comparison with the literature (lines 307-312).

      Regarding the relevance between neural findings and cognition or behavioral performance, the first supporting evidence comes from the inversion effect in Experiment 2. For the neural responses at gait-cycle frequency, we observed a significantly enhanced audiovisual congruency effect in the upright condition compared with the inverted condition. Inversion disrupts the distinctive kinematic features of biological motion (e.g., gravity-compatible ballistic movements) and significantly impairs biological motion processing, but it does not change the basic visual properties of the stimuli, including the rhythmic signals generated by low-level motion cues. Therefore, the inversion effect has long been regarded as an indicator of the specificity of biological motion processing in numerous behavioral and neuroimaging studies (Bardi et al., 2014; Grossman & Blake, 2001; Shen, Lu, Yuan, et al., 2023; Simion et al., 2008; Troje & Westhoff, 2006; Vallortigara & Regolin, 2006; Wang et al., 2014; Wang & Jiang, 2012; Wang et al., 2022). Here, our finding of the cortical tracking of higher-order rhythmic structures (gait cycles) present in the upright but not in the inverted condition suggests that this cortical tracking effect cannot be explained by ERPs evoked by regular onsets of rhythmic events. Rather, it is closely linked with the specialized cognitive processing of biological motion. Furthermore, we found that the BM-specific cortical tracking effect at gait-cycle frequency (rather than the non-selective tracking effect at step-cycle frequency) correlates with observers’ autistic traits, indicating its functional relevance to social cognition. These findings convergingly suggest that the cortical tracking effect that we currently observed engages cognitively relevant neural mechanisms.

      In addition, our recent behavioral study showed that listening to frequency-congruent footstep sounds, compared with incongruent sounds, enhanced the visual search for human walkers but not for non-biological motion stimuli containing the same rhythmic signals (Shen, Lu, Wang, et al., 2023). These results suggest that audiovisual correspondence specifically enhances the perceptual and attentional processing of biological motion. Future research could examine whether the cortical tracking of rhythmic structures plays a functional role in this process, which may shed more light on the behavioral relevance of the cortical tracking effect to biological motion perception. We have incorporated the above information into the Discussion (lines 268-293).

      Reviewer #2 (Recommendations For The Authors):

      In Figure 1c, it could be helpful to add the word "static" in the illustration for the auditory condition so that readers understand without reading the subtext that it is a static image without biological motion.

      Suggestion taken.

      In the Discussion, I believe it is important to justify an oscillation and entrainment account, or if it cannot be justified based on the current results and analyses (which is my opinion), it could be helpful to explicitly frame it as speculation.

      We agree with the reviewer. For more clarification, please refer to our response to the public review.

      L335, I did not understand this sentence - a reformulation would be helpful.

      The point-light stimuli were created by capturing the motion of a walking actor (Vanrie & Verfaillie, 2004). The global motion of the walking sequences was eliminated so that the point-light walker appears to walk on a treadmill without translational motion. We have reformulated the sentence as follows: “The point-light walker was presented at the center of the screen without translational motion.”

      The results in Figure 2a and 2d are derived by performing a t-test between the amplitude at the frequency of gait and step cycles and zero. Comparison against amplitude of zero is too liberal; the possibility for a Type-I error is inflated because even EEG data with only noise will not have amplitudes of zero at all frequencies. A better baseline (H0) is either the 1/frequency trend in the power spectrum derived using methods like FOOOF (https://fooof-tools.github.io/fooof/) or by performing non-parametric shuffling based methods (https://doi.org/10.1016/j.jneumeth.2007.03.024).

      In our data analysis, instead of testing the raw amplitude against zero, we compared the normalized amplitude at each frequency bin (obtained by subtracting the average amplitude measured at the neighboring frequency bins from the original amplitude data) against zero. This analysis is equivalent to contrasting the raw amplitude with that of its neighboring frequency bins, allowing us to test whether the neural response in each frequency bin showed a significant enhancement compared with its neighbors. The multiple comparisons across frequency bins were controlled by false discovery rate (FDR) correction, reducing the Type-I error. Such analysis procedures help reduce (though not totally remove) the influence of the 1/f trend and have been widely used in this field (Cirelli et al., 2016; Henry & Obleser, 2012; Lenc et al., 2018; Nozaradan et al., 2012; Peter et al., 2023).
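
The neighbor-bin normalization described above can be sketched as follows. The number of neighboring bins on each side (`n_neighbors`) and the function name are assumptions made for this example, since the exact set of neighboring bins is not stated here; bins near the edges of the spectrum simply use the neighbors that exist.

```python
def subtract_neighbor_mean(spectrum, n_neighbors=2):
    """Subtract from each bin the mean amplitude of its neighboring bins.

    `spectrum` is a list of amplitudes ordered by frequency. For each bin,
    up to `n_neighbors` bins on each side (excluding the bin itself) are
    averaged and subtracted, which removes a locally flat noise floor.
    """
    normalized = []
    for i, amplitude in enumerate(spectrum):
        left = spectrum[max(0, i - n_neighbors):i]
        right = spectrum[i + 1:i + 1 + n_neighbors]
        neighbors = left + right
        normalized.append(amplitude - sum(neighbors) / len(neighbors))
    return normalized
```

A bin that merely follows the local trend ends up near zero after this step, so a one-sample test against zero asks whether the bin stands out from its surroundings.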

      To further verify our findings, we adopted the reviewer’s suggestion and created a baseline by performing a non-parametric shuffling-based analysis. More specifically, to establish the statistical significance of amplitude peaks, we carried out a surrogate analysis on each condition. For each participant, a single control surrogate dataset was derived from their actual dataset by jittering the onset of each step-cycle relative to the original onset by a randomly selected integer value between −490 and 490 ms. This procedure removed the consistent relationship between the EEG signal and the stimuli while preserving each epoch’s general timing within the exposure period. Then, epochs were extracted based on surrogate stimuli onset, and amplitude was computed across frequencies through FFT under a null model of non-entrainment (Moreau et al., 2022). This entire procedure was performed 100 times, producing a surrogate amplitude distribution of 100 group-averaged values for each condition. If the observed amplitude values at the frequency of interest exceeded the value corresponding to the 95th percentile of the surrogate distribution (p < .05) within a given condition (e.g., AV), the amplitude peak was considered significant (Batterink, 2020). As shown in Author response image 2, the statistical results from these analyses are similar to those reported in the manuscript, confirming the significant amplitude peaks at the frequencies of interest.

      Author response image 2.

      Non-parametric analysis for spectral peak. The dotted lines represent the random data based on shuffling analysis. The solid lines represent the observed data in measured EEG signals. All conditions induced significant peaks at step-cycle frequency and its harmonic, while only the AV condition induced a significant peak at gait-cycle frequency.
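
Two steps of the surrogate procedure, jittering the onsets and comparing an observed amplitude with the 95th percentile of the surrogate distribution, can be sketched as below. The function names are hypothetical, the nearest-rank percentile definition is an assumption, and the epoching and FFT steps are omitted.

```python
import math
import random


def jittered_onsets(onsets_ms, max_jitter_ms=490, rng=None):
    """Shift each onset by a random integer in [-max_jitter_ms, max_jitter_ms]."""
    rng = rng or random.Random()
    return [t + rng.randint(-max_jitter_ms, max_jitter_ms) for t in onsets_ms]


def exceeds_surrogate(observed, surrogate_values, percentile=95):
    """True if `observed` is above the given percentile of the surrogate values.

    Uses the nearest-rank percentile definition (an assumption of this sketch).
    """
    ordered = sorted(surrogate_values)
    rank = math.ceil(percentile * len(ordered) / 100)
    threshold = ordered[rank - 1]
    return observed > threshold
```

With 100 surrogate values, the threshold is the 95th smallest value, so at most five surrogate values can lie above it.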

      Reviewer #3 (Public Review):

      Strengths:

      The main strengths of the paper relate to the conceptualization of BM and the way it is operationalized in the experimental design and analyses. The use of entrainment, and the tracking of different, nested aspects of BM result in seemingly clean data that demonstrate the basic pattern. The first experiments essentially provide the basic utility of the methodological innovation and the second experiment further hones in on the relevant interpretation of the findings by the inclusion of better control stimuli sets.

      Another strength of the work is that it includes at a conceptual level two replications.

      We thank the reviewer for the comprehensive review and positive comments.

      Weaknesses:

      The statistical analysis is misleading and inadequate at times. The inclusion of the autism trait is not foreshadowed and adequately motivated and is likely underpowered. Finally, a broader discussion over other nested frequencies that might reside in the point-light walker stimuli would also be important to fully interpret the different peaks in the spectra.

      (1) Regarding the nested frequency peaks in the spectra, we did observe multiple significant amplitude peaks at 1f (1/0.83 Hz), 2f (2/1.67 Hz), and 4f (4/3.33 Hz) relative to the gait-cycle frequency (Fig. 2 a&d). To further test the functional roles of the neural activity at different frequencies, we analyzed the audiovisual integration modes at each frequency. Note that we collapsed the data from Experiments 1a & 1b in the analysis as they yielded similar results. Overall, results show a similar additive audiovisual integration mode at 2f and 4f and a super-additive integration mode only at 1f (Figure S1), suggesting that the cortical tracking effects at 2f and 4f may be functionally linked but independent of that at 1f. We have reported the detailed results in the Supplementary Information.

      (2) For the reviewer’s other concerns about statistical analysis and autism traits, please refer to our responses below to the Recommendations for the authors.

      Reviewer #3 (Recommendations For The Authors):

      The description of the analyses performed for experiment 2 comes across as double dipping. Congruency effects for BM and non-BM motion (inverted) were compared using cluster-based statistics. Then identified clusters informed an averaging of signals which then were subjected to a paired comparison. At this point, it is no surprise that these paired comparisons are highly significant seeing that the channels were selected based on a cluster analysis of the same exact contrast. This approach should be avoided.

      In the analysis of the repeated measures ANOVA reporting a trend as marginally significant is misleading. Reporting the statistical results whilst indicating that those do not reach significance is the appropriate way to communicate this finding. Other statistics can be used in order to provide the likelihood of those findings supporting H1 or H0 if the authors would like to state something more precise (Bayesian).

      Thanks for the comments. We have addressed these two points in our response to the public review of Reviewer #1.

      The authors perform a correlation along "autistic trait" scores in an individual differences approach. Individual differences are typically investigated in larger samples (>n=40). In addition, the range of AQ scores seems limited to mostly average or lower-than-average AQs (barring a couple). These points make the conclusions on the possible role of BM in the autistic phenotype very tentative. I would recommend acknowledging this.

      An alternative analysis approach that might better suit the smaller sample size is a comparison between high and low AQ participants, defined based on a median split.

      Many thanks for the suggestion. We agree with the reviewer that the sample size (n = 24) in the current study is not large for exploring the correlation between BM and autistic traits. The narrow range of AQ scores arose because all participants came from a non-clinical population and we did not pre-select participants by AQ score. To further confirm our findings, we adopted your suggestion to compare the BM-specific cortical tracking effect (i.e., the audiovisual congruency effect (Upright - Inverted)) between high and low AQ participants split by the median AQ score (20) of this sample. As in the correlation analysis, one outlier, whose audiovisual congruency effect (Upright - Inverted) in neural responses at 1 Hz exceeded 3 SD from the group mean, was removed from the following analysis. As shown in Figure S3, at 1 Hz, participants with low AQ showed a greater cortical tracking effect than high AQ participants (t(21) = 2.127, p = 0.045). At 2 Hz, low and high AQ participants showed comparable neural responses (t(22) = 0.946, p = 0.354). These results are in line with the correlation analysis, providing further support for the functional relevance between social cognition and cortical tracking of biological motion as well as its dissociation at the two temporal scales. We have added these results to the main text (lines 238-244) and the supplementary information.
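
      The median-split comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names are hypothetical, assigning participants whose AQ equals the median to the low group is an assumption, and a pooled-variance Student's t-test is assumed because the reported degrees of freedom (21 for 23 participants) match n1 + n2 - 2.

```python
from math import sqrt
from statistics import mean, median, stdev, variance


def drop_outliers(aq, effect, n_sd=3.0):
    """Drop participants whose effect lies more than n_sd SDs from the group mean."""
    m, s = mean(effect), stdev(effect)
    kept = [(a, e) for a, e in zip(aq, effect) if abs(e - m) <= n_sd * s]
    return [a for a, _ in kept], [e for _, e in kept]


def median_split_t(aq, effect):
    """Return (t, df) for the low-AQ minus high-AQ difference (pooled-variance t-test)."""
    cut = median(aq)
    # Assumption: scores equal to the median are assigned to the low-AQ group.
    low = [e for a, e in zip(aq, effect) if a <= cut]
    high = [e for a, e in zip(aq, effect) if a > cut]
    n_low, n_high = len(low), len(high)
    df = n_low + n_high - 2
    pooled = ((n_low - 1) * variance(low) + (n_high - 1) * variance(high)) / df
    t = (mean(low) - mean(high)) / sqrt(pooled * (1 / n_low + 1 / n_high))
    return t, df
```

      Converting t and df to a p-value requires the t-distribution, which is outside the Python standard library; a statistics package would be needed for that step.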

      Writing

      The narrative could be better unfolded and studies better motivated. The transition from basic science research on BM to possibly delineating a mechanistic understanding of autism was a surprise at the end of the intro. Once the authors consider the suggestions and comments above it would be good to have this detail and motivation more obviously foreshadowed in the text.

      Thanks for the great suggestion. We have added an introduction to how audiovisual BM processing links with social cognition and ASD in the first paragraph of the revised manuscript (lines 46-56). In particular, integrating multisensory BM cues is foundational for perceiving and attending to other people and for developing further social interaction. However, this ability is often compromised in people with social deficits, such as individuals with autism spectrum disorder (ASD) (Feldman et al., 2018), and even in non-clinical populations with high autistic traits (Ujiie et al., 2015). These behavioral findings underline the close relationship between multisensory BM processing and one’s social cognitive capability, motivating us to further explore this issue at the neural level in the current study. We have also modified the relevant content in the last paragraph of the Introduction (lines 100-108), briefly mentioning the methods that we used to investigate this issue.

      The use of terminology related to neural oscillations which are entraining to the BM seems to suggest that the rhythmic tracking inevitably stems from the shaping of existing intrinsic dynamics of the brain. I am not sure this is necessarily the case. I would therefore adopt a more concrete jargon for the description of the entrainment seen in this study. If a discussion over internal dynamics shaped by external stimuli should be invoked, it should be done explicitly with appropriate references (but in my opinion, it isn't quite required).

      Please refer to our response to a similar point raised in the public review of Reviewer #2.

      References

      Bardi, L., Regolin, L., & Simion, F. (2014). The First Time Ever I Saw Your Feet: Inversion Effect in Newborns’ Sensitivity to Biological Motion. Developmental Psychology, 50. https://doi.org/10.1037/a0034678

      Baron-Cohen, S., Wheelwright, S., Skinner, R., Martin, J., & Clubley, E. (2001). The autism-spectrum quotient (AQ): Evidence from Asperger syndrome/high-functioning autism, males and females, scientists and mathematicians. Journal of Autism and Developmental Disorders, 31(1), 5–17. https://doi.org/10.1023/a:1005653411471

      Batterink, L. (2020). Syllables in Sync Form a Link: Neural Phase-locking Reflects Word Knowledge during Language Learning. Journal of Cognitive Neuroscience, 32(9), 1735–1748. https://doi.org/10.1162/jocn_a_01581

      Cirelli, L. K., Spinelli, C., Nozaradan, S., & Trainor, L. J. (2016). Measuring Neural Entrainment to Beat and Meter in Infants: Effects of Music Background. Frontiers in Neuroscience, 10. https://doi.org/10.3389/fnins.2016.00229

      Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1), 158–164. https://doi.org/10.1038/nn.4186

      Duecker, K., Doelling, K. B., Breska, A., Coffey, E. B. J., Sivarao, D. V., & Zoefel, B. (2024). Challenges and approaches in the study of neural entrainment. Journal of Neuroscience, 44(40). https://doi.org/10.1523/JNEUROSCI.1234-24.2024

      Falck-Ytter, T., Nyström, P., Gredebäck, G., Gliga, T., Bölte, S., & the EASE team. (2018). Reduced orienting to audiovisual synchrony in infancy predicts autism diagnosis at 3 years of age. Journal of Child Psychology and Psychiatry, 59(8), 872–880. https://doi.org/10.1111/jcpp.12863

      Feldman, J. I., Dunham, K., Cassidy, M., Wallace, M. T., Liu, Y., & Woynaroski, T. G. (2018). Audiovisual multisensory integration in individuals with autism spectrum disorder: A systematic review and meta-analysis. Neuroscience & Biobehavioral Reviews, 95, 220–234. https://doi.org/10.1016/j.neubiorev.2018.09.020

      Grossman, E. D., & Blake, R. (2001). Brain activity evoked by inverted and imagined biological motion. Vision Research, 41(10), 1475–1482. https://doi.org/10.1016/S0042-6989(00)00317-5

      Henry, M. J., & Obleser, J. (2012). Frequency modulation entrains slow neural oscillations and optimizes human listening behavior. Proceedings of the National Academy of Sciences, 109(49), 20095–20100. https://doi.org/10.1073/pnas.1213390109

      Herrmann, C. S., Murray, M. M., Ionta, S., Hutt, A., & Lefebvre, J. (2016). Shaping Intrinsic Neural Oscillations with Periodic Stimulation. Journal of Neuroscience, 36(19), 5328–5337. https://doi.org/10.1523/JNEUROSCI.0236-16.2016

      Hosseinian, T., Yavari, F., Biagi, M. C., Kuo, M.-F., Ruffini, G., Nitsche, M. A., & Jamil, A. (2021). External induction and stabilization of brain oscillations in the human. Brain Stimulation, 14(3), 579–587. https://doi.org/10.1016/j.brs.2021.03.011

      Klin, A., Lin, D. J., Gorrindo, P., Ramsay, G., & Jones, W. (2009). Two-year-olds with autism orient to non-social contingencies rather than biological motion. Nature, 459(7244), 257–261. https://doi.org/10.1038/nature07868

      Laurienti, P. J., Perrault, T. J., Stanford, T. R., Wallace, M. T., & Stein, B. E. (2005). On the use of superadditivity as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental Brain Research, 166(3), 289–297. https://doi.org/10.1007/s00221-005-2370-2

      Lenc, T., Keller, P. E., Varlet, M., & Nozaradan, S. (2018). Neural tracking of the musical beat is enhanced by low-frequency sounds. Proceedings of the National Academy of Sciences, 115(32), 8221–8226. https://doi.org/10.1073/pnas.1801421115

      Metzger, B. A., Magnotti, J. F., Wang, Z., Nesbitt, E., Karas, P. J., Yoshor, D., & Beauchamp, M. S. (2020). Responses to Visual Speech in Human Posterior Superior Temporal Gyrus Examined with iEEG Deconvolution. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 40(36), 6938–6948. https://doi.org/10.1523/JNEUROSCI.0279-20.2020

      Moreau, C. N., Joanisse, M. F., Mulgrew, J., & Batterink, L. J. (2022). No statistical learning advantage in children over adults: Evidence from behaviour and neural entrainment. Developmental Cognitive Neuroscience, 57, 101154. https://doi.org/10.1016/j.dcn.2022.101154

      Nozaradan, S., Peretz, I., & Mouraux, A. (2012). Selective Neuronal Entrainment to the Beat and Meter Embedded in a Musical Rhythm. Journal of Neuroscience, 32(49), 17572–17581. https://doi.org/10.1523/JNEUROSCI.3203-12.2012

      Obleser, J., & Kayser, C. (2019). Neural Entrainment and Attentional Selection in the Listening Brain. Trends in Cognitive Sciences, 23(11), 913–926. https://doi.org/10.1016/j.tics.2019.08.004

      Peter, V., Goswami, U., Burnham, D., & Kalashnikova, M. (2023). Impaired neural entrainment to low frequency amplitude modulations in English-speaking children with dyslexia or dyslexia and DLD. Brain and Language, 236, 105217. https://doi.org/10.1016/j.bandl.2022.105217

      Shen, L., Lu, X., Wang, Y., & Jiang, Y. (2023). Audiovisual correspondence facilitates the visual search for biological motion. Psychonomic Bulletin & Review, 30(6), 2272–2281. https://doi.org/10.3758/s13423-023-02308-z

      Shen, L., Lu, X., Yuan, X., Hu, R., Wang, Y., & Jiang, Y. (2023). Cortical encoding of rhythmic kinematic structures in biological motion. NeuroImage, 268, 119893. https://doi.org/10.1016/j.neuroimage.2023.119893

      Simion, F., Regolin, L., & Bulf, H. (2008). A predisposition for biological motion in the newborn baby. Proceedings of the National Academy of Sciences, 105(2), 809–813. https://doi.org/10.1073/pnas.0707021105

      Stanford, T. R., Quessy, S., & Stein, B. E. (2005). Evaluating the Operations Underlying Multisensory Integration in the Cat Superior Colliculus. Journal of Neuroscience, 25(28), 6499–6508. https://doi.org/10.1523/JNEUROSCI.5095-04.2005

      Stevenson, R. A., Ghose, D., Fister, J. K., Sarko, D. K., Altieri, N. A., Nidiffer, A. R., Kurela, L. R., Siemann, J. K., James, T. W., & Wallace, M. T. (2014). Identifying and Quantifying Multisensory Integration: A Tutorial Review. Brain Topography, 27(6), 707–730. https://doi.org/10.1007/s10548-014-0365-7

      Troje, N. F., & Westhoff, C. (2006). The Inversion Effect in Biological Motion Perception: Evidence for a “Life Detector”? Current Biology, 16(8), 821–824. https://doi.org/10.1016/j.cub.2006.03.022

      Ujiie, Y., Asai, T., & Wakabayashi, A. (2015). The relationship between level of autistic traits and local bias in the context of the McGurk effect. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.00891

      Vallortigara, G., & Regolin, L. (2006). Gravity bias in the interpretation of biological motion by inexperienced chicks. Current Biology, 16(8), R279–R280. https://doi.org/10.1016/j.cub.2006.03.052

      Vanrie, J., & Verfaillie, K. (2004). Perception of biological motion: A stimulus set of human point-light actions. Behavior Research Methods, Instruments, & Computers, 36(4), 625–629. https://doi.org/10.3758/BF03206542

      Wang, L., & Jiang, Y. (2012). Life motion signals lengthen perceived temporal duration. Proceedings of the National Academy of Sciences of the United States of America, 109(11), E673–E677. https://doi.org/10.1073/pnas.1115515109

      Wang, L., Yang, X., Shi, J., & Jiang, Y. (2014). The feet have it: Local biological motion cues trigger reflexive attentional orienting in the brain. NeuroImage, 84, 217–224. https://doi.org/10.1016/j.neuroimage.2013.08.041

      Wang, Y., Zhang, X., Wang, C., Huang, W., Xu, Q., Liu, D., Zhou, W., Chen, S., & Jiang, Y. (2022). Modulation of biological motion perception in humans by gravity. Nature Communications, 13(1), Article 1. https://doi.org/10.1038/s41467-022-30347-y

      Wright, T. M., Pelphrey, K. A., Allison, T., McKeown, M. J., & McCarthy, G. (2003). Polysensory Interactions along Lateral Temporal Regions Evoked by Audiovisual Speech. Cerebral Cortex, 13(10), 1034–1043. https://doi.org/10.1093/cercor/13.10.1034

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Griesius et al. addresses the dendritic integration of synaptic input in cortical GABAergic interneurons (INs). Dendritic properties, passive and active, of principal cells have been extensively characterized, but much less is known about the dendrites of INs. The limited information is particularly relevant in view of the high morphological and physiological diversity of IN types. The few studies that investigated IN dendrites focused on parvalbumin-expressing INs. In fact, in a previous study, the authors examined dendritic properties of PV INs, and found supralinear dendritic integration in basal, but not in apical dendrites (Cornford et al., 2019 eLife).

      In the present study, complementary to the prior work, the authors investigate whether dendrite-targeting IN types, NDNF-expressing neurogliaform cells, and somatostatin(SOM)-expressing O-LM neurons, display similar active integrative properties by combining clustered glutamate-uncaging and pharmacological manipulations with electrophysiological recording and calcium imaging from genetically identified IN types in mouse acute hippocampal slices.

      The main findings are that NDNF IN dendrites show strong supralinear summation of spatially- and temporally-clustered EPSPs, which is changed into sublinear behavior by bath application of NMDA receptor antagonists, but not by Na+-channel blockers. L-type calcium channel blockers abolished the supralinear behavior associated calcium transients but had no or only weak effect on EPSP summation. SOM IN dendrites showed similar, albeit weaker NMDA-dependent supralinear summation, but no supralinear calcium transients were detected in these INs. In summary, the study demonstrates that different IN types are endowed with active dendritic integrative mechanisms, but show qualitative and quantitative divergence in these mechanisms.

      While the research is conceptually not novel, it constitutes an important incremental gain in our understanding of the functional diversity of GABAergic INs. In view of the central roles of IN types in network dynamics and information processing in the cortex, results and conclusions are of interest to the broader neuroscience community.

      The experiments are well designed, and closely follow the approach from the previous publication in parts, enabling direct comparison of the results obtained from the different IN types. The data is convincing and the conclusions are well-supported, and the manuscript is very well-written.

      I see only a few open questions and some inconsistencies in the presentation of the data in the figures (see details below).

      We thank the reviewer for the evaluation and address the detailed points below.

      Reviewer #2 (Public review):

      Summary:

      Griesius et al. investigate the dendritic integration properties of two types of inhibitory interneurons in the hippocampus: those that express NDNF+ and those that express somatostatin. They found that both neurons showed supralinear synaptic integration in the dendrites, blocked by NMDA receptor blockers but not by blockers of Na+ channels. These experiments are critically overdue and very important because knowing how inhibitory neurons are engaged by excitatory synaptic input has important implications for all theories involving these inhibitory neurons.

      Strengths:

      (1) Determined the dendritic integration properties of two fundamental types of inhibitory interneurons.

      (2) Convincing demonstration that supra-threshold integration in both cell types depends on NMDA receptors but not on Na+ channels.

      Weaknesses:

      It is unknown whether highly clustered synaptic input, as used in this study (and several previous studies), occurs physiologically.

      We are grateful to the reviewer for the critique. Indeed, the degree to which clustered inputs belonging to a functional neuronal assembly occur on interneuron dendrites is an open question. However, Chen et al (2013, Nature 499:295-300) reported that dendritic domains of PV-positive interneurons in visual cortex, unlike their somata, exhibit calcium transients in vivo which are highly tuned to stimulus orientation. This suggests that clustered inputs to dendritic segments may well belong to functional assemblies, much as in principal cells (e.g. Wilson et al, 2016, Nature Neuroscience 19:1003–1009; Iacaruso et al, 2017, Nature 547:449–452). In our earlier work reporting NMDAR-dependent supralinear summation of glutamate uncaging-evoked responses at a subset of dendrites on PV-positive interneurons, we demonstrated how this arrangement in an oscillating feedback circuit could be exploited to stabilise neuronal assemblies.

      Reviewer #3 (Public review):

      Summary:

      The authors study the temporal summation of caged EPSPs in dendrite-targeting hippocampal CA1 interneurons. There are some descriptive data presented, indicating non-linear summation, which seems to be larger in dendrites of NDNF expressing neurogliaform cells versus OLM cells. However, the underlying mechanisms are largely unclear.

      Strengths:

      Focal 2-photon uncaging of glutamate is a nice and detailed method to study temporal summation of small potentials in dendritic segments.

      Weaknesses:

      (1) NMDA-receptor signaling in NDNF-IN. The authors nicely show that temporal summation in dendrites of NDNF-INs is to a certain extent non-linear. However, this non-linearity varies massively from cell to cell (or dendrite to dendrite) from 0% up to 400% (Figure S2). The reason for this variability is totally unclear. Pharmacology with AP5 hints towards a contribution of NMDA receptors. However, the authors claim that the non-linearity is not dependent on EPSP amplitude (Figure S2), which should be the case if NMDA-receptors are involved. Unfortunately, there are no voltage-clamp data of NMDA currents similar to the previous study. This would help to see whether NMDA-receptor contribution varies from synapse to synapse to generate the observed variability? Furthermore, the NMDA- and AMPA-currents would help to compare NDNF with the previously characterized PV cells and would help to contribute to our understanding of interneuron function.

      We thank the reviewer for the helpful comments.

      We did not actually claim that EPSP amplitude has no role in determining the magnitude of non-linearity: “Among possible sources of variability for voltage supralinearity, we did not observe a systematic dependence on the average amplitude of individual uEPSPs […] (Fig. S2)”. Whilst we fully agree that, at first sight, a positive dependence of supralinearity on uEPSP amplitude might be expected simply from the voltage-dependent kinetics of NMDARs, there are two main reasons why this could have been obscured. First, the expected relationship is non-monotonic, because with large local depolarizations the driving force collapses, as seen in the overall sigmoid shape of the average relationship between the scaled observed response and arithmetic sum (e.g. Figs 2a & c; 4c & e). Therefore, we would arguably expect a parabolic relationship rather than a simple positive slope relating the degree of supralinearity to the average amplitude of individual uEPSPs. Second, given that the uncaging distance varied substantially, the average amplitudes of the individual uEPSPs recorded at the soma would have undergone different degrees of electrotonic attenuation and further distortion by active conductances before they were measured. Ultimately, the plots in Fig. S2 show too much scatter to be able to exclude a positive or parabolic relationship of nonlinearity to uEPSP amplitude. To avoid misunderstanding, we have changed the sentence in the Results that refers to Fig. S2 to: “Among possible sources of variability for voltage supralinearity, we did not observe a significant monotonic dependence on the average amplitude of individual uEPSPs, distance from the uncaging location along the dendrite to the soma, [or] the dendrite order (Fig. S2)”.
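
      The non-monotonic expectation can be illustrated with a standard textbook description of the NMDAR current, I = g · B(V) · (V - E_rev), where B(V) is the fraction of channels not blocked by magnesium. The sketch below is not the authors' model: it uses the widely cited Jahr and Stevens (1990) parameterisation of the magnesium block, and the 1 mM magnesium concentration, 0 mV reversal potential, and unit conductance are illustrative assumptions.

```python
from math import exp


def mg_unblocked_fraction(v_mv, mg_mm=1.0):
    """Fraction of NMDARs free of Mg2+ block at membrane potential v_mv (mV)."""
    return 1.0 / (1.0 + (mg_mm / 3.57) * exp(-0.062 * v_mv))


def nmdar_current(v_mv, g_max=1.0, e_rev_mv=0.0, mg_mm=1.0):
    """NMDAR current (arbitrary units): conductance x unblocked fraction x driving force."""
    return g_max * mg_unblocked_fraction(v_mv, mg_mm) * (v_mv - e_rev_mv)


if __name__ == "__main__":
    # Inward current grows with depolarization as the block is relieved,
    # then shrinks as the driving force (V - E_rev) approaches zero.
    for v in (-70, -50, -30, -10, 0):
        print(v, round(nmdar_current(v), 2))
```

      Under these assumptions the inward current is largest at intermediate depolarizations and vanishes at the reversal potential, which is why a simple positive slope between supralinearity and uEPSP amplitude is not necessarily expected.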

      As for the relative contributions of NMDARs and AMPARs, voltage clamp recordings from both neurogliaform and OLM interneurons have already been reported, with the conclusion that neurogliaform cells exhibit relatively larger NMDAR-mediated currents (e.g. Chittajallu et al. 2017; Booker et al. 2021; Mercier et al. 2022), entirely in keeping with the conclusions of our study. Repeating these measurements would add little to the study. Furthermore, because the mean baseline uEPSP amplitude was <0.5 mV (Fig S2), it would be difficult to obtain reliable measurements of isolated NMDAR-mediated uEPSCs.

      Turning to the high variability of supralinearity, indeed, the 95% confidence interval for the data in Fig. 2d is [73%, 213%]. This degree of variability is consistent with the wide range of NMDAR/AMPAR ratios reported by Chittajallu et al. 2017 (their Fig. 1g), compounded by the expected non-monotonic relationship alluded to above.

      (2) Sublinear summation in NDNF-INs. In the presence of AP5, the temporal summation of caged EPSPs is sublinear. That is potentially interesting. The authors claim that this might be dependent on the diameter of dendrites. Many voltage-gated channels can mediate such things as well. To conclude the contribution of dendritic diameter, it would be helpful to at least plot the extent of sublinearity in single NDNF dendrites versus the dendritic diameter. Otherwise, this statement should be deleted.

      We have plotted the degree of nonlinearity against dendritic diameter for neurogliaform cells (under baseline conditions and in D-AP5) in Fig S2h-k. We did not observe any significant linear correlations, other than between amplitude nonlinearity and dendrite diameter post D-AP5. This does not negate the possibility that the significant difference in average dendritic diameters between neurogliaform and OLM cells contributes to differences in impedance (which we have rephrased as “Among possible explanations is that the local dendritic impedance is greater in neurogliaform cells, lowering the threshold for recruitment of regenerative currents”).

      (3) Nonlinear EPSP summation in OLM-IN. The authors do similar experiments in dendrite-targeting OLM-INs and show that the non-linear summation is smaller than in NDNF cells. The reason for this remains unclear. The authors claim that this is due to the larger dendritic diameter in OLM cells. However, there is no analysis. The minimum would be to correlate non-linearity with dendritic diameter in OLM-cells. Very likely there is an important role of synapse density and glutamate receptor density, which was shown to be very low in proximal dendrites of OLM cells and strongly increase with distance (Guirado et al. 2014, Cerebral Cortex 24:3014-24, Gramuntell et al. 2021, Front Aging Neurosci 13:782737). Therefore, the authors should perform a set of experiments in more distal dendrites of OLM cells with diameters similar to the diameters of the NDNF cells. Even better would be if the authors would quantify synapse density by counting spines and show how this density compares with non-linearity in the analyzed NDNF and OLM dendrites.

      The difference in average dendritic diameters between OLM and neurogliaform cells is highly significant (Fig. 8q, P<0.001). We do not claim that dendritic diameter (and by implication local impedance) is the only determinant of the degree of non-linearity. The suggestion that a gradient of glutamate receptor density contributes is interesting. However, the results of uncaging experiments targeting more distal OLM dendrites of similar diameter to neurogliaform dendrites would be subject to numerous confounds, not least the very different electrotonic attenuation, likely differences in various active conductances, and the presence of spines in OLM dendrites (which are generally sparse and were not reliably imaged in our experiments). Moreover, the cell would have to remain patched for longer in order for the fluorescent dyes to invade the distal dendrites. This alone could potentially result in systematic biases among groups. We now cite Guirado et al (2014) and Gramuntell et al (2021) to highlight that factors other than dendritic diameter per se, such as inhomogeneity in spine and NMDA receptor density, may also contribute to the heterogeneity of nonlinear summation in OLM cells.

      (4) NMDA in OLM. Similar to the NDNF cells, the authors claim the involvement of NMDA receptors in OLM cells. Again there seems to be no dependence on EPSP amplitude, which is not understandable at this point (Figure S3). Even more remarkable is the fact that the authors claim that there is no dendritic calcium increase after activation of NMDA receptors. Similar to NDNF-cell analysis there are no NMDA currents in OLMs. Unfortunately, even no calcium imaging experiments were shown. Why? Are there calcium-impermeable NMDA receptors in OLM cells? To understand this phenomenon the minimum is to show some physiological signature of NMDA-receptors, for example, voltage-clamp currents. Furthermore, it would be helpful to systematically vary stimulus intensity to see some calcium signals with larger stimulation. In case there is still no calcium signal, it would be helpful to measure reversal potentials with different ion compositions to characterize the potentially 'Ca2+ impermeable' voltage-dependent NMDA receptors in OLM cells.

      The same response as to point (1) above applies to OLM cells. As with neurogliaform cells, mean OLM baseline asynchronous (separate response) amplitudes were <0.5 mV, making it very difficult to record an isolated NMDAR-mediated uEPSC. Having said that, NMDARs do contribute to EPSCs elicited by stimulation of multiple afferents (e.g. Booker et al, 2021). We do not claim that dendritic calcium transients cannot be elicited following activation of NMDARs in OLM cells. We simply reported that the evoked uEPSPs, designed to approximate individual synaptic signals, were sub-threshold for detectable dendritic calcium signals under conditions that were suprathreshold in neurogliaform cells. The statement has been amended to specify that there were no detectable signals under our recording conditions. There is no evidence presented in the manuscript to suggest that OLM NMDARs are calcium impermeable and indeed no such claim was made.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There is a large variability in the observed dendritic nonlinearity, in NDNF IN dendrites e.g. the uEPSP amplitude nonlinearity measure varies from as low as 10-20% to over 200%. As only single dendrites were recorded from each IN, it is unclear if this variability is among the cells or between individual dendrites. While the authors analyzed some potential factors, such as distance along the dendrites, branch order, or response magnitude (amplitude and integral), they did not find any substantial correlation. It remains open if different dendrites of NDNF INs, located in the str. moleculare vs. those in or projecting towards str. radiatum, have divergent properties. Similarly, for SOM INs an important question is if axon-carrying dendrites show distinct properties.

      In this context, it would be interesting to see not only values for the mean nonlinearity but also the maximal nonlinearity and its distribution.

      Nonlinearity as defined in the manuscript is a cumulative measurement. The final value per dendritic segment is therefore the sum of nonlinearities at 1 to 12 near-synchronous uncaging locations. The data for the individual dendritic segments are shown in the slopegraphs as in Fig 2b, with their distribution visible. The averages referred to in the results correspond to the paired mean difference plots, which are the group summaries. The method section has been amended to clarify the analysis method. We did not address specifically whether dendrites projecting in different directions behaved differently. This is an interesting question beyond the scope of this study. Nor did we compare axon-carrying OLM dendrites to other dendrites.
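
      To make the cumulative nature of the measure concrete, a minimal sketch follows. The per-location definition used here (percentage deviation of the near-synchronous response from the arithmetic sum) is an assumption made for illustration only; the definition in the manuscript's Methods takes precedence, and the function name is hypothetical.

```python
def cumulative_nonlinearity(observed, arithmetic_sum):
    """Sum per-location percentage deviations of observed responses from linear summation.

    observed[i] is the near-synchronous response with i + 1 uncaging locations;
    arithmetic_sum[i] is the sum of the first i + 1 separate responses.
    """
    if len(observed) != len(arithmetic_sum):
        raise ValueError("observed and arithmetic_sum must have the same length")
    # Assumed per-location nonlinearity: relative deviation from the linear prediction.
    return sum(100.0 * (o - a) / a for o, a in zip(observed, arithmetic_sum))
```

      With this convention a dendritic segment that summates linearly at every step scores 0, and each supralinear step adds to the total, so the value grows with the number of locations that exceed the linear prediction.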

      Figures:

      Figure 1: The gray line in plots g and h is not explained. While it looks like an identity line, the legend in plot i ("asynchronous") interferes.

      In plots g and h the gray line is the line of identity. In plot i it is an estimate of the linear summation. In plot i it is not the line of identity as it does not start at the origin with a slope of 1. The figure legend has been amended to clarify.

      In the same panels (Figure 1g,h, and subsequent figures) consider changing the title from "soma (voltage)" to uEPSP.

      The titles have been amended.

      In panel Figure 1i note the missing "(" in the title.

      Title amended.

      In panel Figure 1h: Shouldn't the X-axis label and legend text read "Arithmetic sum of (EPSP) integrals" instead of "Integral of arithmetic sum"?

      The wording more accurately reflects the analytical operations. The asynchronous (separate) responses were summed arithmetically first, and then the integral was taken of each cumulative sum. We have therefore left the axis title and legend unchanged.

      Figure 2a,c: Could you please describe how the scaling was performed for the two axes?

      Method section amended.

      In the same panels (Figure 2a,c, and subsequent figures), the legend seems to be misleading: the plot is NS Amplitude/Integral vs Arithmetic sum, and the black line is the identity line (or scaled interpolation of the arithmetic sum, which is essentially the same).

      The scaled arithmetic sums (uEPSP amplitude, integral) represent linear summation and so overlap with the line of identity. The interpolation estimate of the asynchronous (separate) calcium transient response does not overlap with the line of identity as this estimate does not start at the origin with a slope of 1. The legends throughout the manuscript have been amended to clarify this.

      Figure 2b,d,f (and subsequent figures) slope plots: Please indicate that this is the average amplitude supralinearity for the individual recorded dendrites. Note here that the Results text mentions only the average amplitude supralinearity, but not the slope plots, paired mean difference, or Gardner-Altman estimation, illustrated in the figures.

      Nonlinearity as defined in the manuscript is a cumulative measurement. The final value per dendritic segment is therefore the sum of nonlinearities at 1 to 12 near-synchronous uncaging locations. The data for the individual dendritic segments are shown in the slopegraphs as in fig 2b, with their distribution visible. The averages referred to in the results correspond to the paired mean difference plots, which are the group summaries. The method section has been amended to clarify.

      Fig 2e: The legend (both text and figure, also in the following figures) is confusing, as the gray line and diamonds are defined as separate 12(?) responses, but it seems to represent a linear interpolation of the scaled arithmetic sums (ultimately nothing else but an identity line).

      The grey line shows the linear interpolation output between the calcium transient measurements at 1 uncaging location and at 12 uncaging locations. The 12th uncaging location is indicated in the key as “separate 12”. The linear interpolation in these plots does represent linear summation but is not the line of identity as it does not begin at the origin and does not have a slope of 1.
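
      A minimal sketch of the linear interpolation described above, assuming only two measured calcium transient values (at 1 and at 12 uncaging locations). The function name and the example values are hypothetical and are not data from the study.

```python
# Sketch only: names and numbers are illustrative assumptions.

def linear_summation_estimate(y_first, y_last, n_locations=12):
    """Interpolate linearly between the response at 1 and at n locations.

    y_first: measured response with 1 uncaging location.
    y_last:  measured separate response at the last (n-th) location.
    Returns the estimated response for 1, 2, ..., n locations.
    """
    step = (y_last - y_first) / (n_locations - 1)
    return [y_first + step * k for k in range(n_locations)]


# Hypothetical calcium transient amplitudes (arbitrary units).
estimate = linear_summation_estimate(0.2, 2.4)
print(estimate[0], estimate[-1])
```

      Because such a line is anchored on the measured response at one location rather than on zero, it need not start at the origin or have a slope of 1, which is why it can differ from the line of identity.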

      Reviewer #2 (Recommendations for the authors):

      This study is well-developed and technically executed. I only have minor comments for the authors:

      (1) To target NDNF+ neurons, the authors use the NDNF-Cre mouse line and a Cre-dependent AAV using the mDLX promoter. Why the mDLX promoter? Would it have been sufficient to use any Cre-dependent fluorophore?

      Pilot experiments revealed leaky expression when a virus driving flexed ChR2 under a non-specific promoter (EF1a) was injected in the neocortex of Ndnf-Cre mice (Author response image 1). In our hands, and in line with Dimidschstein et al. (2016), the use of the mDLX enhancer reduced off-target expression.

      Author response image 1.

      A. AAV2/5-EF1a-DIO-hChR2(H134R)-mCherry injected into superficial neocortex of Ndnf-Cre mice led to expression in a few pyramidal neurons in addition to layer 1 neurogliaform cells. B. Patch-clamp recording from a non-labelled pyramidal cell showed that an optogenetically evoked glutamatergic current remained after blockade of GABAA and GABAB receptors, further confirming limited specificity of expression of ChR2. (Data from M Muller, M Mercier and V Magloire, Kullmann lab.)

      (2) The distance of the uncaging sites from the soma plays a key role. The authors should indicate the mean distance of the cluster and its variance.

      Uncaging distance from soma is indicated for both NGF and OLM interneurons in the supplementary figures S2 and S3 respectively.

      (3) Martina et al., in Science 2000, showed high levels of Na+ channels in the dendrites of OLM cells and hinted that spikes could occur in them. The authors should discuss this possible discrepancy.

      Discussion amended.

      (4) Looking at Figure 1d, the EPSPs look exceptionally long-lasting, longer than those observed by stimulating axonal inputs. Could this indicate spill-over excitation? If so, how could this affect the outcome of this study?

      The asynchronous (separate) responses decay to baseline within 100 ms, similar to the neurogliaform EPSPs evoked by electrical stimulation of axons in the SLM in Mercier et al. 2022. We observed clear plateau potentials in a minority of cells (e.g. Fig. S1b). Such plateau potentials can be generated by dendritic calcium channels and we do not consider that glutamate spillover needs to be invoked to account for them.

      (5) In the legend of Figure 2: "n=11 dendrites in 11 cells from 9 animals". Why do the authors only study 11 dendrites from 11 cells? Isn't it possible to repeatedly stimulate clusters of synaptic inputs onto the same cells? In principle, could one test many dendrites of the same cell at different distances from the soma? It is also remarkable that there were very few cells per animal.

      The goal was always to record from as many dendrites as possible from the same cells whilst maintaining high standards of cell health. When cell health indicators such as blebbing, input resistance change or resting voltage change were detected, no further dendritic location could be tested with reasonable confidence. In a given 400 µm slice there would be relatively few healthy candidate cells at a suitable depth to attempt to patch-clamp.

    1. Author response:

      We would like to extend our sincere thanks to you and the reviewers at eLife for the thoughtful handling of our manuscript and the valuable feedback, which will greatly improve our study.

      We are committed to performing the additional experiments as recommended by the reviewers. However, we would like to clarify our study's focus. 

      The novelty of our study lies in the highlights of our manuscript:

      • The formation of HIV-induced CPSF6 puncta is critical for restoring HIV-1 nuclear reverse transcription (RT).

      • CPSF6 protein lacking the FG peptide cannot bind to the viral core, thereby failing to form HIV-induced CPSF6 puncta.

      • The FG peptide, rather than low-complexity regions (LCRs) or the mixed charge domains (MCDs) of the CPSF6 protein, drives the formation of HIV-induced CPSF6 puncta.

      • HIV-induced CPSF6 puncta form individually and later fuse with nuclear speckles (NS) via the intrinsically disordered region (IDR) of SRRM2.

      By focusing on these processes, we believe our study provides a critical perspective on the molecular interactions that mediate the formation of HIV-induced CPSF6 puncta and broadens the understanding of how HIV manipulates host nuclear architecture.

      Public Reviews: 

      Reviewer #1 (Public review): 

      In recent years, our understanding of the nuclear steps of the HIV-1 life cycle has made significant advances. It has emerged that HIV-1 completes reverse transcription in the nucleus and that the host factor CPSF6 forms condensates around the viral capsid. The precise function of these CPSF6 condensates is under investigation, but it is clear that the HIV-1 capsid protein is required for their formation. This study by Tomasini et al. investigates the genesis of the CPSF6 condensates induced by HIV-1 capsid, what other co-factors may be required, and their relationship with nuclear speckles (NS). The authors show that disruption of the condensates by the drug PF74, added post-nuclear entry, blocks HIV-1 infection, which supports their functional role. They generated CPSF6 KO THP-1 cell lines, in which they expressed exogenous CPSF6 constructs to map, by microscopy and pull-down assays, the regions critical for the formation of condensates. This approach revealed that the LCR region of CPSF6 is required for capsid binding but not for condensates whereas the FG region is essential for both. Using SON and SRRM2 as markers of NS, the authors show that CPSF6 condensates precede their merging with NS but that depletion of SRRM2, or SRRM2 lacking the IDR domain, delays the genesis of condensates, which are also smaller. 

      The study is interesting and well conducted and defines some characteristics of the CPSF6-HIV-1 condensates. Their results on the NS are valuable. The data presented are convincing. 

      I have two main concerns. Firstly, the functional outcome of the various protein mutants and KOs is not evaluated. Although Figure 1 shows that disruption of the CPSF6 puncta by PF74 impairs HIV-1 infection, it is not clear if HIV-1 infection is at all affected by expression of the mutant CPSF6 forms (and SRRM2 mutants) or KO/KD of the various host factors. The cell lines are available, so it should be possible to measure HIV-1 infection and reverse transcription. Secondly, the authors have not assessed if the effects observed on the NS impact HIV-1 gene expression, which would be interesting to know given that NS are sites of highly active gene transcription. With the reagents at hand, it should be possible to investigate this too. 

      We thank the reviewer for her/his valuable feedback on our manuscript. We are pleased to see her/his appreciation of our results, and we will do our utmost to address the highlighted points to further improve our work.

      Reviewer #2 (Public review): 

      Summary: 

      HIV-1 infection induces CPSF6 aggregates in the nucleus that contain the viral protein CA. The study of the functions and composition of these nuclear aggregates has raised considerable interest in the field, and they have emerged as sites in which reverse transcription is completed and in the proximity of which viral DNA becomes integrated. In this work, the authors have mutated several regions of the CPSF6 protein to identify the domains important for nuclear aggregation, in addition to the already known FG region; they have characterized the kinetics of fusion between CPSF6 aggregates and SC35 nuclear speckles and have determined the role of two nuclear speckle components in this process (SRRM2, SUN2). 

      Strengths: 

      The work examines systematically the domains of CPSF6 of importance for nuclear aggregate formation in an elegant manner in which these mutants complement an otherwise CPSF6-KO cell line. In addition, this work evidences a novel role for the protein SRRM2 in HIV-induced aggregate formation, overall advancing our comprehension of the components required for their formation and regulation. 

      Weaknesses: 

      Some of the results presented in this manuscript, in particular the kinetics of fusion between CPSF6-aggregates and SC35 speckles, have been published before (PMID: 32665593; 32997983). 

      The observations of the different effects of CPSF6 mutants, as well as SRRM2/SUN2 silencing experiments are not complemented by infection data which would have linked morphological changes in nuclear aggregates to function during viral infection. More importantly, these functional data could have helped stratify otherwise similar morphological appearances in CPSF6 aggregates. 

      Overall, the results could be presented in a more concise and ordered manner to help focus the attention of the reader on the most important issues. Most of the figures extend to 3-4 different pages and some information could be clearly either aggregated or moved to supplementary data. 

      First, we thank the reviewer for her/his appreciation of our study and for giving us the opportunity to better explain our results and improve our manuscript. We will do our best to address her/his concerns. In the meantime, we would like to clarify the focus of our study. Our research does not aim to demonstrate an association between CPSF6 condensates and SC35 nuclear speckles; this association has already been established in previous studies, as noted in the manuscript. We use the term "condensates" rather than "aggregates" because aggregates are generally non-dynamic (Alberti & Hyman, 2021; Banani et al., 2017), and our work specifically examines the dynamic behavior of CPSF6 during infection, as shown in Scoca et al., JMCB 2022.

      About the point highlighted by the reviewer: "Kinetics of fusion between CPSF6-aggregates and SC35 speckles have been published before (PMID: 32665593; 32997983)."

      Our study differs from prior work (PMID: 32665593) because we use a full-length HIV genome and, rather than following the fluorescence of integrase (IN) supplied in trans and its association with CPSF6, we specifically assess whether CPSF6 clusters form in the nucleus independently of NS factors and then fuse with them. In the current study we evaluated the dynamics of formation of CPSF6/NS puncta, which have not been explored before. Given this focus, we believe that our work offers a novel perspective on the molecular interactions that facilitate HIV / CPSF6-NS fusion.

      For better clarity, we would like to specify that our study focuses on the role of SON, a scaffold factor of nuclear speckles, rather than SUN2 (SUN domain-containing protein 2), which is a component of the LINC (Linker of Nucleoskeleton and Cytoskeleton) complex.

      As suggested by the reviewer, we will keep key information in the main figure and move additional details to the supplementary material.

      Reviewer #3 (Public review): 

      In this study, the authors investigate the requirements for the formation of CPSF6 puncta induced by HIV-1 under a high multiplicity of infection conditions. Not surprisingly, they observe that mutation of the Phe-Gly (FG) repeat responsible for CPSF6 binding to the incoming HIV-1 capsid abrogates CPSF6 punctum formation. Perhaps more interestingly, they show that the removal of other domains of CPSF6, including the mixed-charge domain (MCD), does not affect the formation of HIV-1-induced CPSF6 puncta. The authors also present data suggesting that CPSF6 puncta form individually before fusing with nuclear speckles (NSs) and that the fusion of CPSF6 puncta to NSs requires the intrinsically disordered region (IDR) of the NS component SRRM2. While the study presents some interesting findings, there are some technical issues that need to be addressed and the amount of new information is somewhat limited. Also, the authors' finding that deletion of the CPSF6 MCD does not affect the formation of HIV-1-induced CPSF6 puncta contradicts recent findings of Jang et al. (doi.org/10.1093/nar/gkae769). 

      We thank the reviewer for her/his thoughtful feedback and the opportunity to elaborate on why our findings provide a distinct perspective compared to those of Jang et al. (doi.org/10.1093/nar/gkae769), while aligning with the results of Rohlfes et al. (doi.org/10.1101/2024.06.20.599834).

      One potential reason for the differences between our findings and those of Jang et al. could be the choice of experimental systems. Jang et al. conducted their study in HEK293T cells with CPSF6 knockouts, as described in Sowd et al., 2016 (doi.org/10.1073/pnas.1524213113). In contrast, our work focused on macrophage-like THP-1 cells, which share closer characteristics with HIV-1’s natural target cells. 

      Our approach utilized a complete CPSF6 knockout in THP-1 cells, enabling us to reintroduce untagged versions of CPSF6, such as wild-type and deletion mutants, to avoid potential artifacts from tagging. Jang et al. employed HA-tagged CPSF6 constructs, which may lead to subtle differences in experimental outcomes due to the presence of the tag.

      Finally, our investigation into the IDR of SRRM2 relied on CRISPR-PAINT to generate targeted deletions directly in the endogenous gene (Lester et al., 2021, DOI: 10.1016/j.neuron.2021.03.026). This approach provided a native context for studying SRRM2’s role.

      We will incorporate these clarifications into the discussion section of the revised manuscript.

    1. Author response:

      Reviewer #1 (Public review):

      The study examines how pyruvate, a key product of glycolysis that influences TCA metabolism and gluconeogenesis, impacts cellular metabolism and cell size. It primarily utilizes the Drosophila liver-like fat body, which is composed of large post-mitotic cells that are metabolically very active. The study focuses on the key observations that over-expression of the pyruvate importer MPC complex (which imports pyruvate from the cytoplasm into mitochondria) can reduce cell size in a cell-autonomous manner. They find this is by metabolic rewiring that shunts pyruvate away from TCA metabolism and into gluconeogenesis. Surprisingly, mTORC and Myc pathways are also hyper-active in this background, despite the decreased cell size, suggesting a non-canonical cell size regulation signaling pathway. They also show a similar cell size reduction in HepG2 organoids. Metabolic analysis reveals that enhanced gluconeogenesis suppresses protein synthesis. Their working model is that elevated pyruvate mitochondrial import drives oxaloacetate production and fuels gluconeogenesis during late larval development, thus reducing amino acid production and thus reducing protein synthesis.

      Strengths:

      The study is significant because stem cells and many cancers exhibit metabolic rewiring of pyruvate metabolism. It provides new insights into how the fate of pyruvate can be tuned to influence Drosophila biomass accrual, and how pyruvate pools can influence the balance between carbohydrate and protein biosynthesis. Strengths include its rigorous dissection of metabolic rewiring and use of Drosophila and mammalian cell systems to dissect carbohydrate:protein crosstalk.

      Weaknesses:

      However, questions on how these two pathways crosstalk, and how this interfaces with canonical Myc and mTORC machinery remain. There are also questions related to how this protein:carbohydrate crosstalk interfaces with lipid biosynthesis. Addressing these will increase the overall impact of the study.

      We thank the reviewer for recognizing the significance of our work and for providing constructive feedback. Our findings indicate that elevated pyruvate transport into mitochondria acts independently of canonical pathways, such as mTORC1 or Myc signaling, to regulate cell size. To investigate these pathways, we utilized immunofluorescence with well-validated surrogate measures (p-S6 and p-4EBP1) in clonal analyses of MPC expression, as well as RNA-seq analyses in whole fat body tissues expressing MPC. These methods revealed hyperactivation of mTORC1 and Myc signaling in fat body cells expressing MPC in Drosophila, which are dramatically smaller than control cells. One explanation of these seemingly contradictory observations could be an excess of nutrients that activate mTORC1 or Myc pathways. However, our data is inconsistent with a nutrient surplus that could explain this hyperactivation. Instead, we observed reduced amino acid abundance upon MPC expression, which is very surprising given the observed hyperactivation of mTORC1. This led us to hypothesize the existence of a feedback mechanism that senses inappropriate reductions in cell size and activates signaling pathways to promote cell growth. The best characterized “sizer” pathway for mammalian cells is the CycD/CDK4 complex which has been well studied in the context of cell size regulation of the cell cycle (PMID 10970848, 34022133). However, the mechanisms that sense cell size in post-mitotic cells, such as fat body cells and hepatocytes, remain poorly understood. Investigating the hypothesized size-sensing mechanisms at play here is a fascinating direction for future research.

      For the current study, we conducted epistatic analyses with mTOR pathway members by overexpressing PI3K and knocking down the TORC1 inhibitor Tuberous Sclerosis Complex 1 (Tsc1). These manipulations increased the size of control fat body cells but not those over-expressing the MPC (Supplementary Fig. 3c, 3d). Regarding Myc, its overexpression increased the size of both control and MPC+ clones (Supplementary Fig. 3e), but Myc knockdown had no additional effect on cell size in MPC+ clones (Supplementary Fig. 3f). These results suggest that neither mTORC1, PI3K, nor Myc are epistatic to the cell size effects of MPC expression. Consequently, we shifted our focus to metabolic mechanisms regulating biomass production and cell size.

      When analyzing cellular biomolecules contributing to biomass, we observed a significant impact on protein levels in Drosophila fat body cells and mammalian MPC-expressing HepG2 spheroids. TAG abundance in MPC-expressing HepG2 spheroids and whole fat body cells showed a statistically insignificant decrease compared to controls. Furthermore, lipid droplets in fat body cells were comparable in MPC-expressing clones when normalized to cell size.

      Interestingly, RNA-seq analysis revealed increased expression of fatty acid and cholesterol biosynthesis pathways in MPC-expressing fat body cells. Upregulated genes included major SREBP targets, such as ATPCL (2.08-fold), FASN1 (1.15-fold), FASN2 (1.07-fold), and ACC (1.26-fold). Since mTOR promotes SREBP activation and MPC-expressing cells showed elevated mTOR activity and upregulation of SREBP targets, we hypothesize that SREBP is activated in these cells. Nonetheless, our data on amino acid abundance and its impact on protein synthesis activity suggest that protein abundance, rather than lipids, is likely to play a larger causal role in regulating cell size in response to increased pyruvate transport into mitochondria.

      Reviewer #2 (Public review):

      In this manuscript, the authors leverage multiple cellular models including the Drosophila fat body and cultured hepatocytes to investigate the metabolic programs governing cell size. By profiling gene programs in the larval fat body during the third instar stage - in which cells cease proliferation and initiate a period of cell growth - the authors uncover a coordinated downregulation of genes involved in mitochondrial pyruvate import and metabolism. Enforced expression of the mitochondrial pyruvate carrier restrains cell size, despite active signaling of mTORC1 and other pathways viewed as traditional determinants of cell size. Mechanistically, the authors find that mitochondrial pyruvate import restrains cell size by fueling gluconeogenesis through the combined action of pyruvate carboxylase and phosphoenolpyruvate carboxykinase. Pyruvate conversion to oxaloacetate and use as a gluconeogenic substrate restrains cell growth by siphoning oxaloacetate away from aspartate and other amino acid biosynthesis, revealing a tradeoff between gluconeogenesis and provision of amino acids required to sustain protein biosynthesis. Overall, this manuscript is extremely rigorous, with each point interrogated through a variety of genetic and pharmacologic assays. The major conceptual advance is uncovering the regulation of cell size as a consequence of compartmentalized metabolism, which is dominant even over traditional signaling inputs. The work has implications for understanding cell size control in cell types that engage in gluconeogenesis but more broadly raises the possibility that metabolic tradeoffs determine cell size control in a variety of contexts.

      We thank the reviewer for their thoughtful recognition of our efforts, and we are honored by the enthusiasm the reviewer expressed for the findings and the significance of our research. We share the reviewer’s opinion that our work might help to unravel metabolic mechanisms that regulate biomass gain independent of the well-known signaling pathways.

      Reviewer #3 (Public review):

      Summary:

      In this article, Toshniwal et al. investigate the role of pyruvate metabolism in controlling cell growth. They find that elevated expression of the mitochondrial pyruvate carrier (MPC) leads to decreased cell size in the Drosophila fat body, a transformed human hepatocyte cell line (HepG2), and primary rat hepatocytes. Using genetic approaches and metabolic assays, the authors find that elevated pyruvate import into cells with forced expression of MPC increases the cellular NADH/NAD+ ratio, which drives the production of oxaloacetate via pyruvate carboxylase. Genetic, pharmacological, and metabolic approaches suggest that oxaloacetate is used to support gluconeogenesis rather than amino acid synthesis in cells over-expressing MPC. The reduction in cellular amino acids impairs protein synthesis, leading to impaired cell growth.

      Strengths:

      This study shows that the metabolic program of a cell, and especially its NADH/NAD+ ratio, can play a dominant role in regulating cell growth.

      The combination of complementary approaches, ranging from Drosophila genetics to metabolic flux measurements in mammalian cells, strengthens the findings of the paper and shows a conservation of MPC effects across evolution.

      Weaknesses:

      In general, the strengths of this paper outweigh its weaknesses. However, some areas of inconsistency and rigor deserve further attention.

      Thank you for reviewing our manuscript and offering constructive feedback. We appreciate your recognition of the significance of our work and your acknowledgment of the compelling evidence we have presented. We will carefully revise the manuscript in line with the reviewers' recommendations.

      The authors comment that MPC overrides hormonal controls on gluconeogenesis and cell size (Discussion, paragraph 3). Such a claim cannot be made for mammalian experiments that are conducted with immortalized cell lines or primary hepatocytes.

      We appreciate the reviewer’s insightful comment. Pyruvate is a primary substrate for gluconeogenesis, and our findings suggest that increased pyruvate transport into mitochondria increases the NADH-to-NAD+ ratio, and thereby elevates gluconeogenesis. Notably, we did not observe any changes in the expression of key glucagon targets, such as PC, PEPCK2, and G6PC, suggesting that the glucagon response is not activated upon MPC expression. By the statement referenced by the reviewer, we intended to highlight that excess pyruvate import into mitochondria drives gluconeogenesis independently of hormonal and physiological regulation.

      It seems the reviewer might also have been expressing the sentiment that our in vitro models may not fully reflect the in vivo situation, and we completely agree. Moving forward, we plan to perform similar analyses in mammalian models to test the in vivo relevance of this mechanism. For now, we will refine the language in the manuscript to clarify this point.

      Nuclear size looks to be decreased in fat body cells with elevated MPC levels, consistent with reduced endoreplication, a process that drives growth in these cells. However, acute, ex vivo EdU labeling and measures of tissue DNA content are equivalent in wild-type and MPC+ fat body cells. This is surprising - how do the authors interpret these apparently contradictory phenotypes?

      We thank the reviewer for raising this important issue. The size of the nucleus is regulated by DNA content and various factors, including the physical properties of DNA, chromatin condensation, the nuclear lamina, and other structural components (PMID 32997613). Additionally, cytoplasmic and cellular volume also impacts nuclear size, as extensively documented during development (PMID 17998401, PMID 32473090).

      In MPC-expressing cells, it is plausible that the reduced cellular volume impacts chromatin condensation or the nuclear lamina in a way that slightly decreases nuclear size without altering DNA content. Specifically, in our whole fat body experiments using CG-Gal4 (as shown in Supplementary Figure 2a-c), we noted that after 12 hours of MPC expression, cell size was significantly reduced (Supplementary Figure 2c and Author response image 1A). However, the reduction in nuclear size became significant only after 36 hours of MPC expression (Author response image 1B), suggesting that the reduction in cell size is a more acute response to MPC expression, followed only later by effects on nuclear size.

      In clonal analyses, this relationship was further clarified. MPC-expressing cells with a size greater than 1000 µm² displayed nuclear sizes comparable to control cells, whereas those with a drastic reduction in cell size (less than 1000 µm²) exhibited smaller nuclei (Author response image 1C and D). These observations collectively suggest that changes in nuclear size are more likely to be downstream rather than upstream of cell size reduction. Given that DNA content remains unaffected, we focused on investigating the rate of protein synthesis. Our findings suggest that protein synthesis might play a causal role in regulating cell size, thereby reinforcing the connection between cellular and nuclear size in this context.

      Author response image 1.

      Cell Size vs. Nuclear Size in MPC-Expressing Fat Body Cells. A. Cell size comparison between control (blue, ay-GFP) and MPC+ (red, ay-MPC) fat body cells over time, measured in hours after MPC expression induction. B. Nuclear area measurements from the same fat body cells in ay-GFP and ay-MPC groups. C. Scatter plot of nuclear area vs. cell area for control (ay-GFP) cells, including the corresponding R² value. D. Scatter plot of nuclear area vs. cell area for MPC-expressing (ay-MPC) cells, with the respective R² value.

      This image highlights the relationship between nuclear and cell size in MPC-expressing fat body cells, emphasizing the distinct cellular responses observed following MPC induction.

      In Figure 4d, oxygen consumption rates are measured in control cells and those over-expressing MPC. Values are normalized to protein levels, but protein is reduced in MPC+ cells. Is oxygen consumption changed by MPC expression on a per-cell basis?

      As described in the manuscript, MPC-expressing cells are smaller in size. In this context, we felt that it was most appropriate to normalize oxygen consumption rates (OCR) to cellular mass to enable an accurate interpretation of metabolic activity. Therefore, we normalized OCR with protein content to account for variations in cellular size and (probably) mitochondrial mass.
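
      The difference between the two normalizations discussed here (per unit protein versus per cell) can be illustrated with a short sketch. All names and numbers below are hypothetical and chosen only to show the arithmetic; they are not measurements from the study.

```python
# Sketch only: made-up values to contrast two ways of normalizing OCR.

def normalize_ocr(ocr_pmol_per_min, protein_ug, cell_count):
    """Return (OCR per microgram protein, OCR per cell)."""
    return ocr_pmol_per_min / protein_ug, ocr_pmol_per_min / cell_count


# Same number of cells, but the hypothetical MPC+ sample has less protein.
control = normalize_ocr(ocr_pmol_per_min=100.0, protein_ug=20.0, cell_count=10000)
mpc = normalize_ocr(ocr_pmol_per_min=70.0, protein_ug=14.0, cell_count=10000)

print("per protein:", control[0], mpc[0])
print("per cell:   ", control[1], mpc[1])
```

      In this toy case the two samples are identical per unit protein yet differ per cell, which is the distinction the reviewer's question raises.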

      Trehalose is the main circulating sugar in Drosophila and should be measured in addition to hemolymph glucose. Additionally, the units in Figure 4h should be related to hemolymph volume - it is not clear that they are.

      We appreciate this valuable suggestion. In the revised manuscript, we will quantify trehalose abundance in circulation and within fat bodies. As described in the Methods section, following the approach outlined in Ugrankar-Banerjee et al., 2023, we bled 10 larvae (either control or MPC-expressing) using forceps onto parafilm. From this, 2 microliters of hemolymph were collected for glucose measurement. We will apply this methodology to include the trehalose measurements as part of our updated analysis.

      Measurements of NADH/NAD ratios in conditions where these are manipulated genetically and pharmacologically (Figure 5) would strengthen the findings of the paper. Along the same lines, expression of manipulated genes - whether by RT-qPCR or Western blotting - would be helpful to assess the degree of knockdown/knockout in a cell population (for example, Got2 manipulations in Figures 6 and S8).

      We appreciate this suggestion, which will provide additional rigor to our study. We have already quantified NADH/NAD+ ratios in HepG2 cells under UK5099, NMN, and Asp supplementation, as presented in Figure 6k. As suggested, we will quantify the expression of Got2 manipulations mentioned in Figure 6j using RT-qPCR and validate the corresponding data in Supplementary Figure 8f through western blot analysis.

      Additionally, we will assess the efficiency of pcb, pdha, dlat, pepck2, and Got2 manipulations used to modulate the expression of these genes. These validations will ensure the robustness of our findings and strengthen the conclusions of our study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This work makes several contributions: (1) a method for the self-supervised segmentation of cells in 3D microscopy images, (2) a cell-segmented dataset comprising six volumes from a mesoSPIM sample of a mouse brain, and (3) a napari plugin to apply and train the proposed method.

      First, thanks for acknowledging our contributions of a new tool, new dataset, and new software.

      (1) Method

      This work presents itself as a generalizable method contribution with a wide scope: self-supervised 3D cell segmentation in microscopy images. My main critique is that there is almost no evidence for the proposed method to have that wide of a scope. Instead, the paper is more akin to a case report that shows that a particular self-supervised method is good enough to segment cells in two datasets with specific properties.

      We agree that we focus on light-sheet microscopy data; to narrow the scope, we have changed the title to “CellSeg3D: self-supervised 3D cell segmentation for light-sheet microscopy”.

      To support the claim that their method "address[es] the inherent complexity of quantifying cells in 3D volumes", the method should be evaluated in a comprehensive study including different kinds of light and electron microscopy images, different markers, and resolutions to cover the diversity of microscopy images that both title and abstract are alluding to.

      You have selectively dropped the last part of that sentence that is key: “.... 3D volumes, often in cleared neural tissue” – which is what we tackle. The next sentence goes on to say: “We offer a new 3D mesoSPIM dataset and show that CellSeg3D can match state-of-the-art supervised methods.” Thus, we literally make it clear our claims are on mesoSPIM and cleared data.

      The main dataset used here (a mesoSPIM dataset of a whole mouse brain) features well-isolated cells that are easily distinguishable from the background. Otsu thresholding followed by a connected component analysis already segments most of those cells correctly.

      This is not the case, as all the other leading methods we fairly benchmark cannot solve the task without deep learning (i.e., no method is at an F1-Score of 1).

      The proposed method relies on an intensity-based segmentation method (a soft version of a normalized cut) and has at least five free parameters (radius, intensity, and spatial sigma for SoftNCut, as well as a morphological closing radius, and a merge threshold for touching cells in the post-processing). Given the benefit of tweaking parameters (like thresholds, morphological operation radii, and expected object sizes), it would be illuminating to know how other non-learning-based methods will compare on this dataset, especially if given the same treatment of segmentation post-processing that the proposed method receives. After inspecting the WNet3D predictions (using the napari plugin) on the used datasets I find them almost identical to the raw intensity values, casting doubt as to whether the high segmentation accuracy is really due to the self-supervised learning or instead a function of the post-processing pipeline after thresholding.

      First, thanks for testing our tool, and glad it works for you. The deep learning methods we use cannot “solve” this dataset, and we also have an F1-Score (dice) of ~0.8 with our self-supervised method. We don’t see the value in applying non-learning methods; this is unnecessary and beyond the scope of this work.
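
      For readers unfamiliar with the non-learning baseline discussed in this exchange (Otsu thresholding followed by connected-component analysis), a minimal pure-Python sketch is given here. It is not the CellSeg3D pipeline or any benchmarked method; the toy volume, the function names, and the choice of 6-connectivity are assumptions made for illustration.

```python
from collections import deque


def otsu_threshold(values, levels=256):
    """Return the integer t that maximizes between-class variance (foreground: v > t)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def label_components(mask):
    """Label 6-connected foreground components of a nested-list volume mask[z][y][x]."""
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    count = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not mask[z][y][x] or labels[z][y][x]:
                    continue
                count += 1
                labels[z][y][x] = count
                queue = deque([(z, y, x)])
                while queue:  # breadth-first flood fill of one component
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in steps:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                and mask[pz][py][px] and not labels[pz][py][px]):
                            labels[pz][py][px] = count
                            queue.append((pz, py, px))
    return labels, count


# Toy 2 x 3 x 4 volume with two bright, well-separated "cells" (illustrative values).
volume = [
    [[10, 12, 11, 10], [11, 200, 12, 10], [10, 11, 10, 210]],
    [[12, 10, 11, 12], [10, 205, 11, 10], [11, 10, 12, 220]],
]
flat = [v for plane in volume for row in plane for v in row]
threshold = otsu_threshold(flat)
mask = [[[v > threshold for v in row] for row in plane] for plane in volume]
labels, n_cells = label_components(mask)
print(threshold, n_cells)
```

      On real volumes with touching cells or uneven background, a baseline of this kind merges neighbouring objects and depends strongly on the threshold, which is the point at issue in the exchange above.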

      I suggest the following baselines be included to better understand how much of the segmentation accuracy is due to parameter tweaking on the considered datasets versus a novel method contribution:

      *  comparison to thresholding (with the same post-processing as the proposed method)

      *  comparison to a normalized cut segmentation (with the same post-processing as the proposed method)

      *  comparison to references 8 and 9.

      Refs 8 and 9 don’t have readily usable (https://github.com/LiangHann/USAR) or even shared code (https://github.com/Kaiseem/AD-GAN), so re-implementing this work is well beyond the bounds of this paper. We benchmarked Cellpose, StarDist, SegResNets, and a transformer – SwinUNetR. Moreover, models in the MONAI package can be used. Note, to our knowledge the transformer results also are a new contribution that the Reviewer does not acknowledge.

      I further strongly encourage the authors to discuss the limitations of their method. From what I understand, the proposed method works only on well-separated objects (due to the semantic segmentation bottleneck), is based on contrastive FG/BG intensity values (due to the SoftNCut loss), and requires tuning of a few parameters (which might be challenging if no ground-truth is available).

      We added text on limitations. Thanks for this suggestion.

      (2) Dataset

      I commend the authors for providing ground-truth labels for more than 2500 cells. I would appreciate it if the Methods section could mention how exactly the cells were labelled. I found a good overlap between the ground truth and Otsu thresholding of the intensity images. Was the ground truth generated by proofreading an initial automatic segmentation, or entirely done by hand? If the former, which method was used to generate the initial segmentation, and are there any concerns that the ground truth might be biased towards a given segmentation method?

      In the already submitted version, we have a 5-page DataSet card that fully answers your questions. They are ALL labeled by hand, without any semi-automatic process.

      In our main text we even stated “Using whole-brain data from mice we cropped small regions and human annotated in 3D 2,632 neurons that were endogenously labeled by TPH2-tdTomato” - clearly mentioning it is human-annotated.

      (3) Napari plugin

      The plugin is well-documented and works by following the installation instructions.

      Great, thanks for the positive feedback.

      However, I was not able to recreate the segmentations reported in the paper with the default settings for the pre-trained WNet3D: segments are generally too large and there are a lot of false positives. Both the prediction and the final instance segmentation also show substantial border artifacts, possibly due to a block-wise processing scheme.

      Your review here does not match your comments above; above you said it was working well, such that you doubt the GT is real and the data is too easy as it was perfectly easy to threshold with non-learning methods.

      You would need to share more details on what you tried. We suggest following our code; namely, we provide the full experimental code and processing for every figure, as was noted in our original submission: https://github.com/C-Achard/cellseg3d-figures.

      Reviewer #2 (Public Review):

      Summary:

      The authors propose a new method for self-supervised learning of 3d semantic segmentation for fluorescence microscopy. It is based on a WNet architecture (Encoder / Decoder using a UNet for each of these components) that reconstructs the image data after binarization in the bottleneck with a soft n-cuts clustering. They annotate a new dataset for nucleus segmentation in mesoSPIM imaging and train their model on this dataset. They create a napari plugin that provides access to this model and provides additional functionality for training of own models (both supervised and self-supervised), data labeling, and instance segmentation via post-processing of the semantic model predictions. This plugin also provides access to models trained on the contributed dataset in a supervised fashion.

      Strengths:

      (1) The idea behind the self-supervised learning loss is interesting.

      (2) The paper addresses an important challenge. Data annotation is very time-consuming for 3d microscopy data, so a self-supervised method that yields similar results to supervised segmentation would provide massive benefits.

      Thank you for highlighting the strengths of our work and new contributions.

      Weaknesses:

      The experiments presented by the authors do not adequately support the claims made in the paper. There are several shortcomings in the design of the experiment, presentation of the results, and reproducibility.

      We address your concerns and misunderstandings below.

      Major weaknesses:

      (1) The main experiments are conducted on the new mesoSPIM dataset, which contains quite small nuclei, much smaller than the pretraining datasets of CellPose and StarDist. I assume that this is one of the main reasons why these well-established methods don't work for this dataset.

      StarDist is not pretrained; we trained it from scratch, as we did for WNet3D. We retrained Cellpose and reported the results both with their pretrained model and our best-retrained model. This is documented in Figure 1 and Suppl. Figure 1. We also want to push back and say that they both work very well on this data. In fact, our main claim is not that we beat them; it is that we can match them with a self-supervised method.

      Limiting method comparison to only this dataset may create a misleading impression that CellSeg3D is superior for all kinds of 3D nucleus segmentation tasks, whereas this might only hold for small nuclei.

      The GT dataset we labeled has nuclei that are normal brain-cell sized. Moreover in Figure 2 we show very different samples with both dense and noisy (c-FOS) labeling.

      We also clearly do not claim this is superior for all tasks, from our text: “First, we benchmark our methods against Cellpose and StarDist, two leading supervised cell segmentation packages with user-friendly workflows, and show our methods match or outperform them in 3D instance segmentation on mesoSPIM-acquired volumes" – we explicitly do NOT claim beyond the scope of the benchmark. Moreover we state: "We found that WNet3D could be as good or better than the fully supervised models, especially in the low data regime, on this dataset at semantic and instance segmentation" – again noting on this dataset. Again, we only claimed we can be as good as these methods with an unsupervised approach, and in the low-GT data regime we can excel.

      Further, additional preprocessing of the mesoSPIM images may improve results for StarDist and CellPose (see the first point in minor weaknesses). Note: having a method that works better for small nuclei would be an important contribution. But I doubt that the claims hold for larger and or more crowded nuclei as the current version of the paper implies.

      Figure 2 benchmarks our method on larger and denser nuclei, but we do not intend to claim this is a universal tool. It was specifically designed for light-sheet (brain) data, and we have adjusted the title to be more clear. But we also show in Figure 2 it works well on more dense and noisy samples, hinting that it could be a promising approach. But we agree, as-is, it’s unlikely to be good for extremely dense samples like in electron microscopy, which we never claim it would be.

      With regard to preprocessing, we respectfully disagree. We trained StarDist (and asked the main developer of StarDist, Martin Weigert, to check our work and he is acknowledged in the paper) and it does very well. We also retrained and optimized Cellpose, and we show it works as well as leading transformer and CNN-based approaches. Again, we only claimed we can be as good as these methods with an unsupervised approach.

      The contribution of the paper would be much stronger if a **fair** comparison with StarDist / CellPose was also done on the additional datasets from Figure 2.

      We appreciate that more datasets would be ideal, but we always feel it’s best for the authors of tools to benchmark their own tools on data. We only compared others in Figure 1 to the new dataset we provide so people get a sense of the quality of the data too; there we did extensive searches for best parameters for those tools. So while we think it would be nice, we will leave it to those authors to be most fair. We also narrowed the scope of our claims to mesoSPIM data (added light-sheet to the title), which none of the other examples in Figure 2 are.

      (2) The experimental setup for the additional datasets seems to be unrealistic. In general, the description of these experiments is quite short and so the exact strategy is unclear from the text. However, you write the following: "The channel containing the foreground was then thresholded and the Voronoi-Otsu algorithm used to generate instance labels (for Platynereis data), with hyperparameters based on the Dice metric with the ground truth." I.e., the hyperparameters for the post-processing are found based on the ground truth. From the description it is unclear whether this is done a) on the part of the data that is then also used to compute metrics or b) on a separate validation split that is not used to compute metrics. If a) this is not a valid experimental setup and amounts to training on your test set. If b) this is ok from an experimental point of view, but likely still significantly overestimates the quality of predictions that can be achieved by manual tuning of these hyperparameters by a user that is not themselves a developer of this plugin or an absolute expert in classical image analysis, see also 3.

      We apologize for this confusion; we have now expanded the methods to clarify the setup is now b; you can see what we exactly did as well in the figure notebook: https://c-achard.github.io/cellseg3d-figures/fig2-b-c-extra-datasets/self-supervised-extra.html#threshold-predictions.

      For clarity, we additionally link each individual notebook now in the Methods.
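
      The difference between setups (a) and (b) can be made concrete with a small sketch: the post-processing threshold is chosen on a validation split, and the metric is then reported once on a held-out test split that played no role in the choice. The data values, candidate thresholds, and function names here are hypothetical; the actual procedure is the one in the linked notebooks.

```python
def dice(pred, gt):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    total = sum(1 for p in pred if p) + sum(1 for g in gt if g)
    return 2.0 * inter / total if total else 1.0


def binarize(probs, thr):
    """Turn per-voxel foreground probabilities into a 0/1 mask."""
    return [1 if p >= thr else 0 for p in probs]


def select_threshold(val_probs, val_gt, candidates):
    """Pick the candidate threshold with the best Dice on the validation split."""
    return max(candidates, key=lambda t: dice(binarize(val_probs, t), val_gt))


# Toy per-voxel predictions (background near 0.4, foreground near 0.6-0.7).
val_probs, val_gt = [0.40, 0.42, 0.65, 0.70, 0.38, 0.61], [0, 0, 1, 1, 0, 1]
test_probs, test_gt = [0.41, 0.66, 0.39, 0.72], [0, 1, 0, 1]

best = select_threshold(val_probs, val_gt, candidates=[0.3, 0.5, 0.8])
# The test split is used exactly once, after the hyperparameter is fixed.
test_score = dice(binarize(test_probs, best), test_gt)
print(best, test_score)
```

      Selecting the threshold directly on `test_probs` would correspond to setup (a) and would make the reported score optimistic.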

      (3) I cannot reproduce any of the results using the plugin. I tried to reproduce some of the results from the paper qualitatively: First I downloaded one of the volumes from the mesoSPIM dataset (c5image) and applied the WNet3D to it. The prediction looks ok, however the value range is quite close (Average BG intensity ~0.4, FG intensity 0.6-0.7). I try to apply the instance segmentation using "Convert to instance labels" from "Utilities". Using "Voronoi-Otsu" does not work due to an error in pyClesperanto ("clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR"). Segmentation via "Connected Components" and "Watershed" requires extensive manual tuning to get a somewhat decent result, which is still far from perfect.

      We are sorry to hear of the installation issue; pyClesperanto is a dependency that would be required to reproduce the images (it sounds like you had this issue: https://forum.image.sc/t/pyclesperanto-prototype-doesnt-work/45724). We have now added the fix explicitly to our docs: https://github.com/AdaptiveMotorControlLab/CellSeg3D/pull/90. We recommend checking the reproduction notebooks (which were linked in the initial submission): https://c-achard.github.io/cellseg3d-figures/intro.html.

      Then I tried to reproduce the results for the Mouse Skull Nuclei Dataset from EmbedSeg. The results look like a denoised version of the input image, not a semantic segmentation. I was skeptical from the beginning that the method would transfer without retraining, due to the very different morphology of nuclei (much larger and elongated). None of the available segmentation methods yield a good result, the best I can achieve is a strong over-segmentation with watersheds.

      We are surprised to hear this; did you follow the notebook below, which directly reproduces the steps to create this figure? (This was linked in the preprint): https://c-achard.github.io/cellseg3d-figures/fig2-c-extra-datasets/self-supervised-extra.html

      We also expanded the methods to include the exact values from the notebook into the text.

      Minor weaknesses:

      (1) CellPose can work better if images are resized so that the median object size in new images matches the training data. For CellPose the cyto2 model should do this automatically. It would be important to report if this was done, and if not would be advisable to check if this can improve results.

      We reported this value in Figure 1 and found it to work poorly, which is why we retrained Cellpose and found good performance (also reported in Figure 1). Resizing GB to TB volumes for mesoSPIM data is otherwise not practical, so simply retraining seems the preferable option, which is what we did.

      (2) It is a bit confusing that F1-Score and Dice Score are used interchangeably to evaluate results. The dice score only evaluates semantic predictions, whereas F1-Score evaluates the actual instance segmentation results. I would advise to only use F1-Score, which is the more appropriate metric. For Figure 1f either the mean F1 score over thresholds or F1 @ 0.5 could be reported. Furthermore, I would advise adopting the recommendations on metric reporting from https://www.nature.com/articles/s41592-023-01942-8.

      We are using the common metrics in the field for instance and semantic segmentation, and report them in the methods. In Figure 2f we actually report the “Dice” as defined in StarDist (as we stated in the Methods). Note, their implementation is functionally equivalent to F1-Score of an IoU >= 0, so we simply changed this label in the figure now for clarity. We agree this clarifies for the expert readers what was done, and we expanded the methods to be more clear about metrics.

      We added a link to the paper you mention as well.
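
      To make the distinction between the two metrics explicit, the sketch below computes a voxel-wise Dice score on binarized masks and an instance-level F1-Score in which a predicted object counts as a true positive only if its IoU with a ground-truth object reaches a threshold. This is a simplified illustration, not the StarDist implementation: it assumes label images flattened to sequences with 0 as background, and it restricts the threshold to 0.5 or higher so that simple one-to-one matching suffices (lower thresholds would need an assignment solver).

```python
from collections import Counter


def dice(gt, pred):
    """Voxel-wise Dice on foreground masks (any non-zero label is foreground)."""
    inter = sum(1 for g, p in zip(gt, pred) if g and p)
    total = sum(1 for g in gt if g) + sum(1 for p in pred if p)
    return 2.0 * inter / total if total else 1.0


def f1_at_iou(gt, pred, thr=0.5):
    """Instance-level F1: objects match if their IoU is at least `thr` (thr >= 0.5)."""
    if thr < 0.5:
        raise ValueError("this sketch only supports thr >= 0.5")
    gt_size = Counter(g for g in gt if g)
    pred_size = Counter(p for p in pred if p)
    overlap = Counter((g, p) for g, p in zip(gt, pred) if g and p)
    matched_gt, matched_pred = set(), set()
    for (g, p), inter in overlap.items():
        union = gt_size[g] + pred_size[p] - inter
        if inter / union >= thr and g not in matched_gt and p not in matched_pred:
            matched_gt.add(g)
            matched_pred.add(p)
    tp = len(matched_gt)
    fp = len(pred_size) - tp
    fn = len(gt_size) - tp
    denom = 2 * tp + fp + fn
    return 2.0 * tp / denom if denom else 1.0


# Toy flattened label images: two ground-truth cells, two predicted cells.
gt_labels = [0, 1, 1, 1, 0, 2, 2, 0]
pred_labels = [0, 1, 1, 0, 0, 0, 3, 3]
semantic_dice = dice(gt_labels, pred_labels)
instance_f1 = f1_at_iou(gt_labels, pred_labels, thr=0.5)
print(round(semantic_dice, 3), instance_f1)
```

      In this toy case the semantic Dice is higher than the instance F1 because the second predicted object overlaps its ground-truth cell too little to count as a detection, even though its voxels still contribute to the Dice score.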

      (3) A more conceptual limitation is that the (self-supervised) method is limited to intensity-based segmentation, and so will not be able to work for cases where structures cannot be distinguished based on intensity only. It is further unclear how well it can separate crowded nuclei. While some object separation can be achieved by morphological operations this is generally limited for crowded segmentation tasks and the main motivation behind the segmentation objective used in StarDist, CellPose, and other instance segmentation methods. This limitation is only superficially acknowledged in "Note that WNet3D uses brightness to detect objects [...]" but should be discussed in more depth. Note: this limitation does not mean at all that the underlying contribution is not significant, but I think it is important to address this in more detail so that potential users know where the method is applicable and where it isn't.

      We agree, and we added a new section specifically on limitations. Thanks for raising this good point. While self-supervision saves hundreds of hours of manual labor, it comes at the cost of a more limited range of regimes in which it can work. This is why we do not claim it should replace excellent methods like Cellpose or StarDist, but rather complement them; it can be used on mesoSPIM samples, as we show here.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) One of the listed contributions is "adding the SoftNCuts loss". This is not true, reference 10 already introduced that loss.

      “Our changes include a conversion to a fully 3D architecture and adding the SoftNCuts loss” - we dropped the comma and added the word “AND” to note that we added the 3D version of the SoftNCuts loss TO the 3D architecture, which 10 did not do.

      (2) "Typically, these methods use a multi-step approach" to segment 3D from 2D: this is only true for CellPose, StarDist does real 3D.

      That is why we preface with “typically” which implies not always.

      (3) "see Methods, Figure 1c, c)" is missing an opening in parentheses.

      (4) K is not introduced in equation (1) (presumably the number of classes, which seems to be 2 for all experiments considered).

      k actually was introduced just below equation 1 as the number of classes. We added the note that k was set to 2.

      (5) X is not introduced in equation (2) (presumably the spatial position of a voxel).

      Sorry for this oversight. We add that $X$ is the spatial position of the voxel.
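
      For reference, the Soft N-cut loss and its voxel-pair weights can be written as below. This is a sketch of the published W-Net formulation (which we take to be the one introduced in reference 10), so the notation in the manuscript's equations (1) and (2) may differ. Here $K$ is the number of classes (set to 2), $p(u = A_k)$ is the predicted probability that voxel $u$ belongs to class $A_k$, $F$ is the voxel intensity, $X$ is the spatial position of the voxel, and $\sigma_I$, $\sigma_X$, and $r$ are the intensity sigma, spatial sigma, and radius.

      $$
      J_{\text{soft-Ncut}}(V, K) = K - \sum_{k=1}^{K} \frac{\sum_{u \in V} p(u = A_k) \sum_{v \in V} w(u, v)\, p(v = A_k)}{\sum_{u \in V} p(u = A_k) \sum_{v \in V} w(u, v)}
      $$

      $$
      w(u, v) =
      \begin{cases}
      \exp\!\left(-\dfrac{\lVert F(u) - F(v) \rVert_2^2}{\sigma_I^2}\right) \exp\!\left(-\dfrac{\lVert X(u) - X(v) \rVert_2^2}{\sigma_X^2}\right) & \text{if } \lVert X(u) - X(v) \rVert_2 < r, \\
      0 & \text{otherwise.}
      \end{cases}
      $$

      Because $w(u, v)$ depends only on intensity differences and spatial distance, the loss favors partitions that separate voxels by brightness within a local neighbourhood, which is the intensity dependence discussed in the limitations.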

      Reviewer #2 (Recommendations For The Authors):

      To improve the paper the weaknesses mentioned above should be addressed:

      (1) Compare to StarDist and/or CellPose on further datasets, esp. using pre-trained CellPose, to see if the claims of competitive performance with state-of-the-art approaches hold up for the case of different nucleus morphologies. The EmbedSeg datasets from Figure 2 c are well suited for this. In the current form, the claims are too broad and not supported if thorough experiments are performed on a single dataset with a very specific morphology. Note: even if the method is not fully competitive with CellPose / StarDist on these Datasets it holds merit since a segmentation method that works for small nuclei as in the mesoSPIM dataset and works self-supervised is very valuable.

      (2) Clarify how the best instance segmentation hyperparameters are found. If you indeed optimize these on the same part of the dataset used for evaluating metrics then the current experimental set-up is invalid. If this is not the case I would still rethink if this is a good way to report the results since it does not seem to reflect user experience. I found it not possible to find good hyperparameters for either of the two segmentation approaches I tried (see also next point) so I think these numbers are too optimistic.

      (3) Improve the instance segmentation part of the plugin: either provide troubleshooting for how to install pyClesperanto correctly to use the voronoi-based instance segmentation or implement it based on more standard functionality like skimage / scipy. Provide more guidance for finding good hyperparameters for the segmentation task.

      (4) Make sure image resizing is done correctly when using pre-trained CellPose models and report on this.

      (5) Report F1 Scores only (unless there is a compelling reason to also report Dice).

      (6) Address the limitations of the method in more detail.

      On a positive note: all data and code are available and easy to download/install. A minor comment: it would be very helpful to have line numbers for reviewing a revised version.

      All comments are also addressed in the public reviews.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study provides in vivo evidence for the synchronization of projection neurons in the olfactory bulb at gamma frequency in an activity-dependent manner. This study uses optogenetics in combination with single-cell recordings to selectively activate sensory input channels within the olfactory bulb. The data are thoughtfully analyzed and presented; the evidence is solid, although some of the conclusions are only partially supported.

      We deeply thank all the reviewers for their time, effort, and insightful comments. Their revision led to a significant improvement of the paper.

      The reviewers suggested toning down our claim that we found a mechanism that synchronizes all odor-evoked MTC activities, as we do not directly show that. We concur and address this in our revised version to ensure a precise interpretation of our findings. In short, we state that we revealed a synchronization mechanism between two groups of active mitral and tufted cells (MTCs) and show that this synchronization is activity-dependent and distance-independent. This mechanism can enable the synchronization of all odor-activated MTCs.

      Another issue raised is the interpretation of the results obtained under Ketamine anesthesia. Ketamine is an antagonist of NMDA receptors, which play a crucial role in the MTC-GC reciprocal synapse. To address this, we include new analyses demonstrating that optogenetic activation of granule cells (GCs) can inhibit the recorded MTCs during baseline activity but does not substantially affect odor-evoked MTC firing rates. We show that this holds in both Ketamine-anesthetized and awake mice (Dalal & Haddad, 2022). This indicates that GC-MTC connections are functional even under Ketamine anesthesia; however, they do not exert substantial suppression on odor-evoked MTC responses. We added a paragraph to the discussion section on the potential influence of Ketamine anesthesia on GC-MTC synapses and its implications for our findings.

      Finally, we discuss several recent studies that are particularly relevant to our research and expand the discussion on our hypothesis that parvalbumin-positive cells in the olfactory bulb may serve as key mediators of the activity- and distance-dependent lateral inhibition observed in our findings.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Dalal and Haddad investigated how neurons in the olfactory bulb are synchronized in oscillatory rhythms at gamma frequency. Temporal coordination of action potentials fired by projection neurons can facilitate information transmission to downstream areas. In a previous paper (Dalal and Haddad 2022, https://doi.org/10.1016/j.celrep.2022.110693), the authors showed that gamma frequency synchronization of mitral/tufted cells (MTCs) in the olfactory bulb enhances the response in the piriform cortex. The present study builds on these findings and takes a closer look at how gamma synchronization is restricted to a specific subset of MTCs in the olfactory bulb. They combined odor and optogenetic stimulations in anesthetized mice with extracellular recordings.

      The main findings are that lateral synchronization of MTCs at gamma frequency is mediated by granule cells (GCs), independent of the spatial distance, and strongest for MTCs with firing rates close to 40 Hz. The authors conclude that this reveals a simple mechanism by which spatially distributed neurons can form a synchronized ensemble. In contrast to lateral synchronization, they found no evidence for the involvement of GCs in lateral inhibition of nearby MTCs.

      Strengths:

      Investigating the mechanisms of rhythmic synchronization in vivo is difficult because of experimental limitations for the readout and manipulation of neuronal populations at fast timescales. Using spatially patterned light stimulation of opsin-expressing neurons in combination with extracellular recordings is a nice approach. The paper provides evidence for an activity-dependent synchronization of MTCs in gamma frequency that is mediated by GCs.

      Weaknesses:

      An important weakness of the study is the lack of direct evidence for the main conclusion - the synchronization of MTCs in gamma frequency. The data shows that paired optogenetic stimulation of MTCs in different parts of the olfactory bulb increases the rhythmicity of individual MTCs (Figure 1) and that combined odor stimulation and GC stimulation increases rhythmicity and gamma phase locking of individual MTCs (Figure 4). However, a direct comparison of the firing of different MTCs is missing. This could be addressed with extracellular recordings at two different locations in the olfactory bulb. The minimum requirement to support this conclusion would be to show that the MTCs lock to the same phase of the gamma cycle. Also, showing the evoked gamma oscillations would help to interpret the data.

      We agree with the reviewer that direct evidence of mutual synchronization between multiple recorded MTCs has not been shown in our study. Our study only shows a mechanism that can enable this synchronization. We now state this clearly in the manuscript. We based this on previous studies that tested MTC spike synchronization. Specifically, Schoppa (2006) reported that electrical OSN stimulation evokes MTC spike synchronization in the gamma range in vitro. Kashiwadani et al. (1999) and Doucette et al. (2011) showed that odor-evoked MTC spike times are synchronized in vivo. Given these studies, we asked what underlying mechanism can support such synchronization. Our study demonstrates that activating a group of MTCs can entrain another MTC in an activity-dependent and distance-independent manner. We claim this could be the underlying mechanism for the odor-evoked synchronization demonstrated by these previous studies.

      To make sure this is clearly stated in the manuscript we changed the title to “Activity-dependent lateral inhibition enables the synchronization of active olfactory bulb projection neurons”, and rephrased a sentence in the abstract to “This lateral synchronization was particularly effective when the recorded MTC fired at the gamma rhythm”. To further clarify this point, we made several other changes throughout the results and the discussion section.
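
      The Reviewer's minimum requirement, that different MTCs lock to the same phase of the gamma cycle, is typically assessed with circular statistics on spike phases. The sketch below is illustrative only and is not the analysis used in the manuscript: it assigns spike phases relative to an idealized fixed-frequency 40 Hz oscillation (in practice, phases would come from the band-pass-filtered LFP), and all spike times and function names are hypothetical.

```python
import cmath
import math


def spike_phases(spike_times_s, freq_hz):
    """Phase (radians) of each spike relative to an idealized oscillation."""
    return [2.0 * math.pi * freq_hz * t for t in spike_times_s]


def phase_locking(phases):
    """Return (mean resultant length, preferred phase) of a set of spike phases.

    A resultant length near 1 means tight locking; near 0 means no preferred phase.
    """
    if not phases:
        raise ValueError("need at least one spike phase")
    mean_vec = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vec), cmath.phase(mean_vec)


# Hypothetical spike times (seconds) for two cells firing at 40 Hz.
cell_a = [0.000, 0.025, 0.050, 0.075]
cell_b = [0.005, 0.030, 0.055, 0.080]  # same rhythm, shifted by 5 ms

r_a, phi_a = phase_locking(spike_phases(cell_a, 40.0))
r_b, phi_b = phase_locking(spike_phases(cell_b, 40.0))
# Both cells are strongly locked, but they prefer different phases.
phase_difference = phi_b - phi_a
print(round(r_a, 3), round(r_b, 3), round(phase_difference, 3))
```

      Strong locking of each cell individually (high resultant length) does not by itself imply synchrony between cells; comparing the preferred phases across simultaneously recorded MTCs is what would address that point directly.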

      Another weakness is that all experiments are performed under anesthesia with ketamine/medetomidine. Ketamine is an antagonist of NMDA receptors and NMDA receptors are critically involved in the interactions of MTCs and GCs at the reciprocal synapses (see for example Lage-Rupprecht et al. 2020, https://doi.org/10.7554/eLife.63737; Egger and Kuner 2021, https://doi.org/10.1007/s00441-020-03402-7). This should be considered for the interpretation of the presented data.

      This issue has been raised by reviewers #1 and #2. We think, as also reviewer #2 acknowledged, that this issue does not compromise our results. However, to address this important point we added the below section to the Discussion:

      “Our experiments were performed under Ketamine anesthesia, an NMDA receptor antagonist that affects the reciprocal dendro-dendritic synapses between MTCs and GCs (Egger and Kuner, 2021; Lage-Rupprecht et al., 2020). Consistent with that, recent studies reported lower excitability of GC activity under anesthesia (Cazakoff et al., 2014; Kato et al., 2012). This raises the concern that our result might not be valid in the awake state. We argue that this is unlikely. First, Fukunaga et al. (2014) reported that GC baseline activity in anesthetized and awake mice is similar, suggesting that MTC-GC synapses are functioning. Second, we show that light activation of GCL neurons strongly inhibits the MTC baseline activity (Figure 5) and increases MTC odor-evoked spike-LFP coupling in the gamma range (Figure 4). These experiments validate that GCL neurons can exert inhibition over MTCs in our experimental setup. Third, we have shown that light-activating all accessible GCL neurons has a minor effect on the MTC odor-evoked firing rates in an awake state (Dalal and Haddad, 2022), corroborating the finding that GCL neurons are unlikely to provide strong suppression to MTCs. Fourth, and most importantly, we showed that optogenetic stimulation of MTCs entrains other MTC spike times, which is achieved via the GCL neurons. This suggests that the lack of lateral suppression following MTC or GCL neuron opto-activation is not due to MTC-GC synapse blockage. That said, we cannot exclude the unlikely possibility that NMDA receptor blockage under anesthesia impairs MTC-to-MTC suppressive interactions but not the MTC-to-MTC mediated spike entrainment.”

      Figure 1A and D from Dalal & Haddad 2022 show the effect of GCL neurons opto-activation during odor stimulation on MTC firing rates in awake and anesthetized mice.

      Furthermore, the direct effect of optogenetic stimulation on GCs activity is not shown. This is particularly important because they use Gad2-cre mice with virus injection in the olfactory bulb and expression might not be restricted to granule cells and might not target all subtypes of granule cells (Wachowiak et al., 2013, https://doi.org/10.1523/JNEUROSCI.4824-12.2013). This should be considered for the interpretation of the data, particularly for the absence of an effect of GC stimulation on lateral inhibition.

      In this study we used Gad2-cre mice and the protocol for viral transfection of GCL neurons reported in Fukunaga et al., 2014. They reported that ‘more than 90% of Cre-expressing neurons in the GCL also expressed fluorescently tagged ArchT’. Consistently, when Fukunaga et al. expressed ChR2 in the GCL using the same viral infection as we used, they reported that “Light presentation in vivo resulted in rapid and strong depolarization of, and action potential (AP) discharges in, GCs (Fig. 3b), which in turn consistently and strongly hyperpolarized M/TCs (9 of 9 cells showed 100% AP suppression; Fig. 3c,d)”. This study shows clearly that this infection protocol is robust. Moreover, in new panels we added to the manuscript (Figure 5a-b), we show that optogenetic activation of GCL neurons strongly suppressed MTC activity during baseline conditions but not odor-evoked MTC responses. This is consistent with the reports by Fukunaga et al. and indicates that GCL neurons are functional, as they can suppress MTC baseline activity.

      Finally, since virus injection into the granule cell layer can target other GCL neuron types, we now refer to ‘GCL neurons’ in the text (as was done in Gschwend et al., 2015) instead of ‘GCs’. We replaced the image in Figure 4A to show that the expression of ChR2 is restricted to GCL neurons. That said, it is still possible that our protocol did not infect all GC subtypes. To address this, we added this line to the Discussion: “We also note that our viral transfection protocol in Gad2-Cre mice might not transfect all subtypes of GCs.”

      Several conclusions are only supported by data from example neurons. The paper would benefit from a more detailed description of the analysis and the display of some additional analysis at the population level:

      - What were the criteria based on which the spots for light-activation were chosen from the receptive field map?

      To make this point clearer, we extended the explanation of the selection criteria in the Methods: “Spots were selected either randomly or manually. In the manual selection case, we selected spots that caused either significant or mild but insignificant inhibitory effect on the recorded MTC (e.g., local cold spots in the receptive-field map; see example in Figure 2a of example spots that were selected manually)”. We also added a reference in the text to the Methods: “see Methods for spots selection criteria”.

      - The absence of an effect on firing rate for paired stimulations is only shown for one example (Figure 1c). A quantification of the population level would be interesting.

      - Only one example neuron is shown to support the conclusion that "two different neural circuits mediate suppression and entrainment" in Figure 3. A population analysis would provide more evidence.

      Thank you very much for these comments. We added a population analysis in Figure 3. This analysis shows a dissociation between firing rate suppression and the entrainment groups (Figure 3c-d). This suggests that two different circuits mediate suppression and entrainment.

      - Only one example neuron is shown to illustrate the effect of GC stimulation on gamma rhythmicity of MTCs in Figures 4 f,g.

      In this figure, we show that the activation of subsets of GCL neurons elevated odor-evoked spike synchronization to the gamma rhythm. We thought it would be beneficial to demonstrate the change in spike entrainment following GCL neurons optogenetic activation regardless of the ongoing OB gamma oscillations, using the method presented by Fukunaga et al., 2014. However, this analysis requires that the neuron has a relatively high firing rate. As we describe in the figure legend of this panel, this neuron is probably a tufted cell based on the findings shown in Fukunaga et al., 2014 and Burton & Urban, 2021. Most of our recorded cells had a lower firing rate, which coincides with our typical recording depth, targeting mitral cells rather than tufted cells (~400µm deep). Since this analysis is shown only over a single neuron, we moved it to Supplementary Figure 4.

      - In Figure 5 and the corresponding text, "proximal" and "distal" GC activation are not clearly defined.

      We agree. Initially, we used these terms to refer to GC columns that include the recorded MTC (proximal) and columns that are away from it (distal). We decided that instead of using a coarse division, we would show the whole range of distances. We updated the analysis in Figure 5d to show the effect of GC optogenetic activation on MTC odor-evoked responses as a function of the distance from the recorded MTC.

      Reviewer #2 (Public Review):

      Summary

      This study provides a detailed analysis and dissociation between two effects of activation of lateral inhibitory circuits in the olfactory bulb on ongoing single mitral/tufted cell (MTC) spiking activity, namely enhanced synchronization in the gamma frequency range or lateral inhibition of firing rate.

      The authors use a clever combination of single-cell recordings, optogenetics with variable spatial stimulation of MTCs and sensory stimulation in vivo, and established mathematical methods to describe changes in autocorrelation/synchronization of a single MTC's spiking activity upon activation of lateral glomerular MTC ensembles. This assay is rounded off by a gain-of-function experiment in which the authors enhance granule cell (GC) excitation to establish a causal relation between GC activation and enhanced synchronization to gamma (they had used this manipulation in their previous paper Dalal & Haddad 2022, but use a smaller illumination spot here for spatially restricted activation).

      Strengths

      This study is of high interest for olfactory processing - since it shows directly that interactions between only two selected active receptor channels are sufficient to enhance the synchronization of single neurons to gamma in one channel (and thus by inference most likely in both). These interactions are distance-independent over many 100s of µms and thus can allow for non-topographical inhibitory action across the bulb, in contrast to the center-surround lateral inhibition known from other sensory modalities.

      In my view, parallels between vision and olfaction might have been overemphasized so far, since the combinatorial encoding of olfactory stimuli across the glomerular map might require different mechanisms of lateral interaction versus vision. This result is indicative of such a major difference.

      Such enhanced local synchronization was observed in a subset of activated channel pairs; in addition, the authors report another type of lateral interaction that does involve the reduction of firing rates, drops off with distance and most likely is caused by a different circuit-mediated by PV+ neurons (PVN; the evidence for which is circumstantial).

      Weaknesses/Room for improvement

      Thus this study is an impressive proof of concept that however does not yet allow for broad generalization. Therefore the framing of results should be slightly more careful in my opinion.

      We agree with the reviewer. We copy here our response to reviewer #1, who raised the same issue.

      We agree that our study does not show direct evidence of mutual synchronization between multiple recorded MTCs. It only shows a mechanism that can enable this synchronization. We now state this clearly in the manuscript. We relied on previous studies that tested MTC spike synchronization. Specifically, Schoppa 2006 reported that electrical OSN stimulation evokes MTC spike synchronization in the gamma range, in-vitro. Kashiwadani et al., 1999 and Doucette et al. showed that odor-evoked MTC spike times are synchronized, in-vivo. Given these studies, we asked what underlying mechanism can support such synchronization. Our study demonstrates that activating a group of MTCs can entrain another MTC in an activity-dependent and distance-independent manner. We claim this could be the underlying mechanism for the odor-evoked synchronization demonstrated by these previous studies.

      To make sure this is clearly stated in the manuscript we changed the title to “Activity-dependent lateral inhibition enables the synchronization of active olfactory bulb projection neurons”, and rephrased a sentence in the abstract to “This lateral synchronization was particularly effective when the recorded MTC fired at the gamma rhythm”. To further clarify this point, we made several other changes throughout the results and the discussion section.

      Along this line, the conclusions regarding two different circuits underlying lateral inhibition vs enhanced synchronization are not quite justified by the data, e.g.

      (1) The authors mention that their granule cell stimulation results in a local cold spot (l. 527 ff) - how can they then be said not to be involved in the inhibition of firing rate (bullet point in Highlights)? Please elaborate further. In l.406 they also state that GCs can inhibit MTCs under certain conditions. The argument that this stimulation is not physiological makes sense, but still does not rule out anything. You might want to cite Aghvami et al 2022 on the very small amplitude of GC-mediated IPSPs, also McIntyre and Cleland 2015.

      We apologize for the lack of clarity. We reported the local cold spot in the context of an additional experiment that was not presented in the manuscript and was only described in the Methods section. Following the revision, we decided to add the analysis of this experiment to Figure 5. This experiment validated that optogenetic activation of GCs is potent and can affect the firing rates of the recorded MTC. This is particularly important as we performed all experiments under Ketamine anesthesia, and Ketamine is an NMDA receptor antagonist. In this experiment, we recorded the activity of MTCs at baseline conditions (without odor presentation) under optogenetic activation of GCs. We divided the OB surface into a grid and optogenetically activated GC columns in random order, one light spot in each trial, using light patches of size 330 µm². We used the same light intensity as in the optogenetic GC activation during odor stimulation (reported in Figures 4-5). We show that the recorded MTC was strongly inhibited by GC light activation, mostly when activating GCs in its vicinity (within its column, i.e., local cold spot). This experiment validates that in our experimental setup, GCs can exert inhibition over MTCs at baseline conditions.

      (2) Even from the shown data, it appears that laterally increased synchronization might co-occur with lateral suppression (See also comment on Figures 1d,e and Figure S1c)

      We kindly note that the panels you referred to do not quantify the firing rate but the rhythmicity of MTC light-evoked responses. We should have explained these graphs better in the main text and not only in the Methods section. We added a panel to Supplementary Figure 1, which describes our analysis: In each of these examples, we performed a time-frequency Wavelet analysis over the average response of the neurons across trials (computed using a sliding Gaussian with a std of 2ms). The results of the Wavelet analysis allowed us to visually capture the enhanced spike alignment across trials under paired activation as a function of the stimulus duration (as, for example, in Figure 1c, middle panel). The response amplitude to light stimulation did not change in this example (shown in Figure 1c lower panel), and the spikes entrainment increased following paired activation of MTCs.

      To address the relations between lateral suppression and synchronization at the population level, we added additional analyses to Figure 3c-d.
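      As an illustration only, the time-frequency analysis described in this response (trial-averaged response smoothed with a Gaussian of 2 ms std, followed by a wavelet transform) can be sketched as below. This is a minimal sketch and not the analysis code used for the manuscript: the complex Morlet wavelet, the number of cycles, the sampling rate, the frequency grid, and the synthetic 50 Hz spike trains are all assumptions introduced here.

```python
import numpy as np

def smoothed_psth(spike_trains, fs, sigma_s=0.002):
    """Average binary spike trains across trials, then smooth with a Gaussian (std = sigma_s)."""
    mean_train = np.asarray(spike_trains, dtype=float).mean(axis=0)
    half = int(np.ceil(4 * sigma_s * fs))
    t = np.arange(-half, half + 1) / fs
    kernel = np.exp(-0.5 * (t / sigma_s) ** 2)
    kernel /= kernel.sum()
    return np.convolve(mean_train, kernel, mode="same") * fs  # spikes/s

def morlet_power(signal, fs, freqs, n_cycles=5.0):
    """Time-frequency power via convolution with complex Morlet wavelets.

    Assumes the signal is longer than the longest wavelet.
    """
    signal = signal - signal.mean()
    power = np.empty((len(freqs), len(signal)))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2 * np.pi * f)          # temporal width of the wavelet
        half = int(np.ceil(3 * sigma_t * fs))
        t = np.arange(-half, half + 1) / fs
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-0.5 * (t / sigma_t) ** 2)
        wavelet /= np.abs(wavelet).sum()              # comparable gain across frequencies
        power[i] = np.abs(np.convolve(signal, wavelet, mode="same")) ** 2
    return power

# Synthetic (hypothetical) data: 20 trials, spikes locked to a 50 Hz rhythm with 1 ms jitter.
fs = 1000
n_trials, n_samples = 20, 500
rng = np.random.default_rng(0)
trains = np.zeros((n_trials, n_samples))
for k in range(n_trials):
    times = 0.01 + 0.02 * np.arange(25) + rng.normal(0.0, 0.001, 25)
    idx = np.clip(np.round(times * fs).astype(int), 0, n_samples - 1)
    trains[k, idx] = 1

freqs = np.arange(20, 95, 5)
power = morlet_power(smoothed_psth(trains, fs), fs, freqs)
peak_freq = freqs[power.mean(axis=1).argmax()]  # frequency with the highest mean power
```

      Because the transform is applied to the trial-averaged response, power at a given frequency reflects spikes that are aligned across trials at that rhythm rather than the overall firing rate.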

      (3) There are no manipulations of PVN activity in this study, thus there is no direct evidence for the substrate of the second circuit.

      We completely agree with the reviewer. Using the current data, we can only claim that optogenetic activation of GCL neurons did not affect the MTC odor-evoked response. This finding is consistent with the loss-of-function experiment reported by Fukunaga et al., 2014, where GC suppression did not change odor-evoked responses in either anesthetized or awake mice. Therefore, we speculated that PVN might be a candidate OB interneuron to mediate lateral inhibition between MTCs. This hypothesis is based on their higher likelihood of interconnecting two MTCs compared with GCs (Burton, 2017). We elaborated on this in the discussion and made sure it is clearly stated as a hypothesis.

      (4) The manipulation of GC activity was performed in a transgenic line with viral transfection, which might result in a lower permeation of the population compared to the line used for optogenetic stimulation of MTCs.

      We used a previously validated protocol for optogenetic manipulation of GCs from Fukunaga et al., 2014 in order to minimize this caveat. As we cited previously from their paper, following the expression of ChR2 in the GCL, ‘Light presentation in vivo resulted in rapid and strong depolarization of, and action potential (AP) discharges in, GCs (Fig. 3b), which in turn consistently and strongly hyperpolarized M/TCs (9 of 9 cells showed 100% AP suppression; Fig. 3c,d)’. These results are consistent with the additional experiment we added to the manuscript, where optogenetic activation of GCL neurons strongly suppressed MTC activity during baseline conditions (without odor presentation). The high similarity between these two reports, in which, in the case of Fukunaga et al., GC activation was directly measured, suggests that lack of opsin expression or insufficient light intensity is unlikely to explain the lack of GCL neuron activation effect on lateral inhibition. Moreover, GCL neurons' optogenetic activation during odor stimulation increased MTC spike-LFP coupling in the gamma range. Therefore, the dissociation between the effects of GCL neurons on spike entrainment and lateral inhibition suggests that the lack of lateral inhibition following GC activation is unlikely due to low expression rates.

      In some instances, the authors tend to cite older literature - which was not yet aware of the prominent contribution of EPL neurons including PVN to recurrent and lateral inhibition of MT cells - as if roles that then were ascribed to granule cells for lack of better knowledge can still be unequivocally linked to granule cells now. For example, they should discuss Arevian et al (2006), Galan et al 2006, Giridhar et al., Yokoi et al. 1995, etc in the light of PVN action.

      Therefore it is also not quite justified to state that their result regarding the role of GCs specifically for synchronization, not suppression, is "in contrast to the field" (e.g. l.70 f., l.365, l. 400 ff).

      We changed several sentences in the discussion and introduction to explain that previous studies attributed lateral suppression to GC because they were not aware of the prominent contribution of EPL neurons as has been demonstrated by more recent studies (Burton 2024, Huang et al., 2016,  Kato et al., 2013, and more).

      We also toned down the statement that these findings are in contrast to the field. Instead, we state that our findings support the claim that GCs are not involved in affecting MTC odor-evoked firing rate.

      Why did the authors choose to use the term "lateral suppression", often interchangeably with lateral inhibition? If this term is intended to specifically reflect reductions of firing rates, it might be useful to clearly define it at first use (and cite earlier literature on it) and then use it consistently throughout.

      We agree and have changed the manuscript accordingly. We added the following in the introduction: “We use this phrase here to refer to a process that suppresses the firing rate of the post-synaptic neuron.”

      A discussion of anesthesia effects is missing - e.g. GC activity is known to be reportedly stronger in awake mice (Kato et al). This is not a contentious point at all since the authors themselves show that additional excitation of GCs enhances synchrony, but it should be mentioned.

      We completely agree and added a paragraph to the Discussion in this regard. Please see also the response to reviewer #1, who made a similar suggestion.

      Some citations should be added, in particular relevant recent preprints - e.g. Peace et al. BioRxiv 2024, Burton et al. BioRxiv 2024 and the direct evidence for a glutamate-dependent release of GABA from GCs (Lage-Rupprecht et al. 2020).

      We thank the reviewer for bringing these relevant recent manuscripts to our attention. We now cite Peace et al. when discussing the spatial range of inhibition and gamma synchronization in the OB, Lage-Rupprecht et al. in the context of the involvement of NMDA receptors in the MTC-GC reciprocal synapse, and Burton et al. when discussing the potential function of PV neurons.

      The introduction on the role of gamma oscillations in sensory systems (in particular vision) could be more elaborated.

      In our previous paper (Dalal & Haddad 2022) we included an elaborate introduction on the role of gamma oscillations in sensory processing, since that study focused on the effect of gamma synchronization on information transmission between brain regions. In the current study we looked at gamma rhythms as a mechanism that can facilitate ensemble synchronization.

      Reviewer #3 (Public Review):

      Summary:

      This study by Dalal and Haddad analyzes two facets of cooperative recruitment of M/TCs as discerned through direct, ChR2-mediated spot stimulations:

      (1) mutual inhibition and

      (2) entrainment of action potential timing within the gamma frequency range.

      This investigation is conducted by contrasting the evoked activity elicited by a "central" stimulus spot, which induces an excitatory response alone, with that elicited when paired with stimulations of surrounding areas. Additionally, the effect of Gad2-expressing granule cells is examined.

      Based on the observed distance dependence and the impact of GC stimulations, the authors infer that mutual inhibition and gamma entrainment are mediated by distinct mechanisms.

      Strengths:

      The results presented in this study offer a nice in vivo validation of the significant in vitro findings previously reported by Arevian, Kapoor, and Urban in 2008. Additionally, the distance-dependent analysis provides some mechanistic insights.

      We thank the reviewer for their comments. Indeed, the current study provides an in-vivo replication of the in-vitro results reported in Arevian et al., 2008, and adds further insights by showing that lateral inhibition is distance-dependent. However, this is not the main focus of the current study. Following the findings reported by Dalal & Haddad 2022, the motivation for this study was to test the mechanism that allows co-activated MTCs to entrain their spike timing. By light-activating pairs of MTCs at varying distances, we detected a subset of pairs in which paired light-activation evoked activity-dependent lateral inhibition, as was reported by Arevian et al., 2008. Moreover, we think it is highly important to know that a previous result from an in-vitro study is fully reproducible in-vivo.

      Weaknesses:

      The results largely reproduce previously reported findings, including those from the authors' own work, such as Dalal and Haddad (2022), where a key highlight was "Modulating GC activities dissociates MTCs odor-evoked gamma synchrony from firing rates." Some interpretations, particularly the claim regarding the distance independence of the entrainment effect, may be considered over-interpretations.

      We kindly disagree with the reviewer. We think the current study extends rather than reproduces the findings reported in Dalal & Haddad 2022. The 2022 study mainly focused on the effect of OB gamma synchronization on odor representation in the Piriform cortex. We bidirectionally modulated the level of MTC gamma synchronization and found that it had bidirectional effects on odor representation in one of their downstream targets, the anterior piriform cortex. The current study, however, focuses on the question of how spatially distributed odor-activated MTCs can synchronize their spiking activity. Our current main finding is that paired activation of MTCs can enhance the spikes entrainment of the recorded MTC in an activity-dependent and spatially independent manner. We suggest that this mechanism is mediated by GCL neurons.

      The reviewer did not explain why they think that the distance independence of the entrainment effect is an over-interpretation. However, to make our claim more precise, we added the following sentence to the corresponding results section: “Furthermore, within the distance range that we were able to measure, the increased phase-locking did not significantly correlate with the distance from the MTC.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      (1) Line 17f: "This lateral synchronization was particularly effective when both MTCs fired at the gamma rhythm, ..."

      This sentence implies a direct comparison of the simultaneously recorded firing of MTCs but I could not find evidence for this in this manuscript. I would suggest to change this.

      We thank the reviewer. The sentence was changed to “This lateral synchronization was particularly effective when the recorded MTC fired at the gamma rhythm”.

      (2) Line 43f: A brief description of what glomeruli are could help to avoid confusion for readers less familiar with the OB. The phrasing of "activated glomeruli" and "each glomerulus innervates" are somewhat misleading given that they do not contain the cell bodies of the projection neurons.

      We edited this part of the introduction so that it briefly describes what glomeruli are: ‘Olfactory processing starts with the activity of odorant-activated olfactory sensory neurons. The axons of these sensory neurons terminate in one or two anatomical structures called glomeruli located on the surface of the olfactory bulb (OB). Each glomerulus is innervated by several mitral and tufted cells (MTCs), which then project the odor information to several cortical regions.’

      (3) Line 78ff: The text sounds as if glomeruli are activated by the light stimulation but ChR2 is expressed in MTCs, the postsynaptic component of the glomeruli. It would be clearer to refer to the stimulation as light activation of MTCs.

      We corrected this sentence to: ‘We first mapped each recorded cell's receptive field, i.e., the set of MTCs on the dorsal OB that affect its firing rates when they are light-stimulated.’

      (4) Line 90: It would be great to mention somewhere in this paragraph that you are analyzing single-unit data sorted from extracellular recordings with tungsten electrodes.

      We added that to the description of the experimental setup: ‘To investigate how MTCs interact, we expressed the light-gated channel rhodopsin (ChR2) exclusively in MTCs by crossing the Tbet-Cre and Ai32 mouse lines (Grobman et al., 2018; Haddad et al., 2013), and extracellularly recorded the spiking activity of MTCs in anesthetized mice during optogenetic stimulation using tungsten electrodes.’

      (5) Line 97: The term "delta entrainment" could be easily confused with the entrainment of MTCs to respiration in the delta frequency band. Maybe better to use a different term or stick to "change in entrainment" also used in the text.

      We completely agree. The term was changed to “change in entrainment” throughout the manuscript and figures.

      (6) Line 121f: "Light stimulation did not affect ..." . Should this be "Paired light stimulation did not affect ..."?

      Corrected, thank you.

      (7) Supplementary Figure 1a: The example is not very convincing. It looks a bit like a rhythmic bursting neuron mildly depending on the stimulation.

      This panel serves to present our light stimulation method. The potency of the light stimulation protocol can be seen in the receptive field maps.

      (8) Supplementary Figure 1c: Why is there no confidence interval for 'Paired'?

      This panel shows the power spectrum density of the average neuron response across trials computed over the entire stimulus window (100ms). We decided to remove this panel, as panel Figure 1d shows the evolution of the entrainment in time and, therefore, provides better insight into the effect.

      (9) Line 166f: "... across any light intensities". Maybe better "... for the four light intensities tested"?

      We agree and have changed the text accordingly.

      (10) Figure 2f: It would be more intuitive to have the x-axis in the same orientation as in 2e.

      Corrected, thank you.

      (11) Figure 4a: The image in this panel is identical to Figure 1a in Dalal and Haddad 2022 in Cell reports just with a different intensity. The reuse of items and data from previous publications should be indicated somewhere but I could not find it.

      We apologize for this replication. We replaced it with a photo showing a larger portion of the OB, demonstrating the restricted viral expression within the GCL.

      (12) Line 408ff: A brief explanation for the hypothesis of EPL parvalbumin interneurons as the ones mediating lateral inhibition would be great.

      We agree. We added the following paragraph to the discussion section: “We speculate that MTC-to-MTC suppression is mediated by EPL neurons, most likely the Parvalbumin neuron (PV). This hypothesis is based on their activity and connectivity properties with MTCs (Burton, 2017; Kato et al., 2013; Miyamichi et al., 2013; Burton, 2024). More studies are required to reveal how PV neurons affect MTC activity.”

      (13) Line 425ff: You show that only activity of high firing rate neurons is suppressed by lateral inhibition, whereas "low and noise MTC responses" are not affected. Wouldn't this rather support the conclusion that lateral inhibition prevents excess activity from the OB?

      We found that lateral inhibition was mainly effective when the postsynaptic neurons fired at ~30-80Hz in response to light stimulation. That is, it affects MTC firing in this “intermediate” range, and to a lesser extent when the MTC has low or very high firing rates. To prevent excess activity, one would expect a mechanism that affects high firing rates more than medium ones. This was demonstrated in Kato et al., 2013 for PV-MTC inhibition.

      (14) Line 387: "..., only ~20% of the tested MTC pairs exhibited significant lateral inhibition." This is higher than the 16% of neurons you reported to have lateral entrainment (line 100). Why do you consider the lateral inhibition as 'sparse' but the lateral entrainment as relevant?

      We apologize for this unclear statement. The papers we cited in this regard (Fantana et al., 2008; Lehmann et al., 2016; Pressler and Strowbridge, 2017) have tested lateral inhibition when the recorded MTC was not active, which resulted in a sparse MTC-MTC inhibition. We validated and replicated these findings in our setup, by systematically projecting light spots over the dorsal OB without simultaneous activation of the recorded MTC and found similar rates of largely scarce inhibition (data not shown). In this study, using spike-triggered average light stimulation protocol and paired activation of MTCs, we found higher rates of lateral inhibition, consistent with the reports by Isaacson and Strowbridge, 1998, Urban and Sakmann, 2002. We changed this paragraph to the following:

      “We found that only ~20% of the tested MTC pairs exhibited significant lateral suppression. This rate is consistent with previous in-vitro studies that found lateral suppression between 10-20% of heterotypic MTC pairs (Isaacson and Strowbridge, 1998; Urban and Sakmann, 2002), and is higher compared to a case where the recorded MTC is not active (Lehmann et al., 2016).”

      Reviewer #2 (Recommendations For The Authors):

      Figure-by-figure comments:

      (1) Figures 1d,e: both these examples seem to show that the firing rate is decreased in the paired condition? From maxima at 110 to 58 Hz in d and 100 to 48 Hz in e. Please explain (see also comment on Figure S1c).

      Please see the response in the Public Review section, reviewer #2, bullet (2). We also added a panel to Supplementary Figure 1 to better explain this.

      (2) Figure 1 f The means and SEMs are hard to see. Why is the SEM bar plotted horizontally? Since this is a major finding of the paper, will there be a table provided that shows the distribution of ∆ shifts across animals?

      We apologize for the mistake. The horizontal bar was the marking of the mean. Since the SEM is small, we corrected the graph for better visualization of the SEM.

      (3) Figure 1g Showing the running average of data where there is almost none or no data points (beyond 50 Hz) seems not ideal. Is the enhanced entrainment around 40Hz significant? Perhaps the moving average should be replaced by binned data with indicated n?

      We prefer to show all data points instead of binning the data, so that the reader can see them all. We agree that such a wide range on the x-axis is unnecessary. We shortened the x-axis of this graph to include only the firing rate range spanned by the data points.

      (4) Figure 1h Impressive result!

      Thank you!

      (5) Figure S1a: since the authors show the respiratory pattern here and there obviously was no alignment of light stimulation with inspiration, was there any correlation between the respiratory phase and efficiency of light stimulation with respect to lateral interactions?

      This is an interesting idea. In Haddad et al., 2013, figure 7, the authors performed a similar analysis and showed that optogenetic activation of MTCs had a more pronounced effect on firing rate in the respiration phases in which the neuron fired less. However, we have not quantified the impact of lateral interactions with respect to the respiration phase. That being said, the data will be publicly available to test this question.

      (6) Figure S1c: Here the shift towards a lower firing rate seems to be obvious (see comment in Figures 1 d and e). Please also show the plot for Figure 1e.

      This panel shows the power spectrum density of the average neuron's response across trials computed over the entire stimulus window (100ms). We decided to remove this panel, as panel Figure 1d shows the evolution of the entrainment in time and, therefore, provides better insight into the effect.

      (7) Figure 2b: show the same plot also for pair 2? Why is it stated that there is no lateral suppression for lateral stimulation alone, if the MTC did not spike spontaneously in the first place and thus inhibition cannot be demonstrated?

      We use Figure 2b to demonstrate the effect of lateral inhibition, and in Figure 2c we detail the responses under each light intensity for both pairs. We think that showing the mean and SEM for one example is enough to give a sense of the effect, as in Figure 2c we show the average response across time together with a significance assessment for each pair (panels without a p-value have no significant difference between the conditions).

      We agree with the comment on this specific example and have therefore deleted this sentence. At the population level, however, we found no inhibition when activating the lateral spots, regardless of their firing rates (shown in Supplementary Figure 2a).

      (8) Figure 2d: why is there no distance-dependent color coding for the significant data points? Or, alternatively, since the distance plot is shown in 2e, perhaps drop this information altogether? Again, the moving average is problematic.

      Distance-dependent color coding is applied to all data points in this panel. Significant data points are shown in full circles and have distance-dependent color coding, which is mainly restricted to the lower part of the distance scale (cold colors).

      We used a moving average to relate to the similar result reported in Arevian 2008. In Figure 2e, the actual distance for each data point is indicated on the x-axis.

      (9) Figure 2f: the diagonal averaging method seems to neglect a lot of the data in Figure S2b, why not use radial coordinates for averaging?

      Thank you for the great suggestion. We now use radial coordinates for the averaging, and the results are more robust and better summarize the entire data.
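      As a generic illustration of averaging in radial coordinates, the sketch below pools all cells of a 2D map by their distance from a chosen origin instead of reading out only the diagonal. It is a minimal sketch under stated assumptions and not the code behind Figure 2f: the function name `radial_average`, the choice of origin, the bin width, and the toy 5x5 map are hypothetical.

```python
import numpy as np

def radial_average(values, origin=(0, 0), bin_width=1.0):
    """Mean of a 2D map within concentric distance bins around `origin`.

    NaN entries (missing data) are ignored. Returns the radial profile and
    the number of cells that contributed to each bin.
    """
    values = np.asarray(values, dtype=float)
    rows, cols = np.indices(values.shape)
    radius = np.hypot(rows - origin[0], cols - origin[1])
    bins = (radius / bin_width).astype(int)
    valid = ~np.isnan(values)
    sums = np.bincount(bins[valid], weights=values[valid])
    counts = np.bincount(bins[valid])
    with np.errstate(invalid="ignore", divide="ignore"):
        profile = sums / counts  # NaN where a bin has no cells
    return profile, counts

# Toy map (hypothetical): each cell holds the integer part of its distance from the origin,
# so the radial profile should recover 0, 1, 2, ... exactly.
rows, cols = np.indices((5, 5))
toy_map = np.floor(np.hypot(rows, cols))
profile, counts = radial_average(toy_map)
```

      Unlike a diagonal readout, every cell of the map contributes to exactly one bin, and `counts` reports how many cells support each point of the profile.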

      (10) Figure 3: These are interesting observations, but are there cumulative data on such types of pairs? Please describe and show, otherwise this can only be a supplemental observation. Regarding 3b was it always the lower light intensity that resulted in suppression and the higher in sync? Since Burton et al. 2024 have just shown that PVNs require very little input to fire!

      This figure shows several examples of entrainment and inhibition properties. As suggested, we added a population analysis (Figure 3c-d). This analysis compares the firing rate changes in pairs that evoked significant suppression or entrainment. First, we found only a few pairs in which paired activation evoked both spike entrainment and suppression. Second, the mean firing rate change of pairs that evoked significant entrainment (N=50, shown in Figure 1f in full circles) is significantly different from the mean of the pairs that evoked significant lateral inhibition (N=51, shown in Figure 2d in full circles).

      (11) Figure 4: This Figure and the corresponding section should be entitled "Additional GC activation... ", otherwise it might be confusing for the reader. A loss of function manipulation (local GC silencing) would be also great to have! You did this in the previous paper, why not here? Raw LFP data are not shown. In Figure 4e the reported odor response firing rate ranges only up to 40Hz, but the example in g shows a much higher frequency. Is the maximum in 4e significant? (same issue as for Figure 1g).

      We changed the phrase to ‘optogenetic GCL neurons activation’. Unfortunately, we haven’t performed experiments where we suppress GC columns. In the previous paper, we suppressed the activity of all accessible GCs, which resulted in reduced spike synchronization to the OB gamma oscillations. Silencing only the GC column is, we think, unlikely to have a substantial effect, especially if the GCs have low activity (but this needs to be tested). Furthermore, we added examples of raw LFP data for odor stimulation and odor combined with GCL column activation (see Supplementary Figure 4a).

      The instantaneous firing rate is high (~80Hz); however, the firing rate values we report in Figure 4e are averages within a window of 2 seconds (the odor duration is 1.5 seconds, and we extend the window to account for responses with a late return to baseline). The average firing rate of this example neuron in this window was 28Hz.
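      The gap between a high instantaneous rate and a lower window-averaged rate can be illustrated with a minimal sketch. It is not the analysis code used for the manuscript: the function names, the inverse inter-spike-interval definition of instantaneous rate, and the synthetic spike train are all assumptions made for illustration.

```python
def mean_rate(spike_times_s, start_s, end_s):
    """Average firing rate (Hz): spike count in [start_s, end_s) over the window length."""
    n_spikes = sum(1 for t in spike_times_s if start_s <= t < end_s)
    return n_spikes / (end_s - start_s)


def peak_instantaneous_rate(spike_times_s):
    """Peak instantaneous rate (Hz), taken here as the inverse of the shortest
    inter-spike interval. This definition is an assumption of the sketch."""
    ordered = sorted(spike_times_s)
    intervals = [b - a for a, b in zip(ordered, ordered[1:]) if b > a]
    return max(1.0 / isi for isi in intervals)


# Hypothetical spike train: a 0.5 s burst at 80 Hz followed by silence.
burst = [i * 0.0125 for i in range(41)]

window_average_hz = mean_rate(burst, 0.0, 2.0)  # averaged over a 2 s window
peak_hz = peak_instantaneous_rate(burst)        # set by the burst alone
```

      In this synthetic case the peak rate is set by the burst alone, whereas the 2-second average also includes the silent period, so the two values can differ severalfold.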

      (12) Fig 5: what does "proximal" mean - does this mean stimulation of the GCs below the recorded MTC, that might actually belong to the same glomerular unit?

      Yes, by “proximal” we mean the activation of the GCs in the column of the recorded MTC. However, we decided that instead of coarsely dividing the data into proximal and distal optogenetic activation of GCL neurons, we will show the data continuously to show that GC activation had no significant effect on MTC odor-evoked firing rates regardless of its location (Figure 5d).

      A comment on the title:

      Please tone it down: "Ensemble synchronization" is a hypothesis at this point, not directly shown in the paper. Also, the paper does not show lateral interactions between odor-activated neurons.

      We agree and have rephrased it to “Activity-dependent lateral inhibition enables the synchronization of active olfactory bulb projection neurons”.

      (1) Figure 1a, 2a scale bar missing.

      Corrected, thank you.

      (2) Figure 1 c is the "rebound" in the lateral stim trace (green) real or not significant?

      The activity during this rebound is not significantly different from the baseline activity before light stimulation.

      (3) Figure 2b legend: "lateral alone" instead of lateral?

      We appreciate the suggestion. For simplicity, we will keep it as “lateral”.

      (4) Figure 2c: some of the data plots seem to be breaking off, e.g. the blue line in the bottom third one.

      This line breaking is due to the lack of spikes in this period. The PSTHs used in all analyses result from the convolution of the spike train with a Gaussian window with a standard deviation of 50ms.
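      A minimal sketch of this kind of smoothing is given below. It is not the code used for the manuscript: the 1 ms bin width, the truncation of the kernel at four standard deviations, and all function names are assumptions made for illustration. Only the 50 ms standard deviation comes from the response above.

```python
import math


def gaussian_kernel(sigma_ms, dt_ms=1.0, n_sigmas=4.0):
    """Discrete Gaussian kernel, truncated at n_sigmas and normalized to unit sum."""
    half_width = int(round(n_sigmas * sigma_ms / dt_ms))
    weights = [math.exp(-0.5 * ((i * dt_ms) / sigma_ms) ** 2)
               for i in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smoothed_psth(trials, duration_ms, sigma_ms=50.0, dt_ms=1.0):
    """Trial-averaged firing rate (Hz) per bin, smoothed with a Gaussian kernel.

    trials: list of trials, each a list of spike times in ms.
    """
    n_bins = int(round(duration_ms / dt_ms))
    counts = [0.0] * n_bins
    for spikes in trials:
        for t in spikes:
            b = int(t // dt_ms)
            if 0 <= b < n_bins:
                counts[b] += 1.0
    # Convert summed counts to a trial-averaged rate in Hz.
    scale = 1000.0 / (dt_ms * len(trials))
    rate = [c * scale for c in counts]

    kernel = gaussian_kernel(sigma_ms, dt_ms)
    half = len(kernel) // 2
    smoothed = [0.0] * n_bins
    for i in range(n_bins):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n_bins:  # bins outside the recording contribute nothing
                acc += w * rate[j]
        smoothed[i] = acc
    return smoothed


# Hypothetical input: one trial with a single spike at 500 ms in a 1 s recording.
psth = smoothed_psth([[500.0]], duration_ms=1000.0)
```

      Because the kernel is truncated in this sketch, bins farther than four standard deviations from any spike are exactly zero, so a period without spikes yields a zero rate.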

      (5) Figure 2f: Why is the x axis flopped vs 2d,e?

      This panel was mistakenly plotted that way, and was corrected.

      Comments on the text:

      Abstract - we had indicated suggestions by strike-throughs and color which are lost in the online submission system, please compare with your original text:

      Information in the brain is represented by the activity of neuronal ensembles. These ensembles are adaptive and dynamic, formed and truncated based on the animal`s experience. One mechanism by which spatially distributed neurons form an ensemble is via synchronization of their spiking activity in response to a sensory event. In the olfactory bulb, odor stimulation evokes rhythmic gamma activity in spatially distributed mitral and tufted cells (MTCs). This rhythmic activity is thought to enhance the relay of odor information to the downstream olfactory targets. However, how only specifically the odor-activated MTCs are synchronized is unknown. Here, we demonstrate that light optogenetic activation of activating one set of MTCs can gamma-entrain the spiking activity of another set. This lateral synchronization was particularly effective when both MTCs fired at the gamma rhythm, facilitating the synchronization of only the odor-activated MTCs. Furthermore, we show that lateral synchronization did not depend on the distance between the MTCs and is mediated by granule cells. In contrast, lateral inhibition between MTCs that reduced their firing rates was spatially restricted to adjacent MTCs and was not mediated by granule cells. Our findings reveal lead us to propose ? a simple yet robust mechanism by which spatially distributed neurons entrain each other's spiking activity to form an ensemble.

      Thank you. We adopted most of the changes and edited the abstract to reflect the reported results better.

      "both MTCs fired at the gamma rhythm"/this is at this point unwarranted since the mutual entrainment is not shown - tone down or present as hypothesis?

      We completely agree. This sentence was changed to “This lateral synchronization was particularly effective when the recorded MTC fired at the gamma rhythm, facilitating the synchronization of the active MTC”.

      l. 28: distance-independent instead of "spatially independent"?

      Corrected

      l. 46: are there inhibitory neurons in the ONL? Or which 6 layers are you referring to here?

      Corrected to “spanning all OB layers”.

      l. 49: "is mediated" => "likely to be mediated". Schoppa's work is in vitro and did not account for PVNs, see comment in Public Review.

      Corrected. Indeed, Schoppa's work was performed in vitro. We cite it here since it showed that the synchronized firing of two MTC pairs depends on granule cells.

      l.52: "method"? rather "mechanism"? "specifically" instead of "only"?

      Corrected.

      l.52: perhaps more precise: a recent hypothesis is that GCs enable synchronization solely between odor-activated MTCs via an activity-dependent mechanism for GABA-release (Lage Rupprecht et al. 2020 - please cite the experimental paper here). Again, Galan has no direct evidence for GCs vs PVNs, see comment in Public Review.

      Thank you, we updated this sentence here and in the discussion and added the relevant citation.

      l. 66: spike timings instead of spike's timing?

      Corrected to spike timings

      l. 67 -71: this part could be dropped.

      We appreciate the suggestion; however, we think that it is convenient to briefly read the main results before the results section.

      l. 76 mouse instead of mice.

      Corrected.

      l. 77: for clarification: " a single MTC"?

      In some cases, we recorded more than one cell simultaneously.

      l. 89: just use "hotspot".

      Corrected

      l. 97 instead of "change", "positive change" or "increase"?

      We left the word change, since we wanted to report that the change between hotspot alone and paired stimulation was significantly higher than zero.

      l. 104: the postsyn MTC's firing rate.

      Corrected to MTC instead of MTCs

      l.108: "distributed on the OB surface" sounds misleading, perhaps "across the glomerular map"?

      Corrected.

      l. 254: "which the MTCs form with each other"- perhaps "which interconnect MTCs".

      Corrected.

      l. 270 Additional GC activation.

      Corrected to ‘optogenetic activation of GCL neurons’

      l. 284 somewhat unclear - please expand.

      Corrected to ‘This measure minimizes the bias of the neuron's firing rate on the spike-LFP synchrony value’.

      l. 371: no odors in Schoppa et al.

      Corrected to ‘It has been shown that two active MTCs can synchronize their stimulus-evoked and odor-evoked spike timings’

      l. 406 ff. good point - but where is the transition? How does this observation rule out that GCs can mediate lateral suppression?

      This is an important question. We tested two configurations of optogenetic GC activation: column activation (in this paper) and activation of all accessible GCs of the dorsal OB (Dalal & Haddad, 2022). Although the latter manipulation results in significant firing rate suppression, the effect of MTC suppression was relatively small in anesthetized mice and even smaller in awake mice. Optogenetically activating GCs at baseline conditions resulted in a strong suppression of only the adjacent MTCs. Taken together, we think that GCs are capable of strongly inhibiting MTCs, but that this is not their main function in natural olfactory sensation.

      l. 422 ff: again, this is a hypothesis, please frame accordingly.

      Corrected to ‘Activity-dependent synchronization can enable the synchronization of odor-activated MTCs that are dispersed across the glomerular map’

      l. 551 typo.

      Corrected.

      l 556 ff: Figure 2 does not show odor responses.

      Corrected.

      l 582: Mix up of above/below and low/high?

      Corrected to ‘The values in the STA map that were above or below these high and low percentile thresholds’

      Reviewer #3 (Recommendations For The Authors):

      Line 76: "Ai39" should be corrected to "Ai32".

      Corrected. Thank you.

      Figure Legends: The legends should describe the results rather than interpret the data. For instance, the legends for Figures 1f, g, and h contain interpretations. The authors should review all legends and revise them accordingly.

      We appreciate the comment; however, we respectfully disagree. We don’t see these opening sentences as interpretations but as guidance to the reader. For example, ‘Paired stimulation increases spikes’ temporal precision’ is not an interpretation; instead, it describes the finding presented in this panel. We think that legends that only repeat what can already be deduced from the graph are not helpful and, in many cases, obsolete. Explaining what we think a graph shows is common practice, and we prefer it as it helps the reader.

      For Figures 1d and e, it may be beneficial to add the spectrograms for the second stimulation alone.

      We show the stimulation of the hotspot alone and when we stimulate both. The spectrogram of the lateral alone does not show anything of importance.

      Figures 1a and 2a: Please add color bars so that readers can understand the meaning of the colors plotted.

      Color bars were added.

      Figure 3: The purpose of this figure is unclear. Why does the baseline firing rate for the paired activation differ? Is this an isolated observation, or is it observed in other units as well?

      This issue was also raised by Reviewer #2. Our response to Reviewer #2 is reproduced here:

      This figure shows several examples of entrainment and inhibition properties. As suggested, we added a population analysis (Figure 3c-d). This analysis compares the firing rate changes in pairs that evoked significant suppression or entrainment. First, we found only a few pairs in which paired activation evoked both spike entrainment and suppression. Second, the mean firing rate change of pairs that evoked significant entrainment (N=50, shown in Figure 1f in full circles) is significantly different from the mean of the pairs that evoked significant lateral inhibition (N=51, shown in Figure 2d in full circles).

      Figures 4 and 5 data seem to come from the same dataset as in Dalal and Haddad (2022) DOI: https://doi.org/10.1016/j.celrep.2022.110693. For example, the fluorescence image looks identical. If this is the case, the authors may want to state that the image and some of the data and analyses are reproduced.

      The recorded data shown in these figures are not reproduced from Dalal & Haddad 2022. We collected these data using GC-column activation rather than light activation of the entire dorsal OB surface, as was done in the 2022 paper.

      However, the histology image is the same, and we have now replaced it with a new image, which shows that the expression is restricted to the GCL.

      Figure 4d: the authors use the data plotted here to argue that the gamma entrainment is distance-independent. But there is a clear decrease over distance (e.g., delta PPC1 over 0.01 is not seen for distances beyond 1000 µm). The claim of distance independence may be an over-interpretation of the data. Peace et al. (2024) also claimed that coupling via gamma oscillations occurs over a large spatial extent.

      From a statistical point of view, we cannot state that there is a dependency on distance, as the correlation is not significant (P = 0.86). PPC1 values of 0.01 can be found at 0, 500, and 700 microns. Lower values are found at larger distances, but this could result from the smaller number of data points. The reduced level of synchrony observed at distances above one mm could reflect the reduced density of lateral interactions at these distances. That said, we rephrased the sentence as a more careful statement. Please see the rephrased sentence in the Public Review section.
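      One way to assess whether a distance dependence is statistically supported is a permutation test on the correlation between distance and the change in synchrony. The sketch below is illustrative only: the response does not state which test produced P = 0.86, and the function names and numerical values here are hypothetical rather than taken from the manuscript.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value for the correlation, obtained by shuffling y relative to x."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical values for illustration only (not data from the manuscript).
distance_um = [0, 150, 300, 500, 700, 900, 1100, 1300]
delta_ppc1 = [0.010, 0.002, 0.006, 0.011, 0.010, 0.003, 0.004, 0.005]

r = pearson_r(distance_um, delta_ppc1)
p = permutation_p_value(distance_um, delta_ppc1)
```

      A large p-value from such a test means only that a distance dependence cannot be demonstrated with the available points. It does not establish independence, which is in line with the more careful phrasing adopted above.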

    1. Author response:

      We appreciate Reviewer 1’s observation that our findings (i.e., separable dynamic trajectories are systematically translated in response to whether outcomes are rewarded, and this translation is accumulated across trials) are consistent with a line attractor model. We agree with this assessment and, in the revised manuscript, will reframe our findings about the dynamic trajectories to address their consistency with a line attractor.

      However, we would like to emphasize that a line attractor model does not account for the dynamic nature of reversal probability activity observed in the neural data. A line attractor, regardless of whether it is curved or straight, implies that the activity is fixed when no reward information is presented. The focus of our work is to highlight this dynamic nature of reversal probability activity and its incompatibility with the line attractor model.

      This leads to the question of how we could reconcile the line attractor-like properties and the dynamic nature of reversal probability activity. In the revised manuscript, we will provide evidence for an augmented model that has an attractor state at the beginning of each trial, followed by dynamic activity during the trial. Such a model is an example of superposition of initial attractor states with fast within-trial dynamics, as pointed out by Reviewer 1.

      We also thank Reviewer 2 and Reviewer 3 for their comments on how the manuscript could be improved. In the revised manuscript, we will provide detailed explanations to clarify the choice of network model, data analysis methods and experiment and model setups.

      In addition, we would like to take this opportunity to point out potentially misleading statements in the reviews by Reviewer 2 and Reviewer 3. Reviewer 2 stated that “no action is required to be performed by neurons in the RNN, …, no intervening behavior is thus performed by neurons”. Reviewer 3 stated that “the RNN does not have to do any explicit computation during the non-feedback parts of the trial…”. These statements convey the message that the trained RNN does not perform any computation. In fact, the RNN is trained to make a choice during the non-feedback period in response to feedback. This is the only computation the RNN performs. “Intervening behavior” refers to the choices the RNN makes across trials until it reverses its initially preferred choice. We think that this confusion might have arisen because the meaning of the term “intervening behavior” was unclear. We will clarify this point in the revised manuscript.

      Again, thank you for the insightful comments. We will provide a more detailed response to the reviews and revise the manuscript accordingly.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors use high-throughput gene editing technology in larval zebrafish to address whether microexons play important roles in the development and functional output of larval circuits. They find that individual microexon deletions rarely impact behavior, brain morphology, or activity, and raise the possibility that behavioral dysregulation occurs only with more global loss of microexon splicing regulation. Other possibilities exist: perhaps microexon splicing is more critical for later stages of brain development, perhaps microexon splicing is more critical in mammals, or perhaps the behavioral phenotypes observed when microexon splicing is lost are associated with loss of splicing in only a few genes.

      A few questions remain:

      (1) What is the behavioral consequence for loss of srrm4 and/or loss-of-function mutations in other genes encoding microexon splicing machinery in zebrafish?

      It is established that srrm4 mutants have no overt morphological phenotypes and are not visually impaired (Ciampi et al., 2022).

      We chose not to generate and characterize the behavior and brain activity of srrm4 mutants for two reasons: 1) we were aware of two other labs in the zebrafish community that had generated srrm4 mutants (Ciampi et al., 2022 and Gupta et al., 2024, https://doi.org/10.1101/2024.11.29.626094; Lopez-Blanch et al., 2024, https://doi.org/10.1101/2024.10.23.619860), and 2) we were far more interested in determining the importance of individual microexons to protein function, rather than loss of the entire splicing program. Microexon inclusion can be controlled by different splicing regulators, such as srrm3 (Ciampi et al., 2022) and possibly other unknown factors. Genetic compensation in srrm4 mutants could also result in microexons still being included through actions of other splicing regulators, complicating the analysis of these regulators. We mention srrm4 in the manuscript to point out that some selected microexons are adjacent to regulatory elements expected of this pathway. We did not, however, choose microexons to mutate based on whether they were regulated by srrm4, making the characterization of srrm4 mutants disconnected from our overarching project goal.

      We are coordinating our publication with Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860), which shows that srrm4 mutants also have minimal behavioral phenotypes.

      (2) What is the consequence of loss-of-function in microexon splicing genes on splicing of the genes studied (especially those for which phenotypes were observed).

      We acknowledge that unexpected changes to the mRNA could occur following microexon removal. In particular, all regulatory elements should be removed from the region surrounding the microexon, as any remaining elements could drive the inclusion of unexpected exons that result in premature stop codons.

      First, we will clarify our generated mutant alleles by adding a figure that details the location of the gRNA cut sites in relation to the microexon, its predicted regulatory elements, and its neighboring exons.

      Second, we will experimentally determine whether the mRNA was modified as expected for a subset of mutants with phenotypes.

      Third, we will further emphasize in the manuscript that these observed phenotypes are extremely mild compared to those observed in over one hundred protein-truncating mutations we have assessed in previous and ongoing work. We currently show one mutant, tcf7l2, which we consider to have strong neural phenotypes, and we will expand this comparison in the revision. In our study of 132 genes linked to schizophrenia (Thyme et al., 2019), we established a signal cut-off for whether a mutant would be designated as having a neural phenotype, and we classify this set of microexon mutants in this context. Far stronger phenotypes are expected of loss-of-function alleles for microexon-containing genes, as we showed in Figure S1 of this manuscript in addition to our published work.

      (3) For the microexons whose loss is associated with substantial behavioral, morphological, or activity changes, are the same changes observed in loss-of-function mutants for these genes?

      We had already included two explicit comparisons of microexon loss with a standard loss-of-function allele, one with a phenotype and one without, in Figure S1 of this manuscript. We will make the conclusions and data in this figure more obvious in the main text.

      Beyond the two pairs we had included, Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860) describes mild behavioral phenotypes for a microexon removal for kif1b, and we already show developmental abnormalities for the kif1b loss-of-function allele (Figure S1).

      Additionally, we can draw expected conclusions from the literature, as some genes with our microexon mutations have been studied as typical mutants in zebrafish or mice. We will modify our manuscript to include a discussion of these mutants.

      (4) Do "microexon mutations" presented here result in the precise loss of those microexons from the mRNA sequence? E.g. are there other impacts on mRNA sequence or abundance?

      See response to point 2. We will experimentally determine whether the mRNA was modified as expected for a subset of mutants with phenotypes.

      (5) Microexons with a "canonical layout" (containing TGC / UC repeats) were selected based on the likelihood that they are regulated by srrm4. Are there other parallel pathways important for regulating the inclusion of microexons? Is it possible to speculate on whether they might be more important in zebrafish or in the case of early brain development?

      The microexons were not selected based on the likelihood that they were regulated by srrm4. We will clarify the manuscript regarding this point. There are parallel pathways that can control the inclusion of microexons, such as srrm3 (Ciampi et al., 2022). It is well-known that loss of srrm3 has stronger impacts on zebrafish development than srrm4 (Ciampi et al., 2022). The goal of our work was not to investigate these splicing regulators, but instead was to determine the individual importance of these highly conserved protein changes.

      Strengths:

      (1) The authors provide a qualitative analysis of splicing plasticity for microexons during early zebrafish development.

      (2) The authors provide comprehensive phenotyping of microexon mutants, addressing the role of individual microexons in the regulation of brain morphology, activity, and behavior.

      We thank the reviewer for their support. The pErk brain activity mapping method is highly sensitive, significantly minimizing the likelihood that the field has simply not looked hard enough for a neural phenotype in these microexon mutants. In our published work (Thyme et al., 2019), we show that brain activity can be drastically impacted without manifesting in differences in those behaviors assessed in a typical larval screen (e.g., tcf4, cnnm2, and more).

      Weaknesses:

      (1) It is difficult to interpret the largely negative findings reported in this paper without knowing how the loss of srrm4 affects brain activity, morphology, and behavior in zebrafish.

      See response to point 1.

      (2) The authors do not present experiments directly testing the effects of their mutations on RNA splicing/abundance.

      See response to point 2.

      (3) A comparison between loss-of-function phenotypes and loss-of-microexon splicing phenotypes could help interpret the findings from positive hits.

      See response to point 3.

      Reviewer #2 (Public review):

      Summary:

      The manuscript from Calhoun et al. uses a well-established screening protocol to investigate the functions of microexons in zebrafish neurodevelopment. Microexons have gained prominence recently due to their enriched expression in neural tissues and misregulation in autism spectrum disease. However, screening of microexon functionality has thus far been limited in scope. The authors address this lack of knowledge by establishing zebrafish microexon CRISPR deletion lines for 45 microexons chosen in genes likely to play a role in CNS development. Using their high throughput protocol to test larval behaviour, brain activity, and brain structure, a modest group of 9 deletion lines was revealed to have neurodevelopmental functions, including 2 previously known to be functionally important.

      Strengths:

      (1) This work advances the state of knowledge in the microexon field and represents a starting point for future detailed investigations of the function of 7 microexons.

      (2) The phenotypic analysis using high-throughput approaches is sound and provides invaluable data.

      We thank the reviewer for their support.

      Weaknesses:

      (1) There is not enough information on the exact nature of the deletion for each microexon.

      To clarify the nature of our mutant alleles, we will add a figure that details the location of the gRNA cut sites in relation to the microexon, its predicted regulatory elements, and its neighboring exons.

      (2) Only one deletion is phenotypically analysed, leaving space for the phenotype observed to be due to sequence modifications independent of the microexon itself.

      We will experimentally determine whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see the point 2 response to reviewer 1). We also have already compared the microexon removal to a loss-of-function mutant for two lines (Figure S1), and we will make that outcome more obvious as well as increasing the discussion of the expected phenotypes from typical loss-of-function mutants (see point 3 response to reviewer 1).

      In addition, our findings for three microexon mutants (ap1g1, vav2, and vti1a) are corroborated by Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860).

      Unlike protein-coding truncations, clean removal of the microexon and its regulatory elements is unlikely to yield different phenotypic outcomes if independent lines are generated (with the exception of genetic background effects). When generating a protein-truncating allele, the premature stop codon can have different locations and a varied impact on genetic compensation. In previous work (Capps et al., 2024), we have observed different amounts of nonsense-mediated decay-induced genetic compensation (El-Brolosy, et al., 2019) depending on the location of the mutation. As they lack variable premature stop codons (the expectation of a clean removal), two mutants for the same microexons should have equivalent impacts on the mRNA.

      Reviewer #3 (Public review):

      Summary:

      This paper sought to understand how microexons influence early brain function. By selectively deleting a large number of conserved microexons and then phenotyping the mutants with behavior and brain activity assays, the authors find that most microexons have minimal effects on the global brain activity and broad behaviors of the larval fish-- although a few do have phenotypes.

      Strengths:

      The work takes full advantage of the scale that is afforded in zebrafish, generating a large mutant collection that is missing microexons and systematically phenotyping them with high throughput behaviour and brain activity assays. The work lays an important foundation for future studies that seek to uncover the likely subtle roles that single microexons will play in shaping development and behavior.

      We thank the reviewer for their support.

      Weaknesses:

      The work does not make it clear enough what deleting the microexon means, i.e. is it a clean removal of the microexon only, or are large pieces of the intron being removed as well-- and if so how much? Similarly, for the microexon deletions that do yield phenotypes, it will be important to demonstrate that the full-length transcript levels are unaffected by the deletion. For example, deleting the microexon might have unexpected effects on splicing or expression levels of the rest of the transcript that are the actual cause of some of these phenotypes.

      To clarify the nature of our mutant alleles, we will add a figure that details the location of the gRNA cut sites in relation to the microexon, its predicted regulatory elements, and its neighboring exons.

      We will experimentally determine whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see the point 2 response to reviewer 1).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Previous studies have shown that treatment with 17α-estradiol (a stereoisomer of the 17β-estradiol) extends lifespan in male mice but not in females. The current study by Li et al, aimed to identify cell-specific clusters and populations in the hypothalamus of aged male rats treated with 17α-estradiol (treated for 6 months). This study identifies genes and pathways affected by 17α-estradiol in the aged hypothalamus.

      Strengths:

      Using single-nucleus transcriptomic sequencing (snRNA-seq) on the hypothalamus from aged male rats treated with 17α-estradiol they show that 17α-estradiol significantly attenuated age-related increases in cellular metabolism, stress, and decreased synaptic activity in neurons.

      Thanks.

      Moreover, sc-analysis identified GnRH as one of the key mediators of 17α-estradiol's effects on energy homeostasis. Furthermore, they show that CRH neurons exhibited a senescent phenotype, suggesting a potential side effect of the 17α-estradiol. These conclusions are supported by supervised clustering by neuropeptides, hormones, and their receptors.

      Thanks.

      Weaknesses:

      However, the study has several limitations that reduce the strength of the key claims in the manuscript. In particular:

      (1) The study focused only on males and did not include comparisons with females. However, previous studies have shown that 17α-estradiol extends lifespan in a sex-specific manner in mice, affecting males but not females. Without the comparison with the female data, it's difficult to assess its relevance to the lifespan.

      This study was originally designed based on previous findings indicating that lifespan extension is only effective in males, leading to the exclusion of females from the analysis. The primary focus of our research was on the transcriptional changes and serum endocrine alterations induced by 17α-estradiol in aged males compared to untreated aged males. We believe that even in the absence of female subjects, the significant effects of 17α-estradiol on metabolism in the hypothalamus, synapses, and endocrine system remain evident, particularly regarding the expression levels of GnRH and testosterone. Notably, lower overall metabolism, increased synaptic activity, and elevated levels of GnRH and testosterone are strong indicators of health and well-being in males, supporting the validity of our primary conclusions. However, including female controls would enhance the depth of our findings. If female controls were incorporated, we propose redesigning the sample groups to include aged male control, aged female control, aged female treated, aged male treated, as well as young male control, young male treated, young female control, and young female treated. We regret that we cannot provide this data in the short term. Nevertheless, we believe this presents a valuable avenue for future research on this topic. In this study, we emphasize the role of 17α-estradiol in overall metabolism, synaptic function, GnRH, and testosterone in aged males and underscore the importance of supervised clustering of neuropeptide-secreting neurons in the hypothalamus.

      (2) It is not known whether 17α-estradiol leads to lifespan extension in male rats similar to male mice. Therefore, it is not possible to conclude that the observed effects in the hypothalamus, are linked to the lifespan extension.

      Thank you for the reminder. 17α-estradiol has been reported to extend lifespan in male rats, as it does in male mice (PMID: 33289482). We have added this valuable reference to the Introduction in the new version.

      (3) The effect of 17α-estradiol on non-neuronal cells such as microglia and astrocytes is not well-described (Figure 1). Previous studies demonstrated that 17α-estradiol reduces microgliosis and astrogliosis in the hypothalamus of aged male mice. Current data suggest that the proportion of oligo, and microglia were increased by the drug treatment, while the proportions of astrocytes were decreased. These data might suggest possible species differences, differences in the treatment regimen, or differences in drug efficiency. This has to be discussed.

      We have reviewed reports describing changes in cell numbers following 17α-estradiol treatment in the brain, using the keywords "17α-estradiol," "17alpha-estradiol," and "microglia" or "astrocyte." Only a limited amount of data was obtained. We found one article indicating that 17α-estradiol treatment in Tg (AβPP(swe)/PS1(ΔE9)) model mice resulted in a decreased microglial cell number compared to the placebo (AβPP(swe)/PS1(ΔE9) mice), but this change was not significant when compared to the non-transgenic control (PMID: 21157032). The transgenic AβPP(swe)/PS1(ΔE9) mouse model may differ from our wild-type aging rat model in this context.

      Moreover, in earlier studies the calculation of cell numbers was based on visual observation under a microscope across several brain tissue slices. This traditional method often yields conflicting results. For example, oligodendrocytes in the corpus callosum, fornix, and spinal cord have been reported to be 20-40% more numerous in males than in females based on microscopic observations (PMID: 16452667). In contrast, another study found no significant difference in the number of oligodendrocytes between sexes when using immunohistochemistry staining (PMID: 18709647). Such discrepancies arising from traditional observational methods are inevitable.

      We believe the data presented in this article are reliable because the cell number and cell ratio data were derived from high-throughput cell counting of the entire hypothalamus using single-cell suspension and droplet wrapping (10x Genomics).
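
      As a minimal sketch of how cell-type proportions can be derived from droplet-based counts, the snippet below tallies annotated cell labels per group and converts them to fractions. The group names follow the manuscript (O, O.T), but the labels and counts are hypothetical toy values, not data from the study, and the function name is an illustrative assumption.

```python
from collections import Counter


def cell_type_proportions(labels):
    """Return {cell_type: fraction of cells} for one sample group."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells in group")
    return {cell_type: n / total for cell_type, n in counts.items()}


# Hypothetical toy annotations (assumption): one label per sequenced nucleus.
groups = {
    "O": ["Neuron"] * 50 + ["Astro"] * 30 + ["Micro"] * 20,
    "O.T": ["Neuron"] * 50 + ["Astro"] * 20 + ["Micro"] * 30,
}

# Proportion of each cell type within each group.
proportions = {name: cell_type_proportions(labels) for name, labels in groups.items()}

for name, fractions in proportions.items():
    print(name, {cell_type: round(f, 2) for cell_type, f in fractions.items()})
```

      Because every captured nucleus contributes one label, the fractions do not depend on which tissue slices happen to be inspected, which is the contrast with slice-based counting drawn above.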

      (4) A more detailed analysis of glial cell types within the hypothalamus in response to drugs should be provided.

      We have provided additional enrichment analysis data for differentially expressed genes between Y, O, and O.T in microglia and astrocytes in Figure 2—figure supplement 3. In these supplemental data, we found that, unlike neurons, microglia (Micro) displayed lower expression levels of synapse-related cellular processes in O.T compared to O.

      (5) The conclusion that CRH neurons are going into senescence is not clearly supported by the data. A more detailed analysis of the hypothalamus such as histological examination to assess cellular senescence markers in CRH neurons, is needed to support this claim.

      We also noticed this inappropriate claim and have changed "senescent phenotype" to "stressed phenotype" and "abnormal phenotype" in the abstract and results.

      Reviewer #2 (Public Review):

      Summary:

      Li et al. investigated the potential anti-ageing role of 17α-Estradiol on the hypothalamus of aged rats. To achieve this, they employed a very sophisticated method for single-cell genomic analysis that allowed them to analyze effects on various groups of neurons and non-neuronal cells. They were able to sub-categorize neurons according to their capacity to produce specific neurotransmitters, receptors, or hormones. They found that 17α-Estradiol treatment led to an improvement in several factors related to metabolism and synaptic transmission by bringing the expression levels of many of the genes of these pathways closer or to the same levels as those of young rats, reversing the ageing effect. Interestingly, among all neuronal groups, the proportion of Oxytocin-expressing neurons seems to be the one most significantly changing after treatment with 17α-Estradiol, suggesting an important role of these neurons in mediating its anti-ageing effects. This was also supported by an increase in circulating levels of oxytocin. It was also found that gene expression of corticotropin-releasing hormone neurons was significantly impacted by 17α-Estradiol even though it was not different between aged and young rats, suggesting that these neurons could be responsible for side effects related to this treatment. This article revealed some potential targets that should be further investigated in future studies regarding the role of 17α-Estradiol treatment in aged males.

      Strengths:

      (1) Single-nucleus mRNA sequencing is a very powerful method for gene expression analysis and clustering. The supervised clustering of neurons was very helpful in revealing otherwise invisible differences between neuronal groups and helped identify specific neuronal populations as targets.

      Thanks.

      (2) There is a variety of functions used that allow the differential analysis of a very complex type of data. This led to a better comparison between the different groups on many levels.

      Thanks.

      (3) There were some physiological parameters measured such as circulating hormone levels that helped the interpretation of the effects of the changes in hypothalamic gene expression.

      Thanks.

      Weaknesses

      (1) One main control group is missing from the study, the young males treated with 17α-Estradiol.

      Given that the treatment period lasts six months, which extends beyond the young male rats' age range, we aimed to investigate the perturbation of the normal aging process by 17α-Estradiol. Including data from young males could potentially obscure the treatment's effects in aged males due to age effects, though similar effects between young and aged animals may exist. Long-term hormone treatment may exert more developmental effects on young animals than on old ones. Consequently, we decided to exclude this group from our initial sample design. We apologize for this omission.

      (2) Even though the technical approach is a sophisticated one, analyzing the whole rat hypothalamus instead of specific nuclei or subregions makes the study weaker.

      The precise targets of 17α-Estradiol within the hypothalamus remain unresolved. Selecting a specific nucleus for study is challenging. The supervised clustering method described in this manuscript allows us to identify the more sensitive neuron subtypes influenced by 17α-Estradiol and aging across the entire hypothalamus, without the need to isolate specific nuclei in a disturbed hypothalamic environment.

      (3) Although the authors claim to have several findings, the data fail to support these claims.

      Thanks. We take this to refer to the claim of a senescent phenotype in Crh neurons induced by 17α-estradiol. We have changed "senescent phenotype" to "stressed phenotype" or "abnormal phenotype" in the abstract and results to avoid such a claim.

      (4) The study is about improving ageing but no physiological data from the study demonstrated such a claim with the exception of the testes histology which was not properly analyzed and was not even significantly different between the groups.

      The primary objective of this study is to elucidate the effects of 17α-Estradiol on the endocrine system in the aging hypothalamus; exploring anti-aging effects is not the main focus. From the characteristics of the aging hypothalamus, we know that down-regulated GnRH and testosterone levels, along with elevated mTOR signaling, are indicators of aging in these organs (PMID: 37886966, PMID: 37048056, PMID: 22884327). The contrasting signaling networks related to metabolism and synaptic processes significantly differentiate young and aging hypothalami, and 17α-Estradiol helps rebalance these networks, suggesting its potential anti-aging effects.

      (5) Overall, the study remains descriptive with no physiological data to demonstrate that any of the effects on hypothalamic gene expression are related to metabolic, synaptic, or other functions.

      The study focuses on investigating cellular responses and endocrine changes in the aging hypothalamus induced by 17α-estradiol, utilizing single-nucleus RNA sequencing (snRNA-seq) and a novel data mining methodology to analyze various neuron subtypes. It is important to note that this study does not mainly aim to explore the anti-aging effects. Consequently, we have revised the claim in the abstract from “the effects of 17α-estradiol in anti-aging in neurons” to “the effects of 17α-estradiol on aging neurons.” We observed that the lower overall metabolism and increased expression levels of cellular processes in the synapses align with findings previously reported regarding 17α-estradiol. To address the lack of physiological data and the challenges in measuring multiple endocrine factors due to their volatile nature, we employed several bidirectional Mendelian analyses of various genome-wide association study (GWAS) data related to these serum endocrine factors to identify their mutual causal effects.
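
      For readers unfamiliar with Mendelian randomization (MR), the sketch below shows one common estimator, the fixed-effect inverse-variance weighted (IVW) estimate, applied in both directions. The response does not state which estimator was used, so IVW is an assumption here, and the variant effect sizes are hypothetical toy numbers rather than GWAS data from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    Each list holds one entry per genetic variant: the variant's effect on
    the exposure, its effect on the outcome, and the outcome standard error.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have the same length")
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    if denominator == 0:
        raise ValueError("no informative variants")
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical toy summary statistics (assumption), two variants per direction.
# Forward direction: instruments for trait A, outcome trait B.
forward, forward_se = ivw_estimate([0.1, 0.2], [0.05, 0.10], [0.01, 0.01])
# Reverse direction: instruments for trait B, outcome trait A.
reverse, reverse_se = ivw_estimate([0.3, 0.1], [0.0, 0.0], [0.02, 0.02])

print(f"A -> B: {forward:.3f} (SE {forward_se:.3f})")
print(f"B -> A: {reverse:.3f} (SE {reverse_se:.3f})")
```

      In a bidirectional analysis, an effect seen in one direction but not the other is read as suggesting the direction of causation, subject to the usual MR assumptions about instrument validity.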

      Reviewing Editor Comment:

      Based on the Public Reviews and Recommendations for Authors, the Reviewers strongly recommend that revisions include an experimental demonstration of the physiological effects of the treatment on ageing in rats as well as the CRH-senescence link. Additional analysis of the glia would greatly strengthen the study, as would inclusion of females and young male controls. The important point was also raised that the work linking 17a-estradiol was performed in mice, and the link with lifespan in rats is not known. Discussion of this point is recommended.

      We acknowledge that 17α-estradiol has been reported to extend lifespan in male rats, similar to findings in male mice (PMID: 33289482), and we have noted this in the Introduction. We apologize for not conducting further experiments to validate this point.

      Additionally, we have revised the description of the phenotype of senescent CRH neurons to “stressed phenotype” without carrying out further experiments to confirm the senescent phenotype. To provide more clarity on the performance of glial cells during treatment, we have included additional enrichment analysis data of differentially expressed genes among young (Y), old (O), and old treated (O.T) microglia and astrocytes in Figure 2—figure supplement 3. Notably, the behavior of microglia contrasts with that of total neurons concerning synapse-related cellular processes. We apologize for being unable to include female and young controls in this study.

      Reviewer #2 (Recommendations For The Authors)

      General comments:

      (1) The manuscript is very hard to read. Proofreading and editing by software or a professional seems necessary. The words "enhanced", "extensive" etc. are not always used in the right way.

      Thanks for the suggestion. We have proofread and edited the manuscript. The usage of "enhanced" and "extensive" has also been revised in most sentences.

      (2) The numbers of animals and samples are not well explained. Is it 9 rats overall or per group? If there are 8 testes samples per group, should we assume that there were 4 rats per group? How was the pooling of the hypothalami done? Were all the hypothalami from each group pooled together? A small table with the animals per group and the samples would help.

      We appreciate your reminder regarding the initial mistake in our manuscript preparation. In the preliminary submission, we reported 9 rats based solely on sequencing data and data mining. The revised version (v1) now includes additional experimental data, with an effective total of 12 animals (4 per group). Unfortunately, we overlooked updating this information in the v1 submission. We have since added detailed information in the Materials and Methods sections: Animals, Treatment and Tissues, and snRNA-seq Data Processing, Batch Effect Correction, and Cell Subset Annotation.

      (3) The Clustering is wrong. There are genes in there that do not fall into any of the 3 categories: Neurotransmitters, Receptors, Hormones.

      We have changed the description to “Vast majority of these subtypes were clustered by neuropeptides, hormones, and their receptors within all the neurons”.

      (4) The coloring of groups in the graphs is inconsistent. It must be more homogeneous to make it easier to identify.

      We have changed the colors of groups in Fig. 1D to make the color of cell clusters consistent in Fig. 1A-D.

      (5) The groups c1-c4 are not well explained. How did the authors come up with these?

      We have added more descriptions of c1-c4 in materials and methods in the new version.

      (6) In most cases it's not clear if the authors are talking about cell numbers that express a certain mRNA, the level of expression of a certain mRNA, or both. They need to do a better job using more precise descriptions instead of using general terms such as "signatures", "expression profiles", "affected neurons" etc. It is very hard to understand if the number of neurons is compared between the groups or the gene expression.

      We have changed "signatures" to "gene signatures" to make the meaning more accurate. "Affected neurons" was also changed to "sensitive neurons". However, we regret that we were unable to find a better alternative to "expression profiles".

      (7) Sometimes there are claims made without justification or a reference. For example, the claim about the senescence of CRH neurons due to the upregulation of mitochondrial genes and downregulation of adherence junction genes (lines 326-328) should be supported by a reference or own findings.

      The term "senescence" is not appropriate here. We have changed it to "stressed phenotype" or "aberrant changes" in the abstract and results.

      (8) Young males treated with Estradiol as a control group is necessary and it is missing.

      Your suggestion is appreciated; however, the treatment duration for the aged rats (O.T) was set at 6 months, while the young rats were only 4 months old. This disparity makes it challenging to align treatment timelines for the young animals. The primary aim of this study is to investigate the perturbation of the aging process by 17α-estradiol, and any distinct age-related effects observed in young males might complicate our understanding of its role in aged males, though similar endocrine effects may exist in the young animals. Long-term hormone treatment may exert more developmental effects on young animals than on old ones. Therefore, we made the decision to exclude the young samples in our initial study design. We apologize for any confusion this may have caused.

      Specific Comments:

      Line 28: "elevated stresses and decreased synaptic activity": Please make this clearer. Can't claim changes in synaptic activity by gene expression.

      We have changed it to "the expression level of pathways involved in synapse".

      Line 32: "increased Oxytocin": serum Oxytocin.

      We have added “serum”.

      Line 52 - 54: Any studies from rats?

      Thanks. It has also been reported that 17α-estradiol has similar metabolic roles in rats as in mice (PMID: 33289482), and we have added this to the references. It is very useful for this manuscript.

      Line 62 - 65: It wasn't investigated thoroughly in this paper so why was it suggested in the introduction?

      We have deleted this sentence as suggested.

      Line 70: "synaptic activity" Same as line 28.

      We have changed it to "pathways involved in synaptic activity".

      Line 79: Why were aged rats caged alone and young by two? Could that introduce hypothalamic gene expression effects?

      The young males could be housed together peacefully, whereas aged males fight and must be housed individually.

      Lines 78, 99, 109-110: It is not clear how many animals per group were used and how many samples per group were used separately and/or grouped. Please be more specific.

      We have added this information to Materials and methods/Animals, treatment and tissues and Materials and methods/snRNA-seq data processing, batch effect correction, and cell subset annotation.

      Line 205: "in O" please add "versus young.".

      We have changed it accordingly.

      Line 207: replace "were" with "was" .

      Alternatively, we have changed "proportion" to "proportions".

      Line 208: replace "that" with "compared to" and after "in O.T." add "compared to?"

      We have changed it accordingly.

      Line 223: "O.T." compared to what? Figure?

      We have changed it accordingly.

      Line 227: Figure?

      We have added (Figure 1E) accordingly.

      Line 229: "synaptic activity" Same as line 28.

      We have revised it.

      Line 235: "synaptic activity" and "neuropeptide secretion" Same as line 28.

      We have revised it.

      Line 256:" interfered" please revise.

      We have changed it to "exerted".

      Line 263: "on the contrary" please revise.

      We have changed "on the contrary" to "opposite".

      Line 270: "conversed" did you mean "conserved"?

      We have changed "conversed" to "inversed".

      Line 296-298: Please explain. Why would these be side effects?

      This is hard to explain; we have therefore deleted the words "side effects".

      Line 308: "synaptic activity" Same as line 28.

      We have changed it to "expression levels of synapse-related cellular processes".

      Line 314: "and sex hormone secretion and signaling"Isn't this expected?

      Yes, it is expected. We have added it to the sentence "and, as expected, sex hormone secretion and signaling".

      Line 325-328: Why is this senescence? Reference?

      We have added “potent” to it.

      Line 360-361: This doesn't show elevated synaptic activity.

      "elevated synaptic activity" was changed to "The elevated expression of synapse-related pathways"

      Line 363-364: "Unfortunately" is not a scientific expression and show bias.

      We have changed it to "Notably".

      Line 376: Similar as above.

      Yes, we have changed it to "in contrast".

      Lines 382-385: This is speculation. Please move to discussion.

      We apologize, but we consider the causal effects derived from the MR results to be evidence. As such, we have not changed it.

      Line 389: Please revise "hormone expressing".

      We have changed it accordingly.

      Line 401: Isn't this effect expected due to feedback inhibition of the biochemical pathway? Please comment.

      The binding capability of 17alpha-estradiol to estrogen receptors and its role in transcriptional activation remain core questions surrounded by controversy. Earlier studies suggest that 17alpha-estradiol exhibits at least 200 times less activity than 17beta-estradiol (PMID: 2249627, PMID: 16024755). However, recent data indicate that 17alpha-estradiol shows comparable genomic binding and transcriptional activation through estrogen receptor α (Esr1) to that of 17beta-estradiol (PMID: 33289482). Additionally, there is evidence that 17alpha-estradiol has anti-estrogenic effects in rats (PMID: 16042770). These findings imply possible feedback inhibition via estrogen receptors. Furthermore, 17alpha-estradiol likely differs from 17beta-estradiol due to its unique metabolic consequences and its potential to slow aging in males, an effect not attributed to 17beta-estradiol. For instance, neurons are also targets of 17alpha-estradiol, with Esr1 not being the sole target (PMID: 38776045). Nevertheless, the precise effective targets of 17alpha-estradiol are still unresolved.

      Line 409: This conclusion cannot be made because the effect is not statistically significant. Can say "trend" etc.

      Thanks for the recommendation. We have added "potential" in front of the conclusion.

      Line 426: "suggesting" please revise.

      Sorry, it is a verb.

      Lines 426-428: This is speculation. Please move to discussion.

      The elevated GnRH levels in O.T., observed through EIA analysis, suggest a deduction regarding the direct causal effects of 17alpha-estradiol on various endocrine factors related to feeding, energy homeostasis, reproduction, osmotic regulation, stress response, and neuronal plasticity through MR analysis. Thus, we have not amended our position. We apologize for any confusion.

      Lines 431-432: improved compared to what?

      The statement has been revised to "The most striking role of 17α-estradiol treatment revealed in this study showed that HPG axis was substantially improved in the levels of serum Gnrh and testosterone".

      Line 435: " Estrogen Receptor Antagonists". Please revise.

      Thanks for the recommendation. We have changed it to "estrogen receptor antagonists".

      Line 438" "Secrete". Please revise.

      Sorry, it is "secret".

      Lines 439-449: None of this has been demonstrated. Please remove these conclusions.

      These are not conclusions but rather intriguing topics for discussion. Given the role of 17alpha-estradiol in promoting testosterone and reducing estradiol levels in males, we believe it is worthwhile to explore the potential application of 17alpha-estradiol in increasing testosterone levels in aged males, particularly those with hypogonadism.

      Lines 450-457: No females were included in this study. Why? Also, why is this discussed? It is relevant but doesn't belong in this manuscript since it was not studied here.

      Testosterone levels are crucial for male health, while estradiol levels are essential for the health and fertility of females. Previous studies have demonstrated that 17α-estradiol does not contribute to lifespan extension in females. Given the effects of 17α-estradiol on males—specifically, its role in promoting testosterone and reducing estradiol levels—we believe it is important to discuss the potential sex-biased effects of 17α-estradiol, as this could inform future investigations. Therefore, we have chosen not to make changes to this section.

      Lines 458-459: This was not demonstrated in this article. Please remove.

      We have restricted the claim to "expression level of energy metabolism in hypothalamic neurons".

      Line 464: "Promoted lifespan extension" Not demonstrated. Please remove.

      The end of the sentence was revised to "which may be a contributing factor in promoting lifespan extension".

      Line 466: "Showed" No.

      The whole sentence was deleted in the new version.

      Line 483: "the sex-based effects". Not studied here.

      Since the changes in testosterone levels are significant in this dataset and this hormone has a sex-biased nature, we find it worthwhile to suggest this as a topic for future investigation. We have added "which needs further verification in the future" at the end of this sentence.

    1. Author response:

      eLife Assessment

      This valuable study suggests that Naa10, an N-α-acetyltransferase with known mutations that disrupt neurodevelopment, acetylates Btbd3, which has been implicated in neurite outgrowth and obsessive-compulsive disorder, in a manner that regulates F-actin dynamics to facilitate neurite outgrowth. While the study provides promising insights and biochemical, co-immunoprecipitation, and proteomic data that enhance our understanding of protein N-acetylation in neuronal development, the evidence supporting larger claims is incomplete. Nonetheless, the implications of these findings are noteworthy, particularly regarding neurodevelopmental and psychiatric conditions tied to altered expression of Naa10 or Btbd3.

      Thank you very much for recognizing our study, carefully reviewing our work, and providing insightful comments and constructive criticism!

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript examines the role of Naa10 in cKO animals, in immortalized neurons, and in primary neurons. Given that Naa10 mutations in humans produce defects in nervous system function, the authors used various strategies to try to find a relevant neuronal phenotype and its potential molecular mechanism.

      This work contains valuable findings that suggest that the depletion of Naa10 from CA1 neurons in mice exacerbates anxiety-like behaviors. Using neuronal-derived cell lines authors establish a link between N-acetylase activity, Btbd3 binding to CapZb, and F-actin, ultimately impinging on neurite extension. The evidence demonstrating this is in most cases incomplete, since some key controls are missing and clearly described or simply because claims are not supported by the data. The manuscript also contains biochemical, co-immunoprecipitation, and proteomic data that will certainly be of value to our knowledge of the effects of protein N--acetylation in neuronal development and function.

      Thanks! It would be appreciated if the Reviewer could point out in the public review which experiment lacks a control group.

      Reviewer #2 (Public review):

      In this study, the authors sought to elucidate the neural mechanisms underlying the role of Naa10 in neurodevelopmental disruptions with a focus on its role in the hippocampus. The authors use an impressive array of techniques to identify a chain of events that occurs in the signaling pathway starting from Naa10 acetylating Btbd3 to regulation of F-actin dynamics that are fundamental to neurite outgrowth. They provide convincing evidence that Naa10 acetylates Btbd3, that Btbd3 facilitates CapZb binding to F-actin in a Naa10 acetylation-dependent manner, and that this CapZb binding to F-actin is key to neurite outgrowth. Besides establishing this signaling pathway, the authors contribute novel lists of Naa10 and Btbd3 interacting partners, which will be useful for future investigations into other mechanisms of action of Naa10 or Btbd3 through alternative cell signaling pathways.

      Thank you very much for recognizing our study!

      The evidence presented for an anxiety-like behavioral phenotype as a result of Naa10 dysfunction is mixed and tenuous, and assays for the primary behaviors known to be altered by Naa10 mutations in humans were not tested. As such, behavioral findings and their translational implications should be interpreted with caution.

      (1) For the anxiety-like behavioral phenotype, we provided a paragraph titled “Naa10 and stress-induced anxiety” in the Discussion section of the text: “Our investigations revealed that hippocampal CA1-KO of Naa10 did not exhibit significant differences in the open field test (Figure S1K) but led to anxiety-like behavior in mice in the elevated plus maze (EPM) test (Figure 1A). This disparity might be attributed to the specific design of the EPM test, which is tailored to elicit a conflict between an animal's inclination to explore and its fear of open spaces and elevated areas. This distinction implies that Naa10 might play a role in stress responses within the emotional regulation circuitry, particularly in navigating potentially threatening and anxiety-provoking environments.” The open field test offers a less challenging, open environment that primarily promotes exploratory behavior. We agree that additional assays, such as the light-dark box test, would be helpful in clarifying the issue.

      (2) We agree that the behavioral findings and their translational implications should be interpreted with caution. The primary neurological behaviors known to be altered by Naa10 mutations in humans include intellectual disability and autism-like syndrome with defective emotional control. These behaviors are influenced by many factors, including defects in the hippocampal CA1. Thus, we tested hippocampal CA1 Naa10-KO mice using the Y-maze, tail suspension test, open field test, and elevated plus maze (EPM). However, only the EPM results were affected, while the other tests showed no significant changes. It should be noted that our study employed a postnatal, CA1-specific Naa10 conditional knockout (cKO) model driven by Camk2a-Cre, which selectively depletes Naa10 from hippocampal CA1 neurons after birth. In contrast, Naa10 mutations in human patients involve global effects and impact multiple brain regions from the embryonic stage, leading to a broader spectrum of phenotypes. The limited disruption in our model likely explains the absence of learning and memory deficits and the incomplete recapitulation of the full range of patient phenotypes. Furthermore, Naa10 knockout may not produce the same effects as Naa10 mutations. Our current study is primarily intended to explore the physiological function of Naa10 in hippocampal function.

      (3) We will replace all instances of “anxiety behavior” with “anxiety-like behavior.”

      Finally, while not central to the main cell signaling pathway delineated, the characterization of brain region-specific and cell maturity of Naa10 expression patterns was presented in few to single animals and not quantified, and as such should also be interpreted with caution.

      We agree that we should provide additional Naa10 immunostaining data from more than three WT and hippocampal CA1 Naa10-KO mouse brains, as well as quantify data such as the silver staining and Light Sheet Fluorescence Microscopy results presented in Figures 1C and 1D, respectively. Nevertheless, the current report presents consistent results across different mice used for various assays. For example, Figures 1B-D, with three different assays, each demonstrate that Naa10-cKO reduces neurite complexity in vivo.

      On a broader level, these findings have implications for neurodevelopment and potentially, although not tested here, synaptic plasticity in adulthood, which means this novel pathway may be fundamental for brain health.

      Thank you very much again for recognizing our study!

      Summarized list of minor concerns

      (1) The early claims of the manuscript are supported by very small sample sizes (often 1-3) and/or lack of quantification, particularly in Figures S1 and 1.

      We agree that we should provide additional Naa10 immunostaining data from more than three WT and hippocampal CA1 Naa10-KO mouse brains, as well as quantify data such as the silver staining and Light Sheet Fluorescence Microscopy results presented in Figures 1C and 1D, respectively. Nevertheless, the current report presents consistent results across different mice used for various assays. For example, Figures 1B-D, with three different assays, each demonstrate that Naa10-cKO reduces neurite complexity in vivo.

      (2) Evidence is insufficient for CA1-specific knockdown of Naa10.

      The Camk2a-Cre mice used in this study were derived from Dr. Susumu Tonegawa’s laboratory. According to the referenced paper, this strain restricts Cre/loxP recombination to the forebrain, with particularly high efficiency in the hippocampal CA1. Consistently, our data show that Naa10 was almost completely absent in the CA1 but partially depleted in the DG of the Naa10-cKO mice (Figure S1F in the text). Similar results were observed in a different pair of

      (3) The relationship between the behaviors measured, which centered around mood, and Ogden syndrome, was not clear, and likely other behavioral measures would be more translationally relevant for this study. Furthermore, the evidence for an anxiety-like phenotype was mixed.

      (1) For the anxiety-like behavioral phenotype, we provided a paragraph titled “Naa10 and stress-induced anxiety” in the Discussion section of the text: “Our investigations revealed that hippocampal CA1-KO of Naa10 did not exhibit significant differences in the open field test (Figure S1K) but led to anxiety-like behavior in mice in the elevated plus maze (EPM) test (Figure 1A). This disparity might be attributed to the specific design of the EPM test, which is tailored to elicit a conflict between an animal's inclination to explore and its fear of open spaces and elevated areas. This distinction implies that Naa10 might play a role in stress responses within the emotional regulation circuitry, particularly in navigating potentially threatening and anxiety-provoking environments.” The open field test offers a less challenging, open environment that primarily promotes exploratory behavior. We agree that additional assays, such as the light-dark box test, would be helpful in clarifying the issue.

      (2) We agree that the behavioral findings and their translational implications should be interpreted with caution. The primary neurological behaviors known to be altered by Naa10 mutations in humans include intellectual disability and autism-like syndrome with defective emotional control. These behaviors are influenced by many factors, including defects in the hippocampal CA1. Thus, we tested hippocampal CA1 Naa10-KO mice using the Y-maze, tail suspension test, open field test, and elevated plus maze (EPM). However, only the EPM results were affected, while the other tests showed no significant changes. It should be noted that our study employed a postnatal, CA1-specific Naa10 conditional knockout (cKO) model driven by Camk2a-Cre, which selectively depletes Naa10 from hippocampal CA1 neurons after birth. In contrast, Naa10 mutations in human patients involve global effects and impact multiple brain regions from the embryonic stage, leading to a broader spectrum of phenotypes. The limited disruption in our model likely explains the absence of learning and memory deficits and the incomplete recapitulation of the full range of patient phenotypes. Furthermore, Naa10 knockout may not produce the same effects as Naa10 mutations. Our current study is primarily intended to explore the physiological function of Naa10 in hippocampal function.

      (3) We will replace all instances of “anxiety behavior” with “anxiety-like behavior.”

      (4) Btbd3 is characterized by the authors as an OCD risk gene, but its status as such is not well supported by the most recent, better-powered genome-wide association studies than the one that originally implicated Btbd3. However, there is evidence that Btbd3 expression, including selectively in the hippocampus, is implicated in OCD-relevant behaviors in mice.

      Thanks for clarifying the issue!

      (5) The reporting of the statistics lacks sufficient detail for the reader to deduce how experimental replicates were defined.

      We believe we have provided sufficient detail for readers to deduce how experimental replicates were defined in each corresponding figure legend. It would be appreciated if the Reviewer could point out which specific figures lack sufficient details.

    1. Author response:

      Reviewer #1:

      Summary: In this manuscript, Bisht et al address the hypothesis that protein folding chaperones may be implicated in aggregopathies and in particular Tau aggregation, as a means to identify novel therapeutic routes for these largely neurodegenerative conditions.

      The authors conducted a genetic screen in the Drosophila eye, which facilitates the identification of mutations that either enhance or suppress a visible disturbance in the nearly crystalline organization of the compound eye. They screened by RNA interference all 64 known Drosophila chaperones and revealed that mutations in 20 of them exaggerate the Tau-dependent phenotype, while 15 ameliorated it. The enhancer of the degeneration group included 2 subunits of the typically heterohexameric prefoldin complex and other co-translational chaperones.

      In a previous paper, we identified 95 Drosophila chaperones (Raut et al., 2017). We request that “all 64 known Drosophila chaperones” be replaced with “64 out of 95 known Drosophila chaperones” to make it factually correct.

      Strengths:

      Regarding this memory defect upon V377M tau expression. Kosmidis et al (2010) pmid: 20071510, demonstrated that pan-neuronal expression of TauV377M disrupts the organization of the mushroom bodies, the seat of long-term memory in odor/shock and odor/reward conditioning. If the novel memory assay the authors use depends on the adult brain structures, then the memory deficit can be explained in this manner.

      If the mushroom bodies are defective upon TauV377M expression does overexpression of Pfdn5 or 6 reverse this deficit? This would argue strongly in favor of the microtubule stabilization explanation.

      We agree that the disrupted organization of the mushroom body may cause memory deficits upon hTauV337M expression and that expression of Pfdn5 or Pfdn6 could reverse the deficits. One possible mechanism by which overexpression of Pfdn5/6 could rescue the Tau-induced memory deficits is the stabilization of microtubules in the mushroom bodies.

      Proposed revision: We will assess if Tau-induced mushroom body disruption can be rescued with the overexpression of Pfdn5 or Pfdn6.

      Weakness:

      What is unclear however is how Pfdn5 loss or even overexpression affects the pathological Tau phenotypes. Does Pfdn5 (or 6) interact directly with TauV377M? Colocalization within tissues is a start, but immunoprecipitations would provide additional independent evidence that this is so.

      Our data suggest that Pfdn5 stabilizes neuronal microtubules by directly associating with them, and loss of Pfdn5 exacerbates Tau-phenotypes by destabilizing microtubules. However, as the reviewer notes, analysis of direct interaction between Pfdn5 and hTau<sup>V337M</sup> might provide further insights into the mechanism of Pfdn5 and Tau-aggregation.

      Proposed revision: We will perform colocalization analysis and coimmunoprecipitation to ask if Pfdn5 colocalizes and directly interacts with Tau.

      Does Pfdn5 loss exacerbate TauV377M phenotypes because it destabilizes microtubules, which are already at least partially destabilized by Tau expression? Rescue of the phenotypes by overexpression of Pfdn5 agrees with this notion.

      However, Cowan et al (2010) pmid: 20617325 demonstrated that wild-type Tau accumulation in larval motor neurons indeed destabilizes microtubules in a Tau phosphorylation-dependent manner. So, is TauV377M hyperphosphorylated in the larvae?? What happens to TauV377M phosphorylation when Pfdn5 is missing and presumably more Tau is soluble and subject to hyperphosphorylation as predicted by the above?

      Proposed revisions: We will overexpress Pfdn5 or Pfdn6 with hTau<sup>V337M</sup> and ask if microtubule disruption caused by hTau<sup>V337M</sup> is rescued. Further, we will analyze the phospho-Tau levels in controls and Pfdn5 mutant background.

      Expression of WT human Tau (which is associated with most common Tauopathies other than FTDP-17) as Cowan et al suggest has significant effects on microtubule stability, but such Tau-expressing larvae are largely viable. Will one mutant copy of the Pfdn5 knockout enhance the phenotype of these larvae?? Will it result in lethality? Such data will serve to generalize the effects of Pfdn5 beyond the two FTDP-17 mutations utilized.

      Proposed revision: We will incorporate data about the effect of heterozygous mutation of Pfdn5 on the lethality and synaptic phenotypes associated with the hTau<sup>WT</sup> and hTau<sup>V337M</sup> in the revised manuscript.

      Does the loss of Pfdn5 affect TauV377M (and WTTau) levels?? Could the loss of Pfdn5 simply result in increased Tau levels? And conversely, does overexpression of Pfdn5 or 6 reduce Tau levels?? This would explain the enhancement and suppression of TauV377M (and possibly WT Tau) phenotypes. It is an easily addressed, trivial explanation at the observational level, which if true begs for a distinct mechanistic approach.

      We thank the reviewer for suggesting an alternate model for the Pfdn5 function. We will perform Western blot analysis to assess Tau<sup>WT</sup> and Tau<sup>V337M</sup> levels in the absence of Pfdn5 and in animals coexpressing Tau and Pfdn5. We will incorporate these data and conclusions in the revised manuscript.

      Finally, the authors argue that TauV377M forms aggregates in the larval brain based on large puncta observed especially upon loss of Pfdn5. This may be so, but protocols are available to validate molecularly the presence of insoluble Tau aggregates (for example, pmid: 36868851) or soluble Tau oligomers as these apparently differentially affect Tau toxicity. Does Pfdn5 loss exaggerate the toxic oligomers and overexpression promote the more benign large aggregates??

      We will perform the Tau solubility assay in controls, in the absence of Pfdn5, and in animals coexpressing Tau and Pfdn5. Moreover, we will also ask if the large Tau puncta formed in the absence of Pfdn5 are soluble oligomers or stable aggregates. We have found that the coexpression of Tau and Pfdn5 does not result in the formation of Tau aggregates. We will incorporate these and other relevant data in the revised manuscript.

      Reviewer #2 (Public review):

      Bisht et al detail a novel interaction between the chaperone, Prefoldin 5, microtubules, and tau-mediated neurodegeneration, with potential relevance for Alzheimer's disease and other tauopathies. Using Drosophila, the study shows that Pfdn5 is a microtubule-associated protein, which regulates tubulin monomer levels and can stabilize microtubule filaments in the axons of peripheral nerves. The work further suggests that Pfdn5/6 may antagonize Tau aggregation and neurotoxicity. While the overall findings may be of interest to those investigating the axonal and synaptic cytoskeleton, the detailed mechanisms for the observed phenotypes remain unresolved and the translational relevance for tauopathy pathogenesis is yet to be established. Further, a number of key controls and important experiments are missing that are needed to fully interpret the findings. The major weakness relates to the experiments and claims of interactions with Tau-mediated neurodegeneration. In particular, it is unclear whether knockdown of Pfdn5 may cause eye phenotypes independent of Tau. Further, the GMR>tau phenotype appears to have been incorrectly utilized to examine age-dependent neurodegeneration.

      We have consistently found the progression of eye degeneration in the population of animals expressing Tau<sup>V337M</sup>, measured as the number of fused ommatidia/total number of ommatidia, with age. A few other studies have also shown age-dependent progressive degeneration in Drosophila retinal axons or lamina (Iijima-Ando et al., 2012; Sakakibara et al., 2018). We appreciate other studies that have proposed hTau-induced eye degeneration as a developmental defect (Malmanche et al., 2017; Sakakibara et al., 2023).

      Proposed revision: a) We will analyze the age-dependent neurodegeneration in the adult brain to further support our main conclusion that Pfdn5 ameliorates hTauV337M-induced progressive neurodegeneration.

      b) We have used three independent Pfdn5 RNAi lines (the RNAi lines target different regions of Pfdn5) – all of which enhance the Tau phenotypes. Knockdown with any of these RNAi lines using GMR-Gal4 does not give detectable eye phenotypes. We will include these data in the revised manuscript.

      This manuscript argues that its findings may be relevant to thinking about mechanisms and therapies applicable to tauopathies; however, this is premature given that many questions remain about the interactions from Drosophila, the detailed mechanisms remain unresolved, and absent evidence that tau and Pfdn may similarly interact in the mammalian neuronal context. Therefore, this work would be strongly enhanced by experiments in human or murine neuronal culture or supportive evidence from analyses of human data.

      Proteome analysis of Alzheimer's brain tissue shows that the Pfdn5 level is reduced in patients (Askenazi et al., 2023; Tao et al., 2020). Moreover, the Pfdn5 expression level was found to be reduced in the blood samples from AD patients (Ji et al., 2022). Another study further validates the age-dependent reduction of Pfdn5 in the tauopathy transgenic murine model (Kadoyama et al., 2019). Together, these reports highlight a potential link between Pfdn5 levels and tauopathies. We will revise the manuscript to reflect these findings in more detail.

      References

      Askenazi, M., Kavanagh, T., Pires, G., Ueberheide, B., Wisniewski, T., and Drummond, E. (2023). Compilation of reported protein changes in the brain in Alzheimer's disease. Nat Commun 14, 4466. 10.1038/s41467-023-40208-x.

      Iijima-Ando, K., Sekiya, M., Maruko-Otake, A., Ohtake, Y., Suzuki, E., Lu, B., and Iijima, K.M. (2012). Loss of axonal mitochondria promotes tau-mediated neurodegeneration and Alzheimer's disease-related tau phosphorylation via PAR-1. PLoS Genet 8, e1002918. 10.1371/journal.pgen.1002918.

      Ji, W., An, K., Wang, C., and Wang, S. (2022). Bioinformatics analysis of diagnostic biomarkers for Alzheimer's disease in peripheral blood based on sex differences and support vector machine algorithm. Hereditas 159, 38. 10.1186/s41065-022-00252-x.

      Kadoyama, K., Matsuura, K., Takano, M., Maekura, K., Inoue, Y., and Matsuyama, S. (2019). Changes in the expression of prefoldin subunit 5 depending on synaptic plasticity in the mouse hippocampus. Neurosci Lett 712, 134484. 10.1016/j.neulet.2019.134484.

      Malmanche, N., Dourlen, P., Gistelinck, M., Demiautte, F., Link, N., Dupont, C., Vanden Broeck, L., Werkmeister, E., Amouyel, P., Bongiovanni, A., et al. (2017). Developmental Expression of 4-Repeat-Tau Induces Neuronal Aneuploidy in Drosophila Tauopathy Models. Sci Rep 7, 40764. 10.1038/srep40764.

      Raut, S., Mallik, B., Parichha, A., Amrutha, V., Sahi, C., and Kumar, V. (2017). RNAi-Mediated Reverse Genetic Screen Identified Drosophila Chaperones Regulating Eye and Neuromuscular Junction Morphology. G3 (Bethesda) 7, 2023-2038. 10.1534/g3.117.041632.

      Sakakibara, Y., Sekiya, M., Fujisaki, N., Quan, X., and Iijima, K.M. (2018). Knockdown of wfs1, a fly homolog of Wolfram syndrome 1, in the nervous system increases susceptibility to age- and stress-induced neuronal dysfunction and degeneration in Drosophila. PLoS Genet 14, e1007196. 10.1371/journal.pgen.1007196.

      Sakakibara, Y., Yamashiro, R., Chikamatsu, S., Hirota, Y., Tsubokawa, Y., Nishijima, R., Takei, K., Sekiya, M., and Iijima, K.M. (2023). Drosophila Toll-9 is induced by aging and neurodegeneration to modulate stress signaling and its deficiency exacerbates tau-mediated neurodegeneration. iScience 26, 105968. 10.1016/j.isci.2023.105968.

      Tao, Y., Han, Y., Yu, L., Wang, Q., Leng, S.X., and Zhang, H. (2020). The Predicted Key Molecules, Functions, and Pathways That Bridge Mild Cognitive Impairment (MCI) and Alzheimer's Disease (AD). Front Neurol 11, 233. 10.3389/fneur.2020.00233.

    1. Author response:

      We appreciate the constructive feedback from the reviewers and will work to address many of these concerns in a revised version.  Here, we provide initial responses to a few key points that the reviewers raised:

      (1) The reviewers rightly pointed out that it is very important to clearly define and explain what qualifies as metastatic potential to particular organs in our system.  We acknowledge the valuable contributions of animal models in metastatic cancer studies, but here we intentionally limited our scope to metastasis that had occurred within the human system only.  For example, we use data from cancer cells that model human organotropism from the breast to the lung, since the cells originated from infiltrative ductal carcinoma (human breast) but were collected from pleural effusions (human lung). We propose that in this case a comparison with a human lung cancer-derived cell line that was itself purified from a pleural effusion could reveal factors essential for lung metastasis, without adding the confounder of an animal microenvironment.  The MetMap Explorer contains valuable information, but the “metastatic potential of each cell line” is measured in a mouse environment.  Knowing that a particular cell line, which originated from a human lung metastasis, can further metastasize to other organs in a mouse does not necessarily mean that those cells could do so in humans.  The microenvironment responses to metastatic colonization can differ among species.  Further, the changes a cell needs to make to adapt to a new organ system in a mouse could be confounded by the changes needed to adapt to mouse conditions in general.  Finally, migration from a site of ectopic injection may not mimic migration from an initial tumor site.  We agree that the very best data would come from matched primary and metastatic tumors in the same human patient, but those data do not currently exist and generating them would require future work beyond the scope of this study.   In our revision, we will ensure that  we more clearly explain how and why we chose the cell lines we did and what the advantages and limitations of this choice are.

      (2) The reviewers are correct that our unsupervised principal component analysis (PCA) does not precisely stratify cells according to epithelial-mesenchymal status. In a high-dimensional, complex system, it is expected that an unsupervised analysis such as this will not capture just one biological feature in the first principal component. Therefore, when we performed PCA on the compartmental organization profiles of different healthy and cancerous cell lines, the largest variation (PC1) did not exactly follow EMT state; instead, it captured an ordering that includes influences from epithelial-mesenchymal state, disease condition, nuclear geometry, and other cellular properties. However, it was striking that this completely unsupervised analysis did match previous annotations of EMT state so well (as seen in supp fig 1b). Therefore, we conclude that the most prominent variations in A/B compartment signature strongly relate to EMT state. In the revision, we will more clearly present the caveats of this interpretation.

      (3) Our decision to focus on A/B compartmentalization rather than TAD or loop structure in this analysis was intentional and biologically motivated, rather than solely being a reflection of data resolution. Both compartments and topologically associated domains (TADs) are key parts of genome organization and disruption of these structures has the potential to alter downstream gene regulation, as shown by numerous studies. But compartments have been found, more so than TADs, to be strongly associated with cell type and cell fate. Therefore, in this manuscript, we decided to focus only on the compartment organization changes between different healthy and cancerous cells, as they are more likely to represent the stable alterations of genome organization in malignant transformation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors performed experimental evolution of MreB mutants that have a slow-growing round phenotype and studied the subsequent evolutionary trajectory using analysis tools from molecular biology. It was remarkable and interesting that they found that the original phenotype was not restored (most common in these studies) but that the round phenotype was maintained. 

      Strengths: 

      The finding that the round phenotype was maintained during evolution rather than that the original phenotype, rod-shaped cells, was recovered is interesting. The paper extensively investigates what happens during adaptation with various different techniques. Also, the extensive discussion of the findings at the end of the paper is well thought through and insightful.

      Weaknesses: 

      I find there are three general weaknesses: 

      (1) Although the paper states in the abstract that it emphasizes "new knowledge to be gained" it remains unclear what this concretely is. On page 4 they state three research questions, these could be more extensively discussed in the abstract. Also, these questions read more like genetics questions while the paper is a lot about cell biological findings.

      Thank you for drawing attention to the unnecessary and gratuitous nature of the last sentence of the Abstract. We are in agreement. It has been modified, and we have taken advantage of additional word space to draw attention to the importance of the two competing (testable) hypotheses laid out in the Discussion.

      As to new knowledge, please see the Results and particularly the Discussion. But beyond this, and as recognised by others, there is real value for cell biology in seeing how (and whether) selection can compensate for effects that are deleterious to fitness. The results will very often depart from those delivered from, for example, suppressor analyses, or bottom-up engineering.

      In the work recounted in our paper, we chose to focus – by way of proof of principle – on the most commonly observed mutations, namely, those within pbp1A. But beyond this gene, we detected mutations in other components of the cell shape / division machinery whose connections are not yet understood and which are the focus of on-going investigation.

      As to the three questions posed at the end of the Introduction, the first concerns whether selection can compensate for deleterious effects of deleting mreB (a question that pertains to evolutionary aspects); the second seeks understanding of genetic factors; the third aims to shed light on the genotype-to-phenotype map (which is where the cell biology comes into play). Given space restrictions, we cannot see how we could usefully expand, let alone discuss, the three questions raised at the end of the Introduction in the restricted space available in the Abstract.

      (2) It is not clear to me from the text what we already know about the restoration of MreB loss from suppressors studies (in the literature). Are there suppressor screens in the literature and which part of the findings is consistent with suppressor screens and which parts are new knowledge?  

      As stated in the Introduction, in a previous study with B. subtilis (which harbours three MreB isoforms and where the isoform named “MreB” is essential for growth under normal conditions), suppressors of MreB lethality were found to occur in ponA, a class A penicillin binding protein (Kawai et al., 2009). This led to recognition that MreB plays a role in recruiting Pbp1A to the lateral cell wall. On the other hand, Patel et al. (2020) have shown that deletion of class A PBPs leads to an up-regulation of rod complex activity. Although there is a connection between rod complex and class A PBPs, a further study has shown that the two systems work semi-autonomously (Cho et al., 2016).

      Our work confirms a connection between MreB and Pbp1A, and has shed new light on how this interaction is established by means of natural selection, which targets the integrity of the cell wall. Indeed, the Rod complex and class A PBPs have complementary activities in the building of the cell wall with each of the two systems able to compensate for the other in order to maintain cell wall integrity. Please see the major part of the Discussion. In terms of specifics, the connection between mreB and pbp1A (shown by Kawai et al (2009)) is indirect because it is based on extragenic transposon insertions. In our study, the genetic connection is mechanistically demonstrated. In addition, we show that the evolutionary dynamics are rapid, and finally we enrich understanding of the genotype-to-phenotype map.

      (3) The clarity of the figures, captions, and data quantification need to be improved.  

      Modifications have been implemented. Please see responses to specific queries listed below.

      Reviewer #2 (Public Review): 

      Yulo et al. show that deletion of MreB causes reduced fitness in P. fluorescens SBW25 and that this reduction in fitness may be primarily caused by alterations in cell volume. To understand the effect of cell volume on proliferation, they performed an evolution experiment through which they predominantly obtained mutations in pbp1A that decreased cell volume and increased viability. Furthermore, they provide evidence to propose that the pbp1A mutants may have decreased PG cross-linking which might have helped in restoring the fitness by rectifying the disorganised PG synthesis caused by the absence of MreB. Overall this is an interesting study. 

      Queries: 

      Do the small cells of mreB null background indeed have no DNA? It is not apparent from the DAPI images presented in Supplementary Figure 17. A more detailed analysis will help to support this claim.

      It is entirely possible that small cells have no DNA, because if cell division is aberrant then division can occur prior to DNA segregation, resulting in cells with no DNA. It is clear from microscopic observation that both small and large cells do not divide. It is, however, true that we are unable to state – given our measures of DNA content – that small cells have no DNA. We have made this clear on page 13, paragraph 2.

      What happens to viability and cell morphology when pbp1A is removed in the mreB null background? If it is actually a decrease in pbp1A activity that leads to the rescue, then pbp1A- mreB- cells should have better viability, reduced cell volume and organised PG synthesis. Especially as the PG cross-linking is almost at the same level as the T362 or D484 mutant.  

      Please see fitness data in Supp. Fig. 13. Fitness of ∆mreBpbp1A is no different to that caused by a point mutation. Cells remain round.  

      What is the status of PG cross-linking in ΔmreB Δpflu4921-4925 (Line 7)? 

      This was not analysed as the focus of this experiment was PBPs. A priori, there is no obvious reason to suspect that ∆4921-25 (which lacks oprD) would be affected in PBP activity.

      What is the morphology of the cells in Line 2 and Line 5? It may be interesting to see if PG cross-linking and cell wall synthesis is also altered in the cells from these lines. 

      The focus of investigation was restricted to L1, L4 and L7. Indeed, it would be interesting to look at the mutants harbouring mutations in ftsZ, but this is beyond the scope of the present investigation (but is on-going). The morphology of L2 and L5 is shown in Supp. Fig. 9.

      The data presented in 4B should be quantified with appropriate input controls. 

      Band intensity has now been quantified (see new Supp. Fig. 20). The controls are SBW25, SBW25∆pbp1A, SBW25 ∆mreB and SBW25 ∆mreBpbp1A as explained in the paper.

      What are the statistical analyses used in 4A and what is the significance value? 

      Our oversight. These were reported in Supp. Fig. 19, but should also have been presented in Fig. 4A. Data are means of three biological replicates. The statistical tests are comparisons between each mutant and SBW25, and assessed by paired t-tests.  

      A more rigorous statistical analysis indicating the number of replicates should be done throughout. 

      We have checked and made additions where necessary and where previously lacking. In particular, details are provided in Fig. 1E, Fig. 4A and Fig. 4B. For Fig. 4C we have produced quantitative measures of heterogeneity in new cell wall insertion. These are reported in Supp. Fig. 21 (and referred to in the text and figure caption) and show that patterns of cell wall insertion in ∆mreB are highly heterogeneous.

      Reviewer #3 (Public Review): 

      This paper addresses an understudied problem in microbiology: the evolution of bacterial cell shape. Bacterial cells can take a range of forms, among the most common being rods and spheres. The consensus view is that rods are the ancestral form and spheres the derived form. The molecular machinery governing these different shapes is fairly well understood but the evolutionary drivers responsible for the transition between rods and spheres are not. Enter Yulo et al.'s work. The authors start by noting that deletion of a highly conserved gene called MreB in the Gram-negative bacterium Pseudomonas fluorescens reduces fitness but does not kill the cell (as happens in other species like E. coli and B. subtilis) and causes cells to become spherical rather than their normal rod shape. They then ask whether evolution for 1000 generations restores the rod shape of these cells when propagated in a rich, benign medium. 

      The answer is no. The evolved lineages recovered fitness by the end of the experiment, growing just as well as the unevolved rod-shaped ancestor, but remained spherical. The authors provide an impressively detailed investigation of the genetic and molecular changes that evolved. Their leading results are: 

      (1) The loss of fitness associated with MreB deletion causes high variation in cell volume among sibling cells after cell division.

      (2) Fitness recovery is largely driven by a single, loss-of-function point mutation that evolves within the first ~250 generations that reduces the variability in cell volume among siblings. 

      (3) The main route to restoring fitness and reducing variability involves loss of function mutations causing a reduction of TPase and peptidoglycan cross-linking, leading to a disorganized cell wall architecture characteristic of spherical cells. 

      The inferences made in this paper are on the whole well supported by the data. The authors provide a uniquely comprehensive account of how a key genetic change leads to gains in fitness and the spectrum of phenotypes that are impacted and provide insight into the molecular mechanisms underlying models of cell shape. 

      Suggested improvements and clarifications include: 

      (1) A schematic of the molecular interactions governing cell wall formation could be useful in the introduction to help orient readers less familiar with the current state of knowledge and key molecular players. 

      We understand that this would be desirable, but there are numerous recent reviews with detailed schematics that we think the interested reader would be better consulting. These are referenced in the text.

      (2) More detail on the bioinformatics approaches to assembling genomes and identifying the key compensatory mutations are needed, particularly in the methods section. This whole subject remains something of an art, with many different tools used. Specifying these tools, and the parameter settings used, will improve transparency and reproducibility, should it be needed.

      We overlooked providing this detail, which has now been corrected by provision of more information in the Materials and Methods. In short, we used breseq in clonal mode with default parameters. Additional analyses were conducted using Geneious. The breseq output files (which include all read data) are provided at https://doi.org/10.17617/3.CU5SX1.

      (3) Corrections for multiple comparisons should be used and reported whenever more than one construct or strain is compared to the common ancestor, as in Supplementary Figure 19A (relative PG density of different constructs versus the SBW25 ancestor). 

      The data presented in Supp Fig 19A (and Fig 4A) do not involve multiple comparisons. In each instance the comparison is between SBW25 and each of the different mutants. A paired t-test is thus appropriate.
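      For readers less familiar with the two procedures discussed in this exchange, the following is a minimal sketch of a paired t statistic and of a Holm step-down adjustment of several p-values. It is illustrative only and is not the authors' analysis: the function names `paired_t_statistic` and `holm_adjust` and all input values are hypothetical, and the choice of Holm's method as the example correction is an assumption, since the reviewer does not name a specific one.

```python
import math


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Unbiased sample variance of the pairwise differences.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean / math.sqrt(var / n), n - 1


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical measurements: three paired replicates of a mutant and a reference.
    t, df = paired_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
    print(t, df)
    # Hypothetical raw p-values from three mutant-versus-reference tests.
    print(holm_adjust([0.01, 0.04, 0.03]))
```

      With three replicates per comparison the paired test has only two degrees of freedom, so whether or not an adjustment is applied, the individual p-values are sensitive to replicate-to-replicate variation.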

      (4) The authors refrain from making strong claims about the nature of selection on cell shape, perhaps because their main interest is the molecular mechanisms responsible. However, I think more can be said on the evolutionary side, along two lines. First, they have good evidence that cell volume is a trait under strong stabilizing selection, with cells of intermediate volume having the highest fitness. This is notable because there are rather few examples of stabilizing selection where the underlying mechanisms responsible are so well characterized. Second, this paper succeeds in providing an explanation for how spherical cells can readily evolve from a rod-shaped ancestor but leaves open how rods evolved in the first place. Can the authors speculate as to how the complex, coordinated system leading to rods first evolved? Or why not all cells have lost rod shape and become spherical, if it is so easy to achieve? These are important evolutionary questions that remain unaddressed. The manuscript could be improved by at least flagging these as unanswered questions deserving of further attention. 

      These are interesting points, but our capacity to comment is entirely speculative. Nonetheless, we have added an additional paragraph to the Discussion that expresses an opinion that has yet to receive attention:

      “Given the complexity of the cell wall synthesis machinery that defines rod-shape in bacteria, it is hard to imagine how rods could have evolved prior to cocci. However, the cylindrical shape offers a number of advantages. For a given biomass (or cell volume), shape determines surface area of the cell envelope, which is the smallest surface area associated with the spherical shape. As shape sets the surface/volume ratio, it also determines the ratio between supply (proportional to the surface) and demand (proportional to cell volume). From this point of view, it is more efficient to be cylindrical (Young 2006). This also holds for surface attachment and biofilm formation (Young 2006). But above all, for growing cells, the ratio between supply and demand is constant in rod shaped bacteria, whereas it decreases for cocci. This requires that spherical cells evolve complex regulatory networks capable of maintaining the correct concentration of cellular proteins despite changes in surface/volume ratio. From this point of view, rod-shaped bacteria offer opportunities to develop unsophisticated regulatory networks.”
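      The supply/demand argument in the quoted paragraph can be checked with simple geometry. The sketch below is an idealisation and not taken from the manuscript: it treats a rod as an open cylinder that grows in length at fixed radius (end caps ignored) and a coccus as a sphere that grows in radius. The function names and the example volumes are hypothetical.

```python
import math


def sphere_surface_to_volume(volume: float) -> float:
    """Surface/volume ratio of a sphere with the given volume (equals 3 / radius)."""
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 3.0 / radius


def rod_surface_to_volume(volume: float, radius: float) -> float:
    """Surface/volume ratio of an idealised rod.

    Assumption: the rod is an open cylinder of fixed radius that grows only
    in length; the end caps are ignored, so the ratio reduces to 2 / radius.
    """
    length = volume / (math.pi * radius ** 2)
    lateral_area = 2.0 * math.pi * radius * length
    return lateral_area / volume


# Hypothetical volumes: a cell before and after doubling its volume.
for v in (1.0, 2.0):
    print(
        f"V = {v:.1f}  sphere S/V = {sphere_surface_to_volume(v):.3f}  "
        f"rod S/V = {rod_surface_to_volume(v, radius=0.5):.3f}"
    )
```

      Under these assumptions the rod keeps the same ratio as it grows, whereas the ratio for the sphere falls by a factor of 2^(-1/3) each time the volume doubles.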

      why not all cells have lost rod shape and become spherical.

      Please see Kevin Young’s 2006 review on the adaptive significance of cell shape

      The value of this paper stems both from the insight it provides on the underlying molecular model for cell shape and from what it reveals about some key features of the evolutionary process. The paper, as it currently stands, provides more on which to chew for the molecular side than the evolutionary side. It provides valuable insights into the molecular architecture of how cells grow and what governs their shape. The evolutionary phenomena emphasized by the authors - the importance of loss-of-function mutations in driving rapid compensatory fitness gains and that multiple genetic and molecular routes to high fitness are often available, even in the relatively short time frame of a few hundred generations - are well-understood phenomena and so arguably of less broad interest. The more compelling evolutionary questions concern the nature and cause of stabilizing selection (in this case cell volume) and the evolution of complexity. The paper misses an opportunity to highlight the former and, while claiming to shed light on the latter, provides rather little useful insight.

      Thank you for these thoughts and comments. However, we disagree that the experimental results are an overlooked opportunity to discuss stabilising selection. Stabilising selection occurs when selection favours a particular phenotype causing a reduction in underpinning population-level genetic diversity. This is not happening when selection acts on SBW25 ∆mreB leading to a restoration of fitness. Driving the response are biophysical factors, primarily the critical need to balance elongation rate with rate of septation. This occurs without any change in underlying genetic diversity.  

      Recommendations for the authors:  

      Reviewer 1 (Recommendations for the Authors): 

      Hereby my suggestion for improvement of the quantification of the data, the figures, and the text. 

      -  p 14, what is the unit of elongation rate?  

      At first mention we have made clear that the unit is given in minutes^-1

      -  p 14, please give an error bar for both p=0.85 and f=0.77, to be able to conclude they are different 

      Error on the probability p is estimated at the 95% confidence level by the formula 1.96 × sqrt(p(1 − p)/N), where N is the total number of cells. This has been added to the paragraph on p (“probability”) in the Image Analysis section of the Materials and Methods.
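      A minimal sketch of such an error estimate, assuming the interval is the standard normal-approximation (Wald) interval for a proportion, with half-width 1.96 × sqrt(p(1 − p)/N). The function name and the cell count in the example are hypothetical and are not values from the manuscript.

```python
import math


def proportion_ci_halfwidth(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation (Wald) interval for a proportion.

    p is the observed proportion, n the total number of cells, and z the
    normal quantile (1.96 for a 95% interval).
    """
    if n <= 0:
        raise ValueError("n must be a positive cell count")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical example: p = 0.85 measured on 200 cells (illustrative count only).
half_width = proportion_ci_halfwidth(0.85, 200)
print(f"p = 0.85 +/- {half_width:.3f}")
```

      Two proportions can then be compared by checking whether their intervals overlap, keeping in mind that the approximation becomes poor when N is small or p is close to 0 or 1.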

      We also added errors on p measurement in the main text.

      -  p 14, all the % differences need an errorbar 

      The error bars and means are given in Fig 3C and 3D.

      -  Figure 1B adds units to compactness, and what does it represent? Is the cell size the estimated volume (that is mentioned in the caption)? Shouldn't the datapoints have error bars? 

      Compactness is defined in the “Image Analysis” section of the Material and Methods. It is a dimensionless parameter. The distribution of individual cell shapes / sizes are depicted in Fig 1B. Error does arise from segmentation, but the degree of variance (few pixels) is much smaller than the representations of individual cells shown.

      -  Figure 1C caption, are the 50.000 cells? 

      Correct. Figure caption has been altered.

      -  Figure 1D, first the elongation rate is described as a volume per minute, but now, looking at the units it is a rate, how is it normalized? 

      Elongation rate is explained in the Materials and Methods (see the image analysis section) and is not volume per minute. It is dV/dt = r*V (the unit of r is min^-1). Page 9 includes specific mention of the unit of r.
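      Because dV/dt = r*V integrates to V(t) = V0 * exp(r*t), r can be read off as the slope of ln(V) against time, which is why its unit is min^-1 and not volume per minute. The sketch below illustrates this with an ordinary least-squares fit; it is an assumption about how such a rate could be estimated, not the authors' image-analysis code, and the time series is synthetic.

```python
import math


def elongation_rate(times_min, volumes):
    """Least-squares slope of ln(V) versus t, i.e. r in dV/dt = r*V (min^-1)."""
    if len(times_min) != len(volumes) or len(times_min) < 2:
        raise ValueError("need at least two matching (time, volume) points")
    log_volumes = [math.log(v) for v in volumes]
    n = len(times_min)
    t_mean = sum(times_min) / n
    l_mean = sum(log_volumes) / n
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(times_min, log_volumes))
    return sxy / sxx


# Synthetic series: a cell growing exponentially with r = 0.02 min^-1.
times = [0, 5, 10, 15, 20]
volumes = [3.0 * math.exp(0.02 * t) for t in times]
r = elongation_rate(times, volumes)
print(f"r = {r:.4f} min^-1, doubling time = {math.log(2) / r:.1f} min")
```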

      -  Figure 1E, how many cells (n) per replicate? 

      Our apologies. We have corrected the figure caption that now reads:

      “Proportion of live cells in ancestral SBW25 (black bar) and ΔmreB (grey bar) based on LIVE/DEAD BacLight Bacterial Viability Kit protocol. Cells were pelleted at 2,000 x g for 2 minutes to preserve ΔmreB cell integrity. Error bars are means and standard deviation of three biological replicates (n>100).”

      -  Figure 1G, how does this compare to the wildtype 

      The volume for wild type SBW25 is 3.27µm^3 (within the “white zone”). This is mentioned in the text.

      -  Figure 2B, is this really volume, not size? And can you add microscopy images? 

      The x-axis is volume (see Materials and Methods, subsection image analysis). Images are available in Supp. Fig. 9.

      -  Figure 3A what do L1, L4 and L7 refer to? Is it correct that these same lines are picked for WT and delta_mreB?

      Thank you for pointing this out. This was an earlier nomenclature. It was shorthand for the mutants that are specified everywhere else by genotype and has now been corrected. 

      -  Figure 3c: either way write out p, so which probability, or you need a simple cartoon that is plotted. 

      The value p is the probability to proceed to the next generation and is explained in the Materials and Methods (subsection image analysis). We feel this is intuitive and does not require a cartoon. We nonetheless added a sentence to the Materials and Methods to aid clarity.

      -  Figure 4B can you add a ladder to the gel? 

      No ladder was included, but the controls provide all the necessary information. The band corresponding to PBP1A is defined by presence in SBW25, but absence in SBW25 ∆pbp1A.

      -  Figure 4c, can you improve the quantification of these images? How were these selected and how well do they represent the community? 

      We apologise for the lack of quantitative description for data presented in Fig 4C. This has now been corrected. In brief, we measured the intensity of fluorescent signal from between 10 and 14 cells and computed the mean and standard deviation of pixel intensity for each cell. To rule out possible artifacts associated with variation of the mean intensity, we calculated the ratio of the standard deviation divided by the square root of the mean. These data reveal heterogeneity in cell wall synthesis and provide strong statistical support for the claim that cell wall synthesis in ∆mreB is significantly more heterogeneous than the control. The data are provided in new Supp. Fig. 21. 
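      The per-cell measure described above, the standard deviation of pixel intensity divided by the square root of the mean, can be sketched as follows. This is an illustration only: the function name is hypothetical, the pixel values are invented, and the use of the population standard deviation is an assumption, since the response does not say which estimator was used.

```python
import math
import statistics


def heterogeneity_index(pixel_intensities):
    """Standard deviation of pixel intensity divided by sqrt(mean intensity).

    Assumption: the population standard deviation is used; the sample
    standard deviation (statistics.stdev) would be an equally plausible choice.
    """
    mean = statistics.fmean(pixel_intensities)
    if mean <= 0:
        raise ValueError("mean intensity must be positive")
    return statistics.pstdev(pixel_intensities) / math.sqrt(mean)


# Invented pixel values for two cells with the same mean intensity.
uniform_cell = [100, 102, 98, 101, 99]
patchy_cell = [40, 160, 55, 150, 95]
print(f"uniform: {heterogeneity_index(uniform_cell):.3f}")
print(f"patchy:  {heterogeneity_index(patchy_cell):.3f}")
```

      Dividing by the square root of the mean is the natural scaling if pixel noise is approximately Poisson, where the variance grows with the mean, so that brighter cells do not appear more heterogeneous simply because they are brighter.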

      Minor comments: 

      -  It would be interesting if the findings of this experimental evolution study could be related to comparative studies (if these have ever been executed).  

      Little is possible, but Hendrickson and Yulo published a portion of the originally posted preprint separately. We include a citation to that paper. 

      -  p 13, halfway through the page, the second paragraph lacks a conclusion, why do we care about DNA content? 

      It is a minor observation that was included by way of providing a complete description of cell phenotype.  

      -  p 17, "suggesting that ... loss-of-function", I do not understand what this is based upon.

      We show that the fitness of a pbp1A deletion is indistinguishable from the fitness of one of the pbp1A point mutants. This fact establishes that the point mutation had the same effects as a gene deletion thus supporting the claim that the point mutations identified during the course of the selection experiment decrease (or destroy) PBP1A function.

      -  p 25, at the top of the page: do you have a reference for the statement that a disorganized cell wall architecture is suited to the topology of spherical cells? 

      The statement is a conclusion that comes from our reasoning. It stems from the fact that it is impossible to entirely map the surface of a sphere with parallel strands.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1: 

      Summary:

      In this study, Avila et al. tested the hypothesis that chronic pain states are associated with changes in the excitability of the medial prefrontal cortex (mPFC). The authors used the slope of the aperiodic component of the EEG power spectrum (= the aperiodic exponent) as a novel, non-invasive proxy for the cortical excitation-inhibition ratio. They performed source localization to estimate the EEG signals generated specifically by the mPFC. By pooling resting-state EEG recordings from three existing datasets, the authors were able to compare the aperiodic exponent in the mPFC and across the whole brain (at all modeled cortical sources) between 149 chronic pain patients and 115 healthy controls. Additionally, they assessed the relationship between the aperiodic exponent and pain intensity reported by the patients. To account for heterogeneity in pain etiology, the analysis was also performed separately for two patient subgroups with different chronic pain conditions (chronic back pain and chronic widespread pain). The study found robust evidence against differences in the aperiodic exponent in the mPFC between people with chronic pain and healthy participants, and no correlation was observed between the aperiodic exponent and pain intensity. These findings were consistent across different patient subgroups and were corroborated by the whole-brain analysis.

      Strengths:

      The study is based on sound scientific reasoning and rigorously employs suitable methods to test the hypothesis. It follows a pre-registered protocol, which greatly increases the transparency and, consequently, the credibility of the reported results. In addition to the planned steps, the authors used a multiverse analysis to ensure the robustness of the results across different methodological choices. I find this particularly interesting, as the EEG aperiodic exponent has only recently been linked to network excitability, and the most appropriate methods for its extraction and analysis are still being determined. The methods are clearly and comprehensively described, making this paper very useful for researchers planning similar studies. The results are convincing, and supported by informative figures, and the lack of the expected difference in mPFC excitability between the tested groups is thoroughly and constructively discussed.

      We are grateful for the appreciation of the strengths of our study.  

      Weaknesses:

      Firstly, although I appreciate the relatively large sample size, pooling data recorded by different researchers using different experimental protocols inevitably increases sample variability and may limit the availability of certain measures, as was the case here with the reports of pain intensity in the patient group. Secondly, the analysis heavily relies on the estimation of cortical sources, an approach that offers many advantages but may yield imprecise results, especially when default conduction models, source models, and electrode coordinates are used. In my opinion, this point should be discussed as well.

      We agree that the heterogeneous sample of people with chronic pain increases variability and limits the availability of clinical measures. We further agree on the limitations of source space analysis. Therefore, we have added these limitations to the discussion section.

      Reviewer #2: 

      Summary:

      This study evaluated the aperiodic component in the medial prefrontal cortex (mPFC) using resting-state EEG recordings from 149 individuals with chronic pain and 115 healthy participants. The findings showed no significant differences in the aperiodic component of the mPFC between the two groups, nor was there any correlation between the aperiodic component and pain intensity. These results were consistent across various chronic pain subtypes and were corroborated by whole-brain analyses. The study's robustness was further reinforced by preregistration and multiverse analyses, which accounted for a wide range of methodological choices.

      Strengths:

      This study was rigorously conducted, yielding clear and conclusive results. Furthermore, it adhered to stringent open and reproducible science practices, including preregistration, blinded data analysis, and Bayesian hypothesis testing. All data and code have been made openly available, underscoring the study's commitment to transparency and reproducibility.

      We appreciate the appraisal of the strengths of our study, highlighting our efforts in open and reproducible science practices.

      Weaknesses:

      The aperiodic exponent of the EEG power spectrum is often regarded as an indicator of the excitatory/inhibitory (E/I) balance. However, this measure may not be the most accurate or optimal for quantifying E/I balance, a limitation that the authors might consider addressing in the future.

      We are grateful for this suggestion and fully agree that the aperiodic component of the power spectrum is not necessarily the most optimal and accurate measure for quantifying E/I balance. We have now included this limitation in the discussion section.

      Recommendations for the authors

      Reviewer #1: 

      (1) In the Results section, it might be helpful to provide the mean values of the aperiodic exponent (before age correction) for all tested groups and subgroups. As this measure is still not widely used, providing these values would allow readers to better understand the normal range of the aperiodic exponent.

      We have added the mean values of the aperiodic exponent and their standard deviation (before age correction) to the manuscript's results section (page 6 and 11).

      (2) When reporting the aperiodic exponent across all cortical sources (Q3), I think it would be useful to include the raw values in Figure 6 in the main text rather than in the Supplementary Materials. At a glance, these plots seem to suggest that the aperiodic exponent differs between groups in the occipital and parietal regions, even though no tests were significant after correcting for multiple comparisons. Maybe this observation also deserves a mention in the text and possibly in the Discussion..?

      We have moved the report on the aperiodic exponent across all cortical sources from the Supplementary Material to the main text. It is now Fig. 7 of the main manuscript. Moreover, we agree that the plots suggest group differences in certain brain regions. However, according to our rigorous open and reproducible science practices and pre-registration, we prefer not to speculate on these non-significant findings. 

      (3) In the Methods section, when describing the participants, the authors state that "Gender was balanced across both groups...". It might be better to avoid referring to the datasets as "balanced," considering that the sample includes almost twice as many females as males.

      We have replaced the misleading statement with the more precise statement that ”the gender ratio of both groups was similar.”

      (4) In the Methods section, when describing the source localization, I find it slightly confusing that the authors first mention the anterior cingulate cortex as a possible label included in the mPFC cortical parcels but then state that the version of the cortical atlas used did not contain such a label. It might be simpler not to mention the cingulate cortex at all.

      We have deleted the misleading sentence from the manuscript.  

      Reviewer #2: 

      (1) The aperiodic exponent of the EEG power spectrum is often considered an indicator of the excitatory/inhibitory (E/I) balance, but this measure can be susceptible to artifacts. It is important to acknowledge this limitation and consider exploring alternative measures to quantify the E/I ratio in future studies.

      We are grateful for this suggestion and fully agree that the aperiodic component of the power spectrum is not necessarily the most optimal and accurate measure for quantifying E/I balance. We have now included this limitation in the discussion section.

      (2) The study assumed a linear relationship between the E/I ratio (represented by the aperiodic exponent of the EEG power spectrum) and chronic pain. However, this assumption may not hold true in all cases, and this point could be discussed in the study.

      We fully agree that the relationship between the E/I ratio and chronic pain might not be a linear one and have added this point to the discussion section.

      (3) The aperiodic component was characterized in eyes-closed resting-state EEG recordings, although EEG data were collected in both eyes-closed and eyes-open conditions. The authors could also consider assessing the aperiodic component from EEG data with eyes open.

      We thank the reviewer for this suggestion. We have focused our analysis on eyes-closed recordings since these recordings are usually less contaminated by artifacts than eyes-open recordings. Moreover, in our current datasets, some participants were missing eyes-open recordings. We agree that performing similar analyses for the eyes-open recordings would also be interesting. However, adding these analyses would double the amount of data included in the manuscript, which would likely overload it. We have, therefore, now included a statement to the discussion that future studies should also analyze eyes-open EEG recordings.  

      (4) The EEG power spectrum was calculated from signals after source reconstruction, a crucial step for targeting specific brain regions. However, this process can introduce potential signal distortions, such as variations in source waveforms depending on different regularization parameters. To ensure the robustness of the results, the authors could perform the same analysis at the sensor level, for example, using signals recorded at Fz.

      We agree on the potential shortcomings and limitations of source space analysis and have added this limitation to the discussion section.

      (5) It would be beneficial to present the raw EEG power spectrum averaged across subjects for each condition, along with the scalp distribution of the aperiodic exponent. This would enhance readers' understanding of the study and help demonstrate the quality of the data.

      We are grateful for this suggestion and added the power spectrum for each condition and the scalp distribution of the aperiodic exponent to the Supplementary Material.

      (6) Linear regression models were used to control for the influence of age on aperiodic exponents and pain intensity ratings. However, it is unclear why other relevant variables, such as gender and medication use, were not considered.

      We agree that the aperiodic exponent might be influenced by gender and medication. As these analyses had not been included in our pre-registered analysis plan, we have not performed them. Moreover, although we agree that gender might have an impact, we have not found any evidence for this so far. Regarding medication, we fully agree that medication can influence the measure. However, medication was very heterogeneous, including drugs with fundamentally different mechanisms of action. Thus, we do not see a robust way to appropriately analyze these effects with sufficient statistical power. We have now added this important point to the discussion section.

      (7) The authors may consider addressing or discussing the impact of inter-individual variability on the negative results, particularly given that the data were derived from multiple experiments.

      We agree that the heterogeneous sample of people with chronic pain increases variability and limits the availability of clinical measures. We have added this limitation to the discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable work advances our understanding of the foraging behaviour of aerial insectivorous birds. Its major strength is the large volume of tracking data and the accuracy of those data. However, the evidence supporting the main claim of optimal foraging is incomplete.

      We deeply appreciate the thoughtful review provided by the reviewers, including their valuable insights and meticulous attention to detail. Each comment has been thoroughly evaluated, leading to substantial improvements in the manuscript. Your constructive critique has been instrumental in refining our research and rectifying any oversights. We are confident that the revised article will make a substantial contribution to ecological research, particularly in advancing our understanding of foraging theories and the behaviors of aerial insectivores.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study tests whether Little Swifts exhibit optimal foraging, which the data seem to indicate is the case. This is unsurprising as most animals would be expected to optimize the energy income: expenditure ratio; however, it hasn't been explicitly quantified before the way it was in this manuscript.

      The major strength of this work is the sheer volume of tracking data and the accuracy of those data. The ATLAS tracking system really enhanced this study and allowed for pinpoint monitoring of the tracked birds. These data could be used to ask and answer many questions beyond just the one tested here.

      The major weakness of this work lies in the sampling of insect prey abundance at a single point on the landscape, 6.5 km from the colony. This sampling then requires the authors to work under the assumption that prey abundance is simultaneously even across the study region - an assumption that is certainly untrue. The authors recognize this problem and say that sampling in a spatially explicit way was beyond their scope, which I understand, but then at other times try to present this assumption as not being a problem, which it very much is.

      Further, it is uncertain whether other aspects of the prey data are problematic. For example, the radar only samples insects at 50 m or higher from the ground - how often do Little Swifts forage under 50 m high?

      Another example might be that the phrases "high abundance" and "low abundance" are often used in the manuscript, but never defined.

      It may be fair to say that prey populations might be correlated over space but are not equal. It is this unknown degree of spatial correlation that lends confidence to the findings in the Results. As such, the finding that Little Swifts forage optimally is indeed supported by the data, notwithstanding some of the shortcomings in the prey abundance data. The authors achieved their aims and the results support their conclusions.

      Thanks for this comment.

      The basic assumption of this paper is that the abundance of insect bioflow in the airspace is correlated in space and varies over time. This has been demonstrated by different studies, see for example Bell et al. (Bell, J. R., Aralimarad, P., Lim, K. S., & Chapman, J. W. (2013). Predicting insect migration density and speed in the daytime convective boundary layer. PLoS ONE, 8(1), e54202), in which a positive correlation in insect bioflow is demonstrated between sites that are more than 100 km apart in Southern England. Given the much closer proximity of the colony and the radar site, as well as the large foraging distance of the swifts, which often forage in the vicinity of the radar and beyond it, it is reasonable to assume that the radar was able to successfully capture between-day variation in the abundance of flying insects in the airspace, which is highly relevant for the foraging swifts. This is likely because meteorological variables such as temperature and wind, which tend to vary over a synoptic-system scale of several hundred kilometers, significantly influence the abundance of aerial insects. Furthermore, the direction of insect flight recorded by the radar points to an overall south-north directionality of the insects during the period of the study (Werber et al. Under Review: Werber, Y., Chapman, J. W., Reynolds, D. R. and Sapir, N. Active navigation and meteorological selectivity drive patterns of mass intercontinental insect migration through the Levant). Hence, since the colony is positioned approximately 6.5 km south of the radar site, it is reasonable to assume that the radar is able to reliably estimate the between-day variation in aerial insect abundance experienced by the foraging swifts. Importantly, this between-day variation is very high, and detailed information regarding this variation is provided in the paper.

      We thank the reviewer for the comments on the wording and have corrected it accordingly so that it is explicitly stated that the spatial distribution of the flying insects is indeed not uniform, but is expected to be simultaneously affected by environmental variables creating spatially correlated bioflow of aerial insects.

      The terms "high abundance" and "low abundance" are relative to the variable being examined. Throughout the manuscript we did not use these terms to describe an absolute amount or a certain threshold, but rather to describe the ecological circumstances experienced by the birds on different days that varied substantially in the abundance of insects recorded by the radar. We have nevertheless improved the wording of the text so that it is now clear that we refer to relative and not to absolute values.

      At its centre, this work adds to our understanding of Little Swift foraging and extends to a greater understanding of aerial insectivores in general. While unsurprising that Little Swifts act as optimal foragers, it is good to have quantified this and show that the population declines observed in so many aerial insectivores are not necessarily a function of inflexible foraging habits. Further, the methods used in this research have great potential for other work. For example, the ATLAS system poses some real advantages and an exciting challenge to existing systems, like MOTUS. The radar that was used to quantify prey abundance also presents exciting possibilities if multiple units could be deployed to get a more spatially-explicit view.

      To improve the context of this work, it is worth noting that the authors suggest that this work is important because it has never been done before for an aerial insectivore; however, that justification is untrue as it has been assessed in several flycatcher and swallow species. A further justification is that this research is needed due to dramatic insect population declines, but the magnitude and extent of such declines are fiercely debated in the literature. Perhaps these justifications are unnecessary, and the work can more simply be couched as just a test of optimality theory.

      We appreciate the reviewer's helpful comment. A flycatcher is indeed an aerial insect eater, but its foraging strategy is very different from that of swifts. A comparison with the foraging strategy of the swallow is much more relevant. However, the methods used to quantify bird movement in the airspace in previous articles limited the ability to examine the optimal foraging theory in detail. Following the comment, we revised the text to better describe the uniqueness of our research. Further, since we studied insectivores, it is important to provide a broad context for potentially significant threats to the birds, even though these threats are debated.

      Reviewer #2 (Public Review):

      Summary:

      Bloch et al. investigate the relationships between aerial foragers (little swifts) tracked with an automated radio-telemetry system (Atlas) and their prey (flying insects) monitored with a small-scale vertical-looking radar device (BirdScan MR1). The aim of the study was to test whether little swifts optimise their foraging with the abundance of their prey. However, the results provided little evidence of optimal foraging behaviour.

      Strengths:

      This study addresses fundamental knowledge gaps on the prey-predator dynamics in the airspace. It describes the coincidence between the abundance of flying insects and features derived from tracking individual swifts.

      Weaknesses:

      The article uses hypotheses broadly derived from optimal foraging theory, but mixes the form of natural selection: parental energetics, parental survival (predation risks), nestling foraging, and breeding success.

      While this study explores additional behavioral theories alongside optimal foraging theory, its findings unequivocally support the latter. The highly statistically significant observed reduction in flight distance from the breeding colony in relation to increasing insect abundance (supporting predictions 1 and 2) coupled with an increased rate of colony visits (supporting prediction 5) demonstrate the Little Swifts' adeptness at optimizing their aerial foraging behavior. This behavior manifests in an enhanced frequency of visits to the breeding colony, underscoring their food provisioning maximization.

      Results are partly incoherent (e.g., "Thus, even when the birds foraged close to the colony under optimal conditions, the shorter traveling distance is not thought to not confer lower flight-related energetic expenditure because more return trips were made.", L285-287),

      Thanks for the comment. We have corrected this sentence.

      and confounding factors (e.g., brooding vs. nestling phase) are ignored.

      The breeding stage may indeed affect food provisioning properties, but this factor is not a confound: insect abundance, and the consequent changes in bird foraging properties, fluctuated between sequential days, whereas the brooding and nestling phases each take place over a period of several weeks. Further, despite the possible influence of breeding stage on bird behavior, variability in reproductive stage is expected among pairs in a breeding colony comprising dozens of pairs, despite some coordination in nesting initiation. Practically, the narrow and concealed nest openings hindered direct observation of the nests, making it difficult to determine the precise reproductive stage of each pair. We have added a short description of the dense colony structure to the Methods section.

      Some limits are clearly recognised by the authors (L329 and ff).

      See above the response about the distribution of insects in space.

      To illustrate potential confounding effects, the daily flight duration (Prediction 4) should decrease with prey abundance, but how far does the daily flight duration coincide with departure and arrival at sunrise and sunset (note that day length increases between March and May), respectively, and how much do parents vary in the duration of nest attendance during the day across chick ages?

      We added the following explanation to the Methods section:

      To standardize the effect of day length on daily foraging duration, we calculated and subtracted the day length from the total daily foraging time (Day duration - Daily foraging duration = Net foraging duration). The resulting data represent the daily foraging duration in relation to sunrise and sunset, independent of day length.
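
      A minimal sketch of this standardization, following the parenthetical formula as written (Day duration - Daily foraging duration = Net foraging duration). The function name, the date, and the example times are hypothetical and are not taken from the study's analysis code or data.

```python
from datetime import datetime


def net_foraging_duration_hours(sunrise, sunset, foraging_hours):
    """Return day duration minus daily foraging duration, in hours.

    Assumption: sunrise and sunset are datetimes on the same day and
    foraging_hours is the total daily foraging time in hours.
    """
    day_hours = (sunset - sunrise).total_seconds() / 3600.0
    return day_hours - foraging_hours


# Hypothetical example values (not study data).
sunrise = datetime(2019, 4, 1, 6, 30)
sunset = datetime(2019, 4, 1, 19, 0)
net = net_foraging_duration_hours(sunrise, sunset, foraging_hours=11.0)
print(net)
```

      Because the day length is removed for each date separately, values from March and May can be compared on the same scale.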

      To conclude, insufficient analyses are performed to rigorously assess whether little swifts optimize their foraging.

      We disagree. See our responses above.

      Filters applied on tracking data are necessary but may strongly influence derived features based on maximum or mean values. Providing sensitivity tests or using features less dependent on extreme values may provide more robust results.

      Thank you for highlighting the importance of considering the impact of data filtering on derived features. In our analysis, we employed rigorous filtering methods to emphasize central data tendencies while mitigating the influence of extreme values. These methods, validated through consultation with experts in tracking data analysis, follow established practices in the literature. Detailed descriptions of our filtering procedures can be found in the Methods section, with citations to relevant published studies.

      Radar insect monitoring is incomplete and strongly size-dependent. What is the favourite prey size of swifts? How does it match with BirdScan MR1 monitoring capability?

      We added an explanation to the Methods section to address this comment:

      The Radar Cross Section (RCS) quantifies the reflectivity of a target, serving as a proxy for size by representing the cross-sectional area of a sphere with identical reflectivity to water, whose diameter equals the target's body length. Recent findings indicate that the BirdScan MR1 radar can detect insects with an RCS as low as 3 mm², enabling the detection of insects with body lengths as small as 2 mm. These capabilities make the radar suitable for locating the primary prey of swifts, which typically range in size from 1 to 16 mm.
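
      The correspondence between an RCS of about 3 mm² and a body length of about 2 mm can be checked with the sphere definition given above. The sketch below is a purely geometric reading of that definition (cross-sectional area of a sphere whose diameter equals body length); the function names are hypothetical, and wavelength-dependent scattering effects are ignored.

```python
import math


def sphere_cross_section_mm2(body_length_mm):
    """Cross-sectional area (mm^2) of a sphere whose diameter is the body length."""
    return math.pi * (body_length_mm / 2.0) ** 2


def body_length_from_rcs_mm(rcs_mm2):
    """Invert the relation above: sphere diameter (mm) for a given cross-section."""
    return 2.0 * math.sqrt(rcs_mm2 / math.pi)


print(sphere_cross_section_mm2(2.0))  # area for a 2 mm body length
print(body_length_from_rcs_mm(3.0))   # body length for an RCS of 3 mm^2
```

      Under this simplification, a 2 mm body length gives an area of π mm² (about 3.1 mm²), in line with the stated detection limit.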

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Lines 53-59 - major run-on sentence

      Thanks for the comment. Done.

      Line 133 - describe better. Attached where? Were feathers clipped or removed?

      Thanks for the comment. Done.

      Line 153 - shouldn't be a new paragraph

      Done.

      Line 157 - justify choosing four 

      To ensure a robust analysis of swifts' behavior relative to food abundance across multiple individuals simultaneously, we opted to exclude data from instances where only 3 tags were active. This decision was motivated by the fact that these instances accounted for only 2.9% of the data, and their exclusion minimally impacted overall data volume while enhancing data quality. In contrast, instances with 4 tags, comprising 16.2% of the data, provided substantial insights. Omitting these instances would have resulted in significant data loss. Thus, setting a threshold of 4 simultaneous tags represents a balance between maintaining adequate data quantity and ensuring high data quality for meaningful analysis.
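
      A threshold of this kind could be applied as sketched below. This is an illustration only: the record layout (pairs of time bin and tag identifier), the notion of a "time bin", and the function name are assumptions, not the filtering code used in the study.

```python
def filter_by_active_tags(records, min_tags=4):
    """Keep only records from time bins in which at least min_tags tags were active.

    records: iterable of (time_bin, tag_id) pairs (hypothetical layout).
    """
    records = list(records)
    tags_per_bin = {}
    for time_bin, tag_id in records:
        tags_per_bin.setdefault(time_bin, set()).add(tag_id)
    keep = {b for b, tags in tags_per_bin.items() if len(tags) >= min_tags}
    return [(b, t) for b, t in records if b in keep]


# Hypothetical example: day1 has 3 active tags, day2 has 4.
example = [("day1", "A"), ("day1", "B"), ("day1", "C"),
           ("day2", "A"), ("day2", "B"), ("day2", "C"), ("day2", "D")]
kept = filter_by_active_tags(example, min_tags=4)
print(kept)
```

      Re-running the same filter with min_tags set to 3 or 5 shows how much data each choice of threshold retains.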

      It took me a long time to determine whether the average and maximum flight distance was actual or Euclidean. It was only in the Results that I grasped it was actual. Define up front in the Methods.

      Thanks for the comment. Done.

      In my public review, I mention that optimal foraging has been assessed in other aerial insectivores. Here are some of the papers I was referring to:

      • Davies (1977) Prey selection and the search strategy of the spotted flycatcher (Muscicapa striata): A field study on optimal foraging. Animal Behaviour 25: 1016-1022.

      • Lifjeld & Slagsvold (1988) Effects of energy costs on the optimal diet: an experiment with pied flycatchers Ficedula hypoleuca feeding nestlings. Ornis Scandinavica 19: 111-118.

      • Quinney & Ankney (1985) Prey size selection by tree swallows. Auk 102: 245-250.

      • Turner (1982) Optimal foraging by the swallow (Hirundo rustica, L): Prey size selection. Animal Behaviour 30:862-872.

      Lastly, in terms of the work not being spatially-explicit, I do note that in lines 323-324 you acknowledge that prey populations can be patchy, then ten lines later, you provide citations to say that patchiness is not a problem because of spatial correlations. This is a bit overly dismissive, in my view, and to suggest (lines 336-337) that "patches of high insect concentration...might not exist at all" is certainly incorrect (and misleading). I do note the valiant attempt to address the spatial shortcoming in the remainder of the paragraph - although addressing it does not make the problem go away.

      Thanks for the comment.

      We revised the text to make it more coherent.

      Reviewer #2 (Recommendations For The Authors):

      L161: typo > missing space in 'meanof'

      Corrected.

      L192-193: Did the authors use the timing of sunrise and sunset to determine daytime?

      Yes. The daytime was calculated in relation to sunrise and sunset.

      Did the authors calculate the MTR from sunrise to sunset, or averaging the hourly MTR?

      If using hourly MTR, specify the criteria to assign an hourly MTR to daytime when sunset/sunrise is happening during that hour.

      A simplified terminology for "Average daily insect MTR" might be useful, in particular for the result section (insect MTR).

      Average daily insect MTR is calculated for a fixed period from 5 am to 8 pm local time. An explanation has been added to the Methods section, and the terminology in the text has been simplified as suggested.
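
      One way such a fixed-window average could be computed is sketched below. It assumes hourly MTR values keyed by the local hour at which each hour starts, and it treats the window as the hours starting at 05:00 through 19:00. Both the hourly resolution and the boundary convention are assumptions for illustration and are not stated in the response.

```python
def average_daily_mtr(hourly_mtr, start_hour=5, end_hour=20):
    """Mean of hourly MTR values for hours in [start_hour, end_hour).

    hourly_mtr: dict mapping local hour (0-23) to MTR for that hour
    (hypothetical layout).
    """
    values = [mtr for hour, mtr in hourly_mtr.items()
              if start_hour <= hour < end_hour]
    if not values:
        raise ValueError("no MTR values inside the averaging window")
    return sum(values) / len(values)


# Hypothetical example values (not study data).
hourly = {h: 10.0 for h in range(24)}
hourly[12] = 25.0    # midday peak, inside the window
hourly[2] = 1000.0   # night-time value, outside the window
avg = average_daily_mtr(hourly)
print(avg)
```

      A fixed window avoids having to assign the hours containing sunrise or sunset to day or night.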

      Note that the 'M' of MTR stands for migration, which may not be appropriate in this context, and simply using "insect traffic rate" may be a better terminology.

      Thanks for the comment. The 'M' of MTR can also stand for movement, as the insects detected by the radar move in the airspace. This is how this term has been defined in the paper (e.g. in line 23 of the Summary section). Therefore, we did not change the terminology to “insect traffic rate”, which is a term not used in other studies.

      Considering the large number of predictions (10!), it would be appropriate to list them in the results (e.g., "on the daily average flight distance from the breeding colony (Prediction 3)").

      We added prediction numbers to the Results and the Discussion.

      Note that the terminology varies; e.g., in the introduction "overall daily flight distance" (L75), in the results "average length of the daily flight route" (L236), and further confusion with "daily average flight distance from the breeding colony" (L232).

      Thanks for the comment. Fixed.

      The terminology - average daily 'air/flight' distance (L74-76) - needs clarification.

      Done.

      Results: Use only a relevant and consistent number of decimals to report on the effect size and p-values.

      Done.

      The authors are citing non-peer-reviewed publications:

      21. Bloch I, Troupin D, Sapir N. Movement and parental care characteristics during the nesting season of the Little Swift (Apus affinis) [Poster presentation]. 12th European Ornithologists' Union Congress. Cluj Napoca, Romania. 2019.

      62. Zaugg S, Schmid B, Liechti F. Ensemble approach for automated classification of radar echoes into functional bird sub-types. In: Radar Aeroecology. 2017. p. 1. doi:10.13140/RG.2.2.23354.80326

      It is acceptable to cite non-peer-reviewed sources if they have a significant contribution to the background of the article without a critical impact on the core of the research.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the first half of this study, Pham et al. investigate the regulation of TEAD via ubiquitination and PARylation, identifying an E3 ubiquitin ligase, RNF146, as a negative regulator of TEAD activity through an siRNA screen of ubiquitin-related genes in MCF7 cells. The study also finds that depletion of PARP1 reduced TEAD4 ubiquitination levels, suggesting a certain relationship between TEAD4 PARylation and ubiquitination which was also explored through an interesting D70A mutation. Pham et al. subsequently tested this regulation in D. melanogaster by introducing Hpo loss-of-function mutations and rescuing the overgrowth phenotype through RNF146 overexpression.

      In the second half of this study, Pham et al. designed and assayed several potential TEAD degraders with a heterobifunctional design, which they term TEAD-CIDE. Compounds D and E were found to effectively degrade pan-TEAD, an effect which could be disrupted by treatment with TEAD lipid pocket binders, proteasome inhibitors, or E1 inhibitors, demonstrating that the TEAD-CIDEs operate in a proteasome-dependent manner. These TEAD-CIDEs could reduce cell proliferation in OVCAR-8, a YAP-deficient cell line, but not SK-N-FI, a Hippo pathway independent cell line. Finally, this study also utilizes ATAC-seq on Compound D to identify reductions in chromatin accessibility at the regions enriched for TEAD DNA binding motifs.

      Strengths:

      The study provides compelling evidence that the E3 ubiquitin ligase RNF146 is a novel negative regulator of TEAD activity. The authors convincingly delineate the mechanism through multiple techniques and approaches. The authors also describe the development of heterobifunctional pan-degraders of TEAD, which could serve as valuable reagents to more deeply study TEAD biology.

      Weaknesses:

      The scope of this study is extremely broad. The first half of the paper highlights the mechanisms underlying TEAD degradation; however, the connection to the second half of the paper on small molecule degraders of TEAD is jarring, and it seems as though two separate stories were combined into this single massive study. In my opinion, the study would be stronger if it chose to focus on only one of these topics and instead went deeper.

      We thank the reviewer for the thoughtful feedback. In our mind, the two parts of the paper are inherently related as they both focus on proteasome-mediated degradation of TEADs. We first demonstrated that TEAD can be turned over by the ubiquitin-proteasome system under endogenous conditions and identified a PARylation-dependent E3 ligase, RNF146, as a major regulator of TEAD stability. Intriguingly, we observed that the four TEAD paralogs show different levels of polyubiquitination, with some of them being highly stable in cells. These observations raised the question of whether the activity of the ubiquitin-proteasome system could be further enhanced pharmacologically to effectively target TEADs. We then tackled this question by providing a proof-of-concept demonstration that engineered heterobifunctional protein degraders can effectively degrade TEADs in cells and can be exploited as a therapeutic strategy for treating Hippo-dependent cancers.

      Additionally, the figure clarity needs to be substantially improved, as readability and interpretation were difficult in many panels. Lastly, there are numerous typos and poor grammar throughout the text that need to be addressed.

      We appreciate the suggestions from the reviewer and have updated the figures with high resolution images. We also corrected typos and grammatical errors in the text.

      Reviewer #2 (Public Review):

      The paper is made of two parts. One deals with RNF146, the other with the development of compounds that may cause TEAD degradation. The two parts are rather unrelated to each other.

      The main limit of this work is the lack of evidence that TEAD factors are in fact regulated by the proteasome and ubiquitylation under endogenous conditions. Also lacking is the demonstration that TEADs are labile proteins to the extent that such quantitative regulation at the level of stability can impact on YAP-TAZ biology. Without these two elements, the relevance and physiological significance of all these data is lacking.

      As for the development of new inhibitors of TEAD, this is potentially very interesting but underdeveloped in this manuscript. Irrespectively, if TEAD is stable, these molecules are likely lead compounds of interest. If TEAD is unstable, as entertained in the first part of the paper, then these molecules are likely marginal.

      We thank the reviewer for evaluating our manuscript. As the reviewer pointed out, the paper aimed to address 1) whether TEAD is being regulated by the proteasome and ubiquitination under endogenous conditions, and 2) whether TEAD can be inhibited through pharmacologically-induced degradation. First, we demonstrated that TEAD is ubiquitinated in cells and mapped the lysine residues that are poly-ubiquitinated (Fig. 1). Next, we identified RNF146 as a major E3 ligase that ubiquitinates TEADs and reduces their stability. Third, we show that RNF146-mediated TEAD ubiquitination is functionally important: RNF146 suppresses TEAD activity, and RNF146 genetically interacts with Hippo pathway components in fruit flies. Furthermore, as we showed in Fig. S2H, RNF146 does not affect TEAD1 and TEAD4 to the same extent. Across all four cell lines evaluated, TEAD1 is more stable than TEAD4, raising the question of whether more consistent degradation of different TEAD paralogues could be achieved. To this end, we demonstrated that while the TEAD family of proteins is labile under endogenous conditions, more complete degradation of the TEAD proteins could be achieved using a heterobifunctional CRBN degrader. We further characterized these TEAD degraders in a series of cellular and genomic assays to demonstrate their cellular activity, selectivity, and inhibitory effects against YAP/TAZ target genes. We believe these degrader compounds would be of great interest to the Hippo community. We have revised the main text to clarify these points.

      Here are a few other specific observations:

      (1) The effect of MG is shown in a convoluted way, by MS. What about endogenous TEAD protein stability?

      We thank the reviewer for the question. The MS experiment shown in Figure 1 is a standard KGG experiment, where we used MS to map ubiquitination sites on TEADs. The graphical representation of the process is included in Fig. 1C, and the details of the procedure are included in the Methods section. Fig. 1D shows the different KGG peptides detected with or without MG-132 treatment. Fig. 1E shows the quantified abundance of each of the peptides across the four conditions indicated at the bottom of the plot. Regarding endogenous TEAD stability, we conducted cycloheximide chase experiments to assess the stability of endogenously expressed TEAD isoforms upon RNF146 knockdown (Fig. S2G and S2H). Using isoform-specific antibodies, we demonstrated that siRNF146 significantly stabilized TEAD4 in multiple cell lines, including H226, PATU-8902, Detroit-562, and OVCAR-8 (Fig. S2G, S2H, and S2I), supporting the notion that RNF146 is a negative regulator of TEAD stability. Notably, the effect of siRNF146 on TEAD1 stability was less pronounced, and TEAD1 is more stable than TEAD4 across all four cell lines. These results are consistent with the lower level of ubiquitination of TEAD1 (Fig. 1A) and are corroborated by various biochemical, molecular, and genetic characterizations (Fig. 3A-C and S3E).

      (2) The relevance of siRNF on YAP target genes of Fig.2D is not statistically significant.

      We thank the reviewer for this comment. We have now removed the claim of statistical significance.

      (3) All assays are with protein overexpression and Ub-laddering

      We thank the reviewer for the comment. To examine the ubiquitination level of TEAD proteins, we adopted an in vivo ubiquitination assay as described in our Materials and Methods section. To our knowledge, this assay is very standard in the ubiquitination field. Furthermore, as mentioned above, we have included in our revised manuscript cycloheximide chase experiments to assess the stability of endogenously expressed TEAD isoforms upon RNF146 knockdown (Fig. S2G and S2H). In addition to the overexpression system, we also assessed endogenously expressed TEAD using isoform-specific antibodies. We demonstrated that siRNF146 firmly stabilized TEAD4 in multiple cell lines, including H226, PATU-8902, Detroit-562, and OVCAR-8 (Fig. S2G with quantification and t-test), supporting the notion that RNF146 is a negative regulator of TEAD stability.

      (4) An inconsistency exists on the only biological validation (only by overexpression) on the fly eye size. RNF gain in Fig4C is doing the opposite of what is expected from what is portrayed here as a YAP/TEAD inhibitor: RNF gain is shown to INCREASE eye size, phenocopying a Hippo loss of function phenotype. According to the model proposed, RNF addition should reduce eye size. The authors stated that " This is in contrast to the anti-growth effect of RNF-146 in the Hpo loss-of-function background and indicates RNF146 may regulate other genes/pathways controlling eye sizes besides its role as a negative regulator of Sd/yki activity". This raises questions on what the authors are really studying: why, according to the authors, these caveats should occur on the controls, and not when they study Hpo mutants?

      We thank the reviewer for the comment. We acknowledge the complexity of the fly phenotype compared to tumor growth. TEAD (Sd) isn’t the only substrate of RNF146 in the fly. For instance, RNF146 is known to positively regulate Wnt signaling by degrading Axin. Previous studies have shown that activation of the Wnt signaling pathway by removal of the negative regulator Axin from clones of cells results in an overgrowth phenotype (Legent and Treisman, 2008). The overgrowth phenotype that we observed when overexpressing RNF146 alone is therefore likely due to the role of RNF146 in regulating other signaling pathways. Importantly, we showed that upon Hippo loss of function, overexpression of RNF146 can rescue the Hippo overgrowth phenotype (Fig 4B). This differential outcome of RNF146 expression in wildtype versus Hippo-deficient flies indicates that the genetic interactions between RNF146 and Hippo pathway components altered the phenotypic outcome, and that the phenotype we observe with RNF146 overexpression in a Hippo loss of function background is not simply due to additive effects of functional loss of either component alone.

      Complementary to these overexpression data, we showed that knockdown of RNF146 increased the eye size further (Fig. S4A, B) in Hippo loss of function background, further supporting the role of RNF146 as a negative regulator of the overall pro-growth signals induced by yki upon Hippo loss of function.

      (5) The role of TEAD inactivation on YAP function is already well known. Disappointingly, no prior literature is cited. In any case, this is a mere control.

      We thank the reviewer for the suggestion. We have cited several published reviews that touch upon this aspect of the TEAD-YAP function, including Calses et al., 2019; Dey et al., 2020; Halder and Johnson, 2011; Wang et al., 2018. We are open to your suggestions on additional citations.

      (6) The second part of the paper on the Development and Screening of pan-TEAD lipid pocket degraders is interesting but unconnected to the above. The degradation pathway it involves has nothing to do with the enzyme described in the first figures.

      We thank the reviewer for the comment. We acknowledge that our paper broadly covers two aspects. We believe that they are inherently connected as they both address ubiquitin/proteasome-mediated TEAD degradation and the functional consequences of TEAD degradation. Given the increasing interest in targeting TEAD/YAP/TAZ in cancers, we think the pharmacological approaches to enhance TEAD degradation using orthogonal E3 ligases provide an important toolbox to understand how this pathway can be regulated under both physiological and pathological conditions. While RNF146 appears to be a major E3 ligase responsible for TEAD turnover under physiological conditions, we showed that the four TEAD paralogs have different poly-ubiquitination levels (Fig. 1A), and are differentially labile in cells (Fig. S2G-I). These observations raised the question of whether the activity of the ubiquitination-proteasome system could be further enhanced to allow more complete removal of TEADs. To this end, we demonstrated that E3 ligases that do not regulate TEAD under endogenous conditions can be leveraged pharmacologically to achieve deep TEAD degradation, thus providing a proof of concept that TEADs can be targeted simultaneously using such approaches. Finally, in addition to establishing the basic biological concept linking RNF146 to TEAD degradation, the compounds we engineered will serve as valuable chemical tools for future studies of TEAD biology and the Hippo pathway in cancers and beyond.

      (7) The role of CIDE on YAP accessibility to Chromatin is superficially executed. Key controls are missing along with the connection with mechanisms and prior knowledge of TEAD, YAP, chromatin, and other TEAD inhibitors, just to mention a few.

      We used ATAC-seq to assess chromatin accessibility, comparing cells treated with DMSO and two different concentrations of compound D. We acknowledge there are small molecule inhibitors of TEADs that can modulate accessibility of YAP binding sites. Potential mechanistic differences between TEAD degraders and TEAD small molecule inhibitors will be a future area of investigation.

      (8) The physiological relevance and the mechanistic interpretation of what should be in the ATAC seq in ovcar cells is missing.

      We showed in Fig. 7A-D the dose response of OVCAR cells to the TEAD degraders. As evident from those experiments, TEAD degraders inhibit the proliferation of OVCAR cells as expected from their dependencies on the TEAD/YAP/TAZ transcription complex. In the ATAC-seq experiment, we showed that the canonical TEAD/YAP/TAZ target genes ANKRD1 and CCN1 have reduced chromatin accessibility at their promoter/enhancer regions (Fig. 8C). By unbiased motif and pathway analyses, we show that TEAD binding sites and YAP signatures are most significantly downregulated in OVCAR-8 cells (Fig. 8D-E). These results are incorporated into the results section of the manuscript.

      Reviewer #3 (Public Review):

      Summary

      Pham, Pahuja, Hagenbeek, et al. have conducted a comprehensive range of assays to biochemically and genetically determine TEAD degradation through RNF146 ubiquitination. Additionally, they designed a PROTAC protein degrader system to regulate the Hippo pathway through TEAD degradation. Overall, the data appears robust. However, the manuscript lacks detailed methodological descriptions, which should be addressed and improved before publication. For instance, the methods used to analyze the K48 ubiquitination site on TEAD and the gene expression analysis of Hippo Signaling are unclear. Furthermore, the multiple proteomics, RNA-seq, and ATAC-seq data must be made publicly available upon publication to ensure reproducibility. Most of the main figures are of low resolution, which needs addressing.

      We thank the reviewer for evaluating our manuscript. All of the data will be uploaded to public databases. We apologize for the low figure resolution and have updated the figures in the revised manuscript. We also expanded the methods section with more details.

      Strengths:

      - A broad range of assays was used to robustly determine the role of RNF146 in TEAD degradation.

      - Development of novel PROTAC for degrading TEAD.

      Weaknesses:

      - An orthogonal approach is needed (e.g., PARP1 inhibitor) to demonstrate PARP1's dependency in TEAD ubiquitination.

      We thank the reviewer for the suggestion. We had attempted to assess the effect of PARP inhibitors (including veliparib and olaparib) on TEAD ubiquitination, but the data is relatively complex to interpret. Besides inhibiting PARP1/2 catalytic activities, these PARP inhibitors also trap PARP on chromatin. Hence, these inhibitors could induce other cellular changes in addition to inhibiting the catalytic activities of PARP1/2. Given these potential pitfalls, we decided not to include these inconclusive data. Even though the experiments with PARP inhibitors were inconclusive, our study supports that TEAD2 and TEAD4 are PARylated in cells using an anti-PAR antibody (Fig. 3B). Furthermore, we show that mutation of the D70 PARylation site to alanine largely abolished TEAD4 ubiquitination in cells, suggesting PARylation is important for TEAD4 ubiquitination. In addition, PARP1 depletion by siRNA and CRISPR guide RNA reduced TEAD2 and TEAD4 ubiquitination levels, indicating PARP1 is one of the PARPs responsible for TEAD PARylation in cells.

      - The data from Table 2 is unclear in illustrating the association of identified K48 ubiquitination with TEAD4, especially since the experiments were presumably to be conducted on whole cell lysates with KGG enrichment. This raises the possibility that the K48 ubiquitination could originate from other proteins. Alternatively, if the authors performed immunoprecipitation on TEAD followed by mass spectrometry, this should be explicitly described in the text and materials and methods section.

      We thank the reviewer for this question. The experiment was an IP-mass spectrometry study in a TEAD4 amplified cell line model (PATU-8902) after IP with a pan-TEAD antibody. Here, we observed K48 ubiquitin and other ubiquitin linkages as shown in the Supplementary Table S2 of the original submission. Although it is possible that the IP wash steps could be more stringent, we did enrich for TEAD protein prior to mass spectrometry. While the ubiquitin linkage signals may come mainly from TEAD protein (mainly TEAD4), we recognized that some signals may come from other proteins. Given the caveat, we have now removed the table from our paper and updated the text accordingly.

      - Figure 2D: The methodology for measuring the Hippo signature is unclear, as is the case for Figures 7E and F regarding the analysis of Hippo target genes.

      We apologize for the lack of clarity. In short, we developed the Hippo signature using machine learning and chemogenomics, as described previously (Pham et al. Cancer Discovery 2021). In the revised version of the manuscript, we added the methodology for measuring the Hippo signature and cited our previous publication where we developed the Hippo signature.

      - Figure S3F requires quantification with additional replicates for validation.

      We thank the reviewer for the suggestion. We added the quantification for the blot and indicated the replication in the figure legend. Note that Figure S3F is now S3G.

      - There is a misleading claim in the discussion stating "TEAD PARylation by PAR-family members (Figure 3)"; however, the demonstration is only for PARP1, which should be corrected.

      We apologize for the statement. We observed both PARP1 and PARP9 in our TEAD IP-mass spec (now Figure S3E), which suggests both PARP-family members could be involved. Nonetheless, we primarily focus on PARP1, which is widely expressed across cell line models and present in higher abundance. Thus, our study only experimentally validated PARP1's role in regulating TEAD.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      General comments:

      (1) Please provide a smoother transition and well-defined connection between the first and second parts of the manuscript. The manuscript reads as two papers that were combined into one, without much attempt to disguise the fact.

      We thank the reviewer for the suggestion. We have added a transition paragraph to smooth the transition. We acknowledge that our paper broadly covers two aspects. However, they both touch upon TEAD ubiquitination and degradation. In the first part of the manuscript, we described TEAD biology and showed that TEADs are post-translationally modified and subsequently regulated through PARylation-dependent RNF146-mediated ubiquitination. In the second part, we highlighted our ability to leverage the PROTAC system for degrading labile oncogenic proteins such as TEADs. In addition to the biological concept, the compounds we engineered will serve as valuable chemical tools for future studies of TEAD biology and the Hippo pathway in cancers and beyond.

      (2) To confirm the proteasome mechanism of action, viability assays should be conducted with a CRBN KO.

      We thank the reviewer for the comment. In Figure 6E, we measured TEAD protein levels under CRBN knockdown and observed an expected change in TEAD stability. This observation and the other data presented in Figure 6 suggest that TEAD proteins are targeted for proteasomal degradation under compound D treatment.

      (3) As a control, sgPARP1 or PARP1 inhibitors should be used to confirm TEAD PARylation reduction.

      We thank the reviewer for the suggestion. We had attempted to assess the effect of PARP inhibitors (including veliparib and olaparib) on TEAD ubiquitination, but the data is relatively complex to interpret. Besides inhibiting PARP1/2 catalytic activities, PARP inhibitors also trap PARP on chromatin. Hence, these inhibitors could induce other cellular changes in addition to inhibiting the catalytic activities of PARP1/2. Given these pitfalls, we decided not to include these inconclusive data. Even though the experiments with PARP inhibitors were inconclusive, our study supports that TEAD2 and TEAD4 are PARylated in cells using an anti-PAR antibody (Fig. 3B). Furthermore, we show that mutation of the D70 PARylation site to alanine largely abolished TEAD4 ubiquitination in cells, suggesting PARylation is important for TEAD4 ubiquitination. In addition, PARP1 depletion by siRNA and CRISPR guide RNA reduced TEAD2 and TEAD4 ubiquitination levels, indicating PARP1 is one of the PARPs responsible for TEAD PARylation in cells.

      (4) MS data looks convincing but an FDR of 1% should be applied - this is the accepted standard in the proteomics field. Please re-search the data with the more stringent filter.

      We thank the reviewer for the suggestion. Our IP-MS experiment comparing siNTC versus siYAP1/WWTR1 in Patu-8902 cells did not have replicates and FDR could not be derived. Therefore, we listed the raw data in Supplemental Table 3 without showing statistics. To validate the putative interactions identified by IP-MS, we performed IP-Western experiments to confirm that TEAD4 interacts with PARP1 (Figure 3A). It is important to note that in addition to our report, the interaction between PARP1 and TEADs has been observed in other publications (Calses et al., 2023; Yang et al., 2017). We have included more details of the IP-MS experiment reported in Supplemental Table 3 in the revised manuscript and cited previous work reporting TEAD-PARP1 interaction.

      (5) Proofread the manuscript more thoroughly for typos and grammatical errors.

      We thank the reviewer for raising this issue and have addressed it in the revision.

      (6) Improve figure clarity (e.g., clearly labeling graph axes).

      We apologize for the oversight. The revised manuscript contains high resolution figures.

      Specific points:

      Generally, the manuscript could use additional proofreading for grammar and clarity. It would not be practical to list all, but some representative examples are listed below:

      Run-on: "They act through an event-driven mechanism instead of conventional occupancy-driven pharmacology, in addition, target protein degradation removes all functions of the target protein and may also lead to destabilization of entire multidomain protein complexes."

      Typo: "Compound D exhibits significant inhibition of cell proliferation and downstream signaling compared to compound A, a reversible TEAD lipid pocket binder that lack the ubiquitin ligase binding moiety."

      Typo: "Thus, we sought to deplete TEAD proteins by directly target them for ubiquitination and proteasomal degradation via pharmacologically inducing interactions between TEAD and other abundantly expressed and PARylation-independent E3 ligases."

      Typo: "Compound A is a close in analog of Compound B as described previously (Holden et al., 2020)."

      We have revised the manuscript and corrected the typos and grammatical errors listed above, as well as others.

      Specific comments on the figures are listed below:

      Figure 2:

      • Figures 2B and 2C should be separated into separate panels for clarity.

      We have updated Figures 2B and 2C as suggested.

      • Figure 2C - "To further assess the function of RNF146, we depleted RNF146 by either sgRNA or siRNA." This should say either CRISPR-Cas9 KO or siRNA-mediated knockdown.

      We thank the reviewer for the suggestion. We revised the text to address this issue.

      • Figure 2D - y-axis is not labeled well/clearly. Additionally, there are different resolutions for the p-values on the graph (the top p-value is slightly clearer than the other two, suggesting either a different font was used or the value was pasted on top of a picture of the graph at a different resolution).

      We updated the figures according to the suggestions.

      • Figure S2A - "We identified three ubiquitin ligases - RNF146, TRAF3, and PH5A - as potential negative regulators for the Hippos pathway from the primary screen using the luciferase reporter." However, the siPHF5A data appears to decrease luciferase levels whereas siRNF146 and siTRAF3 increase it.

      We thank the reviewer for catching this error. We removed PH5A from this list.

      Figure 3:

      • Figure 3A - label more clearly. Is this an endogenous TEAD4 co-IP?

      We thank the reviewer for the suggestion. The experiment was an IP-mass spectrometry study in a TEAD4-amplified cell line model (PATU-8902) with a pan-TEAD antibody. We have included these details in the figure legends. Figure 3A is now Figure S3E in the revised manuscript.

      • Figure 3C - why are the dark and light exposures not matching/corresponding? In the dark exposure, there are two particularly dark bands, the darkest of which is at the top of the gel. However, this darkest band disappears in the light exposure gel. Additionally, the last lane is marked as +TEAD2 and +TEAD4. Not sure if this is a typo, and meant to be only +TEAD4? Seems a bit strange to have a double TEAD lane.

      We thank the reviewer for this comment and apologize for the oversight. There was a typo in the label. The light exposure image was from a replicate run instead of the same run, therefore the lanes didn’t all match up. We have removed the light exposure panel to resolve the confusion. (Figure 3B).

      Figure 5:

      • Figure 5B - why is shTEAD1-4/Sucrose a much higher tumor volume than shNTC/Sucrose negative control? Additionally, should the legend say "sNTC/Sucrose" as it does or "shNTC/Sucrose"?

      The labels for shTEAD1-4/Sucrose and shNTC/Sucrose are correct. We do not understand why there is a slight increase in tumor volume for shTEAD1-4/Sucrose and suspect that it is due to the considerable variation in the experiment. This slight change, however, does not influence our observation of tumor regression in shTEAD1-4 under the Doxycycline treatment.

      "sNTC/Sucrose" is a typo. We apologize for the oversight and have revised the figure.

      • Figure 5E - cited in text after Figures 6 and 7.

      We have updated the text accordingly.

      Figure 6:

      • Figure 6B - it is very interesting how this clearly shows the Hook effect for Compound D, but it's a bit harder to see for compound E that the compound degrades pan-TEAD. Would it be possible to quantify the blots to reinforce claims about protein degradation here?

      We thank the reviewer for the question. There may seem to be some hook effect across the three concentrations of compound D treatment in Fig. 6B. However, in Fig. 6C-E, we observed fairly consistent TEAD degradation levels across a variety of concentrations. In addition, these experiments have been repeated in multiple cell lines with consistent results. We respectfully argue that a more detailed investigation of the hook effect is beyond the scope of our study.

      Figure 7:

      • Figure 7F - this heat map is extremely difficult to interpret. Are there any interesting clusters? What are the darker/lighter bands for Compound D compared to DMSO control?

      We thank the reviewer for the comment and apologize for the lack of information on the figure. These are genes from a Hippo signature derived from our earlier work (Pham et al. Cancer Discovery). As a result of degrading TEAD when treating the cells with Compound D, we observed an expected downregulation of most of these genes compared to compound A.

      Figure 8:

      • Figure 8B - these two pie charts are also difficult to interpret. Perhaps try to present the data in a form other than encircling pie charts?

      We thank the reviewer for the suggestion. However, the pie chart is descriptive, and we used this format to save space.

      • Figure 8C - what is GNE-6915? Is this Compound D?

      Yes, this is compound D. The text is updated accordingly.

      Reviewer #3 (Recommendations For The Authors):

      Figure 3A would benefit from explicitly stating the conditions within the figure, rather than referring to the legend. This clarity is also needed for Figure 8C, indicating whether the treatment was with compound D or GNE-6915.

      We thank the reviewer for the suggestion. We have added the details to the figures and made the suggested edits.

      Standardize the terms "ubiquitination" and "ubiquitylation" throughout the paper for consistency.

      We now use the term “ubiquitination” throughout the manuscript.

      The statement "In this study, we show that the activity of TEAD transcription factors can be post-transcriptionally regulated via the ubiquitin/proteasome system" should be corrected to "post-translationally regulated."

      We have updated the manuscript accordingly.

      There is an additional exclamation mark above Figure 5E that should be removed.

      We have revised Figure 5E.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents a useful modification of a standard model of genetic drift by incorporating variance in offspring numbers, claiming to address several paradoxes in molecular evolution. It is unfortunate that the study fails to engage prior literature that has extensively examined the impact of variance in offspring number, implying that some of the paradoxes presented might be resolved within existing frameworks.

      The prior literature the reviewers referred to all concerns "modified WF models". In the original submission, we lumped the standard and modified WF models together as the "generalized WF models". As the lumping caused confusion, their distinctions are now made clear. That said, the Haldane model in our proposal is not a modification of the standard WF model because, conceptually, the two models are very different. WF is based on sampling whereas the Haldane model is based on gene transmission.

      While the "modified WF models" often incorporate V(K) [variance in progeny number], the modification is still based on the WF model of population sampling. The modification is mathematically feasible but biologically untenable, as explained explicitly in the revised text. Most important, all four paradoxes are as incompatible with the modified WF models as with the standard model. Note that the Haldane model does not have the sampling step, which is absorbed into the V(K) term. In the integrated WF-Haldane model, these paradoxes are resolved (see the new sections of Discussion, quoted below).

      If readers do not have time to ponder all four paradoxes, they may simply read the first one, as follows. When the population size (N) is growing exponentially, such as in a bacterial culture, drift is nearly absent when N is small and becomes stronger as N increases, especially when approaching the carrying capacity. Such common observations are exactly the opposite of the WF model's central prediction. Any model based on sampling cannot escape the constraint of "greater drift, smaller N".

      Revision - The following text is a reproduction of the last 7 paragraphs of Discussion.

      “The standard WF model has been extended in several directions (overlapping generations, multiple alleles, ploidy, etc.). The modification most relevant to our studies here is the introduction of V(K) into the model, thus permitting V(K) ≠ E(K). While the modifications are mathematically valid, they are often biologically untenable. Kimura and Crow (1963) may be the first to offer a biological mechanism for V(K) ≠ E(K), effectively imposing the Haldane model on the WF model. Other models (Kimura and Crow 1963; Lynch, et al. 1995; Sjodin, et al. 2005; Der, et al. 2011; Cannings 2016) indeed model mathematically the imposition of the branching process on the population, followed by the WF sampling. The constructions of such models are biologically dubious but, more importantly, still unable to resolve the paradoxes. It would seem more logical to use the Haldane model in the first place by having two parameters, E(K) and V(K). 

      Even if we permit V(K) ≠ E(K) under the WF sampling, the models would face other difficulties. For example, a field biologist needs to delineate a Mendelian population and determine its size, N or Ne. In all WF models, one cannot know what the actual population being studied is. Is it the fly population in an orchard being sampled, in the geographical region, or in the entire species range? It is unsatisfactory when a population biologist cannot identify the population being studied. The Haldane model is an individual-output model (Chen, et al. 2017), which does not require the delineation of a Mendelian population.

      We shall now review the paradoxes specifically in relation to the modified WF models, starting with the multi-copy gene systems such as viruses and rRNA genes covered in the companion study (Wang, et al. 2024). These systems evolve both within and between hosts. Given the small number of virions transmitted between hosts, drift is strong in both stages as shown by the Haldane model (Ruan, Luo, et al. 2021; Ruan, Wen, et al. 2021; Hou, et al. 2023). Therefore, it does not seem possible to have a single effective population size in the WF models to account for the genetic drift in two stages. The inability to deal with multi-copy gene systems may explain the difficulties in accounting for the SARS-CoV-2 evolution (Deng, et al. 2022; Pan, Liu, et al. 2022; Ruan, Wen, et al. 2022; Hou, et al. 2023; Ruan, et al. 2023).

      We now discuss the first paradox of this study, which is about the regulation of N. In the general WF models, N is imposed from outside of the model, rather than self-generating within the model. When N is increasing exponentially as in bacterial or yeast cultures, there is almost no drift when N is very low and drift becomes intense as N grows to near the carrying capacity. As far as we know, no modifications of the WF model can account for this phenomenon that is opposite of its central tenet. In the general WF models, N is really the carrying capacity, not population size. 

      The second paradox of sex chromosomes is rooted in V(K) ≠ E(K). As E(K) is the same between sexes but V(K) is different, clearly V(K) = E(K) would not be feasible. The mathematical solution of defining separate Ne's for males and females (Kimura and Crow 1963; Lynch, et al. 1995; Sjodin, et al. 2005; Der, et al. 2011; Cannings 2016) unfortunately obscures the interesting biology. As shown in Wang et al. (2024; MBE), the kurtosis of the distribution of K indicates the presence of super-breeder males. While the Haldane model can incorporate the kurtosis, the modified WF models are able to absorb only up to the variance term, i.e., the second moment of the distribution. The third paradox of genetic drift is manifested in the fixation probability of an advantageous mutation, 2s/V(K). As explained above, the fixation probability is determined by the probability of reaching a low threshold that is independent of N itself. Hence, the key parameter of drift in the WF model, N (or Ne), is missing. This paradox supports the assertion that genetic drift is fundamentally about V(K) with N being a scaling factor.

      As the domain of evolutionary biology expands, many new systems do not fit into the WF models, resulting in the lack of a genetic drift component in their evolutionary trajectories. Multi-copy gene systems are obvious examples. Others include domestications of animals and plants that are processes of rapid evolution  (Diamond 2002; Larson and Fuller 2014; Purugganan 2019; Chen, Yang, et al. 2022; Pan, Zhang, et al. 2022; Wang, et al. 2022). Due to the very large V(K) in domestication, drift must have played a large role. Somatic cell evolution is another example with “undefinable” genetic drift (Wu, et al. 2016; Chen, et al. 2017; Chen, et al. 2019; Ruan, et al. 2020; Chen, Wu, et al. 2022). The Haldane (or WFH) model, as an "individual output" model, can handle these general cases of genetic drift.

      The Haldane model and the WF model are fundamentally different approaches to random forces of evolution. While the WF models encounter many biological contradictions, they have provided approximate mathematical solutions to more realistic scenarios. In systems such as in viral evolution (Ruan, Hou, et al. 2022; Hou, et al. 2023) or somatic cell evolution (Chen, Wu, et al. 2022; Zhai, et al. 2022) whereby the WF solution is absent, further development of the WFH model will be necessary.”

      In addition, while the modified model yields intriguing theoretical predictions, the simulations and empirical analyses are incomplete to support the authors' claims.

      This point is addressed in the responses to reviewers' comments. Since they are quite technical, they do not fit in the overview here.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present a theoretical treatment of what they term the "Wright-Fisher-Haldane" model, a claimed modification of the standard model of genetic drift that accounts for variability in offspring number, and argue that it resolves a number of paradoxes in molecular evolution. Ultimately, I found this manuscript quite strange.

      The notion of effective population size as inversely related to the variance in offspring number is well known in the literature, and not exclusive to Haldane's branching process treatment. However, I found the authors' point about variance in offspring changing over the course of, e.g. exponential growth fairly interesting, and I'm not sure I'd seen that pointed out before.

      Weaknesses:

      I have several outstanding issues. First of all, the authors really do not engage with the literature regarding different notions of an effective population. Most strikingly, the authors don't talk about Cannings models at all, which are a broad class of models with non-Poisson offspring distributions that nonetheless converge to the standard Wright-Fisher diffusion under many circumstances, and to "jumpy" diffusions/coalescents otherwise (see e.g. Mohle 1998, Sagitov (2003), Der et al (2011), etc.). Moreover, there is extensive literature on effective population sizes in populations whose sizes vary with time, such as Sano et al (2004) and Sjodin et al (2005).

      Of course in many cases here the discussion is under neutrality, but it seems like the authors really need to engage with this literature more.

      The reviewer's summary and weakness statements reflect the general criticism summarized by the editors. The reply and revisions addressing these criticisms have been presented in the long reply to the eLife Assessment above.

      We hence re-emphasize only the key points here.

      (1) The literature that the reviewers fault us for not citing is about the modifications of the standard WF model. We now cite them as well as a few others in that vein. However, the WF-Haldane model we propose is conceptually very different from the modified WF models. This WFH model is in essence the Haldane model which may use the results of the WF models as the starting point to find the exact solutions.

      (2) The check of the power of the modified WF models is whether they can resolve the paradoxes. None of them can. The arguments apply to neutral cases as well as selection effects. Hence, our central point is that the modifications of the standard WF model [e.g., by incorporating V(K)] do not help the WF model in resolving the paradoxes.  Besides, the incorporation of V(K) is mathematically feasible but biologically untenable as presented in the new sections of Discussion.

      Nonetheless, I don't think the authors' modeling, simulations, or empirical data analysis are sufficient to justify their claims.

      The most interesting part of the manuscript, I think, is the discussion of the Density Dependent Haldane model (DDH). However, I feel like I did not fully understand some of the derivation presented in this section, …… - this is the whole notion of exchangeability, also neglected in this manuscript). As such, I don't believe that their analysis of the empirical data supports their claim. [Since the comments above are highly technical and fairly long, they are not copied verbatim.]

      We thank this reviewer for the detailed comments with respect to the potential confusion in the discussion of the Density Dependent Haldane (DDH) model.

      First, the reviewer appears to ask how Eqs. (5-6) are derived. We should clarify that both Eq. (5) and Eq. (6) are assumptions based on population ecology rather than derived results. Eq. (7) is then derived by substituting the assumptions in Eq. (5) and Eq. (6) into Eq. (3).

      The definition in Equation (5) allows the growth rate of the population size to be dependent on N itself, such that growth rate E(K) (average offspring number per generation) is greater than 1 when N < Ck and less than 1 when N > Ck. The parameter z is introduced to adjust the sensitivity of E(K) to changes in population size (as shown in Fig. 3a).

      Second, we appreciate the comments regarding the use of individual-based simulations and the apparent lack of interaction between individuals. In our simulations, there is indeed an interaction among individuals, which is represented by Eq. (5). This equation reflects how the competition between two alleles affects the expected growth rate E(K), which decreases as the population size increases. Furthermore, once E(K) for the entire population is determined, the offspring numbers of the alleles are independent.

      We believe that the primary purpose of our simulations was not clearly stated. This lack of clarity may be the root of the criticisms. We now note that the simulations are aimed at testing the accuracy of Equation (10).

      Note that Eq. (10) is a textbook result and quite important in our study. This equation shows that the strength of genetic drift, as given by Pf (the fixation probability of an advantageous mutation), is not a function of N at all. This approximate solution has been obtained using the WF model by Kimura.  The Haldane model solution that can explain Paradox 1 is based on Equation (7) as shown below

      Since the fixation probability of Equation (10) cannot be easily obtained using Eq. (7), we conducted simulations to confirm the accuracy of Eq. (10) when applied to the Haldane model.

      We have revised the relevant sections of the manuscript to clarify these points and to better distinguish between assumptions and results. 

      Revision - Details of the DDH model are given in the Supplementary Information. A synopsis is given here: We consider a non-overlapping haploid population with two neutral alleles. The population size at time t is Nt. We assume that expected growth rate E(K) is greater than 1 when N < Ck and less than 1 when N > Ck, as defined by Eq. (5) below:

      The slope of E(K) vs. N (i.e., the sensitivity of the growth rate to changes in population size), as shown in Fig. 3a, depends on z. To determine the variance V(K), we assume that K follows the negative binomial distribution whereby parents would suffer reproduction-arresting injury with a probability of pt at each birthing (Supplementary Information). Accordingly, V(K) can then be expressed as

      By Eq. (6), the ratio of V(K)/E(K) could be constant, decrease or increase with the increase of population size. With E(K) and V(K) defined, we could obtain the effective population size by substituting Eq. (5) and Eq. (6) into Eq. (3).

      Eq. (7) presents the relationship between effective population size (Ne) and the population size (N) as shown in Fig. 3. The density-dependent E(K) could regulate N with different strength (Fig. 3a). The steeper the slope in Fig. 3a, the stronger the regulation.

      Simulation of genetic drift in the Haldane model and the Wright-Fisher (WF) model. In both models, interactions between individuals are implicitly included through the dependency of the average number of offspring on population size, as defined by Eq. (5). This dependency leads to the logistic population growth, reflecting the density-dependent interactions.
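
      The regulation of N by a density-dependent E(K), described in the synopsis above, can be illustrated with a minimal sketch. The functional form used here, E(K) = (Ck/N)^z, is a hypothetical stand-in chosen only because it satisfies the stated properties (E(K) > 1 when N < Ck, E(K) < 1 when N > Ck, with z setting the sensitivity); it is not claimed to be the manuscript's Eq. (5). The sketch tracks only the expected population size and ignores drift; the names `expected_offspring` and `mean_trajectory` are illustrative.

```python
def expected_offspring(n, carrying_capacity, z):
    """Hypothetical density-dependent mean offspring number E(K).

    Assumed form (NOT the manuscript's Eq. 5): E(K) = (Ck / N) ** z.
    It exceeds 1 when N < Ck, falls below 1 when N > Ck, and z sets
    how sharply E(K) responds to changes in N.
    """
    return (carrying_capacity / n) ** z


def mean_trajectory(n0, carrying_capacity, z, generations):
    """Iterate the expected size N_{t+1} = N_t * E(K); drift is ignored."""
    sizes = [float(n0)]
    for _ in range(generations):
        n = sizes[-1]
        sizes.append(n * expected_offspring(n, carrying_capacity, z))
    return sizes


if __name__ == "__main__":
    # Compare weak (z = 0.1) and stronger (z = 0.5) regulation of N.
    for z in (0.1, 0.5):
        traj = mean_trajectory(n0=10, carrying_capacity=10_000, z=z, generations=30)
        print(f"z = {z}:", [round(x) for x in traj[::10]])
```

      Under this assumed form, a larger z pulls N toward Ck in fewer generations, which is one way to read the statement that a steeper slope gives stronger regulation.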

      Thus, while I think there are some interesting ideas in this manuscript, I believe it has some fundamental issues:

      first, it fails to engage thoroughly with the literature on a very important topic that has been studied extensively. Second, I do not believe their simulations are appropriate to show what they want to show. And finally, I don't think their empirical analysis shows what they want to show.

      References omitted

      The comments are the summary of previous ones, which have been addressed in detail in the preceding sections.

      Reviewer #2 (Public Review):

      Summary:

      This theoretical paper examines genetic drift in scenarios deviating from the standard Wright-Fisher model. The authors discuss Haldane's branching process model, highlighting that the variance in reproductive success equates to genetic drift. By integrating the Wright-Fisher model with the Haldane model, the authors derive theoretical results that resolve paradoxes related to effective population size [Ne]

      Thanks.  The issue of Ne will be addressed below where the reviewer returns to this issue. The strength of the integrated WFH model is that N (or Ne) is generated by the model itself, rather than externally imposed as in WF models.

      Strengths:

      The most significant and compelling result from this paper is perhaps that the probability of fixing a new beneficial mutation is 2s/V(K). This is an intriguing and potentially generalizable discovery that could be applied to many different study systems.

      The authors also made a lot of effort to connect theory with various real-world examples, such as genetic diversity in sex chromosomes and reproductive variance across different species.

      Thanks. 

      Weaknesses:

      One way to define effective population size is by the inverse of the coalescent rate. This is where the geometric mean of Ne comes from. If Ne is defined this way, many of the paradoxes mentioned seem to resolve naturally. If we take this approach, one could easily show that a large N population can still have a low coalescent rate depending on the reproduction model. However, the authors did not discuss Ne in light of the coalescent theory. This is surprising given that Eldon and Wakeley's 2006 paper is cited in the introduction, and the multiple mergers coalescent was introduced to explain the discrepancy between census size and effective population size, superspreaders, and reproduction variance - that said, there is no explicit discussion or introduction of the multiple mergers coalescent.

      The Haldane model treats N’s very differently from the WF models. In the WF models, N’s are imposed externally (say, constant N, exponentially growing N, temporally fluctuating N’s and so on; all provided from outside of the model). Ne and coalescence are all derived from these given N’s. In order to account for the first paradox (see the next paragraph), N needs to be regulated, but the WF models cannot regulate N’s. The density-dependent Haldane model that Reviewer 1 inquired about above is a model that regulates N internally. It can thus account for the paradox.

      Paradox 1 - When the population size (N) is growing exponentially, such as in a bacterial culture, drift is nearly absent when N is small and is much stronger as N increases, especially when approaching the carrying capacity. Such a pattern is a common observation and is exactly the opposite of the WF model's central prediction. In short, a model that does not regulate N cannot explain the paradox.

      Ne is a fix of the WF model in order to account for the missing components of genetic drift. The paradoxes presented in this one and the companion study show that the fix is rather inadequate.  In contrast, by the WFH model, N is regulated within the model itself as E(K) and V(K) are both functions of N.

      The Wright-Fisher model is often treated as a special case of the Cannings 1974 model, which incorporates the variance in reproductive success. This model should be discussed. It is unclear to me whether the results here have to be explained by the newly introduced WFH model, or could have been explained by the existing Cannings model. The abstract makes it difficult to discern the main focus of the paper. It spends most of the space introducing "paradoxes".

      We greatly appreciate the illuminating advice. Nevertheless, we should explain, or should have explained, more clearly that the four paradoxes presented are central to this pair of eLife papers. The WF and Haldane models are conceptually very different. The choice should not be based on mathematical grounds but on how they help us understand biological evolution. We are using the four paradoxes to highlight the differences. We have said in the papers that the origin and evolution of COVID-19 caused much confusion, partly because the WF models cannot handle multi-copy gene systems, including viruses that evolve both within and between hosts.

      The standard Wright-Fisher model makes several assumptions, including hermaphroditism, non-overlapping generations, random mating, and no selection. It will be more helpful to clarify which assumptions are being violated in each tested scenario, as V(K) is often not the only assumption being violated. For example, the logistic growth model assumes no cell death at the exponential growth phase, so it also violates the assumption about non-overlapping generations.

      We appreciate the question, which has two aspects. First, why do we think the WF models are insufficient? After all, for each assumption of the WF model (as given in the reviewer’s examples), there is often a solution by modifying Ne which relaxes the assumption. In this sense, there is only one grand assumption made by the WF models. That is, however complex the biology is, it is possible to find an Ne that can make the WF model work. Our argument is that Ne is a cumbersome fix of the WF model and it does not work in many situations. That is how we replied about the importance of the paradoxes above. We shall again use the first paradox as an example: because drift is stronger as N becomes larger, the fix has to make Ne negatively correlated with N. In reality, it does not appear possible to resolve this paradox in this way. Another paradox is the evolution of multi-copy gene systems. In short, it seems clear that Ne is not a useful or usable fix.

      The second aspect is “why, among the many modifications the WF models make, do we only emphasize the inclusion of V(K)?” This is the essence of our two papers. Although V(K) is a modification of the WF models, it does not enable the WF models to resolve the paradoxes. In contrast, the Haldane model incorporates E(K) and V(K) from the start. In presenting paradox 3, it was stated that the fixation probability of an advantageous mutation is Pf ≈ 2s/V(K) [Eq. (10)].

      This equation shows that the strength of genetic drift, as given by Pf (the fixation probability of an advantageous mutation), is not a function of N at all. It supports the view that the essence of genetic drift is V(K) with N as a scaling factor. Note that, if V(K) = 0, there is no genetic drift regardless of N. As V(K) is not an add-on to the Haldane model (unlike in WF models), the Haldane model can resolve the paradoxes.
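
      The N-independence of Pf can be checked numerically with a minimal sketch of a branching process. It assumes that the fixation probability is well approximated by the probability that a lineage founded by a single copy never goes extinct, that E(K) = 1 + s for the advantageous allele, and that K is negative binomial with a chosen V(K) (or Poisson when V(K) = E(K)). The function names are illustrative and are not taken from the manuscript.

```python
import math


def nb_pgf(t, mean, var):
    """PGF of a negative binomial offspring number K (requires var > mean)."""
    p = mean / var                   # success probability
    r = mean * mean / (var - mean)   # real-valued size parameter
    return (p / (1.0 - (1.0 - p) * t)) ** r


def poisson_pgf(t, mean):
    """PGF of a Poisson offspring number, i.e. the case V(K) = E(K)."""
    return math.exp(mean * (t - 1.0))


def survival_probability(pgf, max_iter=200_000, tol=1e-15):
    """Probability that a lineage started by one copy never goes extinct.

    The extinction probability q is the smallest root of q = G(q);
    iterating q <- G(q) from q = 0 converges to it monotonically.
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(q)
        if abs(q_next - q) < tol:
            break
        q = q_next
    return 1.0 - q


if __name__ == "__main__":
    s = 0.01
    for var in (2.0, 5.0, 10.0):
        pi = survival_probability(lambda t, v=var: nb_pgf(t, 1.0 + s, v))
        print(f"V(K) = {var}: survival = {pi:.5f}, 2s/V(K) = {2 * s / var:.5f}")
    pi = survival_probability(lambda t: poisson_pgf(t, 1.0 + s))
    print(f"Poisson: survival = {pi:.5f}, 2s = {2 * s:.5f}")
```

      No population size enters the calculation, and for small s the survival probability stays close to 2s/V(K) as V(K) is varied, which is the sense in which N acts only as a scaling factor.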

      The theory and data regarding sex chromosomes do not align. The fact that \hat{alpha'} can be negative does not make sense. The authors claim that a negative \hat{alpha'} is equivalent to infinity, but why is that? It is also unclear how theta is defined. It seems to me that one should take the first principle approach e.g., define theta as pairwise genetic diversity, and start with deriving the expected pair-wise coalescence time under the MMC model, rather than starting with assuming theta = 4Neu. Overall, the theory in this section is not well supported by the data, and the explanation is insufficient.

      α' can be negative for the same reason that α (the male/female ratio in mutation rate) can be negative (Miyata, et al. 1987; Li, et al. 2002; Makova and Li 2002). Clearly, this has not been a problem in the large literature on α becoming negative. In fact, in many reports, α is negative, which is read as α approaching infinity. Imagine that our equation is α'^2 = 0.25; then α' can be 0.5 or -0.5, although the latter solution is not biologically meaningful.

      As for theta, the reviewer asked why we do not use the pairwise genetic diversity, theta(pi), as the first-principle approach to estimating theta. While theta(pi) was the first estimator of theta used, the general principle is that every bin of the frequency spectrum can be used for estimating theta, since the expected value of bin i is theta/i, where i is the number of occurrences of the mutation in the sample. (If the sample size is 100, then i is between 1 and 99.) Hence, the issue is which part of the spectrum has the best statistical properties for the question at hand. While theta(pi), the pairwise measure the reviewer recommends, and theta(w) are most commonly used, there are in fact numerous ways to estimate theta (Fu (2022) presents an excellent review). For our purpose, we need a theta estimate least affected by selection, and we choose the lowest-frequency bin of the spectrum, which is theta(1) based on the singletons. Theta(1), least affected by selection, is the basis of the Fu and Li test.
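
      The principle that each bin of the frequency spectrum yields its own estimate of theta can be sketched as follows. The sketch assumes an unfolded site frequency spectrum and the standard neutral expectation E[xi_i] = theta/i; the function name `theta_estimates` is illustrative, and the exact estimator used in the manuscript may differ in detail.

```python
def theta_estimates(sfs):
    """Return (theta_1, theta_w, theta_pi) from an unfolded frequency spectrum.

    sfs[i - 1] is the number of sites at which the derived allele occurs
    i times in a sample of n = len(sfs) + 1 sequences (i = 1 .. n - 1).
    """
    n = len(sfs) + 1
    # Singleton-based estimate: E[xi_1] = theta / 1.
    theta_1 = float(sfs[0])
    # Watterson's estimate: number of segregating sites over a_n.
    a_n = sum(1.0 / i for i in range(1, n))
    theta_w = sum(sfs) / a_n
    # Pairwise diversity: each site in bin i differs in i * (n - i) pairs.
    pairs = n * (n - 1) / 2.0
    theta_pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs
    return theta_1, theta_w, theta_pi


if __name__ == "__main__":
    # Under the neutral expectation, the three estimates coincide.
    theta, n = 12.0, 7
    expected_sfs = [theta / i for i in range(1, n)]
    print(theta_estimates(expected_sfs))
```

      When the spectrum follows the neutral expectation exactly, the three estimates agree; the estimators differ in how strongly they weight each frequency bin, which is why the choice of bin matters when selection distorts part of the spectrum.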

      Reviewer #3 (Public Review):

      Summary:

      Ruan and colleagues consider a branching process model (in their terminology the "Haldane model") and the most basic Wright-Fisher model. They convincingly show that offspring distributions are usually non-Poissonian (as opposed to what's assumed in the Wright-Fisher model), and can depend on short-term ecological dynamics (e.g., variance in offspring number may be smaller during exponential growth). The authors discuss branching processes and the Wright-Fisher model in the context of 3 "paradoxes": (1) how Ne depends on N might depend on population dynamics; (2) how Ne is different on the X chromosome, the Y chromosome, and the autosomes, and these differences do match the expectations based on simple counts of the number of chromosomes in the populations; (3) how genetic drift interacts with selection. The authors provide some theoretical explanations for the role of variance in the offspring distribution in each of these three paradoxes. They also perform some experiments to directly measure the variance in offspring number, as well as perform some analyses of published data.

      Strengths:

      (1) The theoretical results are well-described and easy to follow.

      (2) The analyses of different variances in offspring number (both experimentally and analyzing public data) are convincing that non-Poissonian offspring distributions are the norm.

      (3) The point that this variance can change as the population size (or population dynamics) change is also very interesting and important to keep in mind.

      (4) I enjoyed the Density-Dependent Haldane model. It was a nice example of the decoupling of census size and effective size.

      Thanks.

      Weaknesses:

      (1) I am not convinced that these types of effects cannot just be absorbed into some time-varying Ne and still be well-modeled by the Wright-Fisher process.

      Please allow us to refer to, again, two of the four paradoxes. We believe that no modification of the WF model can resolve the paradoxes.

      (1) When the population size (N) is growing exponentially, such as in a bacterial culture, drift is nearly absent when N is small and is much stronger as N increases, especially when approaching the carrying capacity. Such common observations are exactly opposite of the WF model's key prediction. It is not possible for a model that does not regulate N to explain the paradox.

      (2) There is no way the WF models can formulate Ne for, say viruses or ribosomal RNA genes that have two levels of populations – the within-host populations as well as the host population itself.

      The fact that there are numerous Ne's suggests that Ne is a collection of cumbersome fixes of the WF model. By the WF-Haldane model, all factors are absorbed into V(K) resulting in a simpler model in the end. V(K) is often a measurable quantity. Note that, even if V(K) is incorporated into the WF model, the paradoxes remain unresolvable.
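      The contrast in point (1) can be illustrated numerically. The sketch below assumes the usual one-generation approximation in which the variance of the allele frequency change is V(K) p(1-p)/N, with V(K) = 1 recovering the standard WF case. The function name drift_variance and every number used are illustrative assumptions, not measurements or values from the manuscript.

```python
def drift_variance(p, n, v_k=1.0):
    """Approximate one-generation variance of allele frequency: V(K) * p * (1 - p) / N."""
    return v_k * p * (1.0 - p) / n

p = 0.5
sizes = [100, 1_000, 10_000]

# Standard WF case (V(K) = 1): drift can only weaken as N grows.
wf_drift = [drift_variance(p, n) for n in sizes]
print(wf_drift)

# Hypothetical Haldane-type case: if V(K) rises faster than N near carrying
# capacity, drift is stronger in the larger population. The value 200.0 is
# made up purely to illustrate the direction of the effect.
small_pop = drift_variance(p, 100, v_k=1.0)
large_pop = drift_variance(p, 10_000, v_k=200.0)
print(small_pop, large_pop)
```

      Under these assumptions, drift strengthens with N only when V(K) grows faster than N does, which requires V(K) to respond to population size.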

      (2) Along these lines, there is well-established literature showing that a broad class of processes (a large subset of Cannings' Exchangeable Models) converge to the Wright-Fisher diffusion, even those with non-Poissonian offspring distributions (e.g., Mohle and Sagitov 2001). E.g., equation (4) in Mohle and Sagitov 2001 shows that in such cases the "coalescent Ne" should be (N-1) / Var(K), essentially matching equation (3) in the present paper.

      The criticism of a lack of engagement with the well-established literature has been responded to extensively above. Briefly, the literature is about modifications of the WF model, which share the same feature of population sampling. With that feature, the paradoxes are unresolvable. For example, however Ne is defined, the fixation probability of an advantageous mutation does not depend on N or Ne. This is the third paradox of the WF models.

      (3) Beyond this, I would imagine that branching processes with heavy-tailed offspring distributions could result in deviations that are not well captured by the authors' WFH model. In this case, the processes are known to converge (backward-in-time) to Lambda or Xi coalescents (e.g., Eldon and Wakely 2006 or again in Mohle and Sagitov 2001 and subsequent papers), which have well-defined forward-in-time processes.

      We admire the learned understanding of the literature expressed in the review, which raises two points. First, our model may not be able to handle the heavy-tailed progeny distribution (i.e., the kurtosis of the distribution of K). Second, the Xi coalescent models (cited above) can do that. Below are our clarifications.

      First, the WFH model is based on the general distribution of K, which includes flexible and realistic representations of offspring number distributions. In fact, we have used various forms of K distribution in our publications on the evolution of SARS-CoV-2 (see the Ruan et al. publications in the bibliography). Power-law distribution is particularly useful as the K-distribution in viral transmission is highly kurtotic. This is reflected in the super-spreader hypothesis. In short, the branching process on which the WFH model is based is mainly about the distribution of K. Nevertheless, the variance V(K) can often yield good approximations when the kurtosis is modest.

      Second, we would like to comment on the models of Eldon and Wakely 2006 or Mohle and Sagitov 2001 and subsequent papers. These papers are based on the Moran model by considering a highly skewed distribution of offspring numbers. Fundamentally, the Moran models generally behave like WF models (standard or modified) and hence have the same problems with the paradoxes that are central to our studies. In fact, the reservations about introducing V(K) into the WF models apply as well to the Moran models. The introduction of V(K) is mathematically valid but biologically untenable. Essentially, the WF models incorporate the Haldane model as a first step in the generation transition. The introduction of V(K) into the Moran model is even less biologically sensible. Furthermore, the model allows K to take only three discrete values: 0, 2, and Nψ (see Eq. (7) in Eldon and Wakely). Their model also assumes a constant population size, which contrasts with our model's flexibility in handling varying population sizes and more complex distributions for K.

      In short, the modifications of the WF (and Moran) models are unnecessarily complicated and biologically untenable, yet they still fail to account for the paradoxes. The WFH model can rectify these problems.

      (4) These results that Ne in the Wright-Fisher process might not be related to N in any straightforward (or even one-to-one) way are well-known (e.g., Neher and Hallatschek 2012; Spence, Kamm, and Song 2016; Matuszewski, Hildebrandt, Achaz, and Jensen 2018; Rice, Novembre, and Desai 2018; the work of Lounès Chikhi on how Ne can be affected by population structure; etc...)

      The reviewer is correct in pointing out the inexact correlation between N and Ne. Nevertheless, it should still be true that the WF models predict qualitatively weaker drift as N increases. The first paradox is as stated:

      When the population size (N) is growing exponentially, such as in a bacterial culture, drift is nearly absent when N is small and is much stronger as N increases, especially when approaching the carrying capacity. Such common observations are exactly opposite of the WF model's key prediction.

      (5) I was also missing some discussion of the relationship between the branching process and the Wright-Fisher model (or more generally Cannings' Exchangeable Models) when conditioning on the total population size. In particular, if the offspring distribution is Poisson, then conditioned on the total population size, the branching process is identical to the Wright-Fisher model.

      We thank the reviewer for this important comment. The main difference is that N is imposed from outside the WF models but can be generated from within the Haldane model (see the density-dependent Haldane model). In nature, N of the next generation is the sum of the K's among members of the population. This is how the Haldane model determines N(t+1) from N(t). In the WF models, N is imposed from outside the model and, hence, the given N determines the distribution of K. For this reason, N regulation is not possible in the WF models, thus resulting in the paradoxes.
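      The distinction drawn here, N(t+1) arising as the sum of the K's versus N being supplied from outside, can be sketched as two generation-transition functions. This is a minimal sketch under stated assumptions: the Poisson offspring distribution is only a placeholder (the manuscript argues that real distributions are usually non-Poissonian), the Beverton-Holt form in mean_offspring is a hypothetical stand-in for density dependence rather than the density-dependent Haldane model itself, and all function names and parameter values are illustrative.

```python
import math
import random

def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def mean_offspring(n, growth_rate, capacity):
    """Hypothetical Beverton-Holt regulation: E(K) equals 1 when n == capacity."""
    return growth_rate / (1.0 + (growth_rate - 1.0) * n / capacity)

def haldane_step(n, growth_rate, capacity, rng):
    """Branching step: each individual draws its own K; N(t+1) is the sum of the K's."""
    lam = mean_offspring(n, growth_rate, capacity)
    offspring = [poisson(lam, rng) for _ in range(n)]
    return sum(offspring), offspring

def wright_fisher_step(n_parents, n_next, rng):
    """Sampling step: n_next is given from outside; the K's follow from the sampling."""
    counts = [0] * n_parents
    for _ in range(n_next):
        counts[rng.randrange(n_parents)] += 1
    return counts

rng = random.Random(1)
n = 10
trajectory = [n]
for _ in range(30):
    n, _ = haldane_step(n, growth_rate=2.0, capacity=500, rng=rng)
    trajectory.append(n)
print(trajectory)
```

      In haldane_step the next population size is an output of the model, whereas wright_fisher_step cannot run until n_next has been supplied by the caller.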

      (6) In the discussion, it is claimed that the last glacial maximum could have caused the bottleneck observed in human populations currently residing outside of Africa. Compelling evidence has been amassed that this bottleneck is due to serial founder events associated with the out-of-Africa migration (see e.g., Henn, Cavalli-Sforza, and Feldman 2012 for an older review - subsequent work has only strengthened this view). For me, a more compelling example of changes in carrying capacity would be the advent of agriculture ~11kya and other more recent technological advances.

      We thank the reviewer and have used this more convincing case as suggested by the reviewer.

      Recommendations for the authors:

      General replies - We thank the editors and reviewers again. The points below are re-iterations of the comments received above, which have since been replied to in detail. Specific issues with wording and notation have also been rectified. Again, we are grateful for the inputs, from which we learned a great deal.

      Reviewing Editor Comments:

      The reviewers recognize the value of this model and some of the findings, particularly results from the density-dependent Haldane model. However, they expressed considerable concerns with the model and overall framing of this manuscript.

      First, all reviewers pointed out that the manuscript does not sufficiently engage with the extensive literature on various models of effective population size and genetic drift, notably lacking discussion on Cannings models and related works.

      We have addressed this issue in the beginning of Introduction and Discussion, pointing to the long section in the new second half of Discussion. The essence is that the literature is all about the modified WF models. The WF-Haldane model is conceptually and operationally distinct from the WF models, either standard or modified ones.

      Second, there is a disproportionate discussion on the paradoxes, yet some of the paradoxes might already be resolved within current theoretical frameworks. All three reviewers found the modeling and simulation of the yeast growth experiment hard to follow or lacking justification for certain choices. The analysis approach of sex chromosomes is also questioned.

      This criticism is addressed together with the next one as they make the same point.

      The reviewers recommend a more thorough review of relevant prior literature to better contextualize their findings. The authors need to clarify and/or modify their derivations and simulations of the yeast growth experiment to address the identified caveats and ensure robustness. Additionally, the empirical analysis of the sex chromosome should be revisited, considering alternative scenarios rather than relying solely on the MSE, which only provides a superficial solution. Furthermore, the manuscript's overall framing should be adjusted to emphasize the conclusions drawn from the WFH model, rather than focusing on the "unresolved paradoxes", as some of these may be more readily explained by existing frameworks. Please see the reviewers' overall assessment and specific comments.

      Many thanks. We have carefully reframed and presented the WF-Haldane model to make it clear and logically consistent. Whether a new model (i.e., the WF-Haldane model) deserves to be introduced depends on whether it makes any contribution to understanding nature. That is why we emphasize the four paradoxes.

      A most important disagreement between the reviewers and the authors is about the nature of the paradoxes. While the reviewers suggest that they "may" be resolvable by the conventional WF model (standard or modified), they did not offer the possible resolutions.  To use the analogy in our provisional response: the WF vs. Haldane models are compared to gas cars vs electric vehicles.  We can say confidently that the internal combustion engine cannot resolve the conflicting demands of transportation and zero emission. Its design has limited its capability. 

      Reviewer #2 (Recommendations For The Authors):

      Many thanks.  We have incorporated all these suggestions.  When the incorporation is not straightforward, we have carefully revised the text to minimize mis-communications.

      In the introduction -- "Genetic drift is simply V(K)" -- this is a very strong statement. You can say it is inversely proportional to V(K), but drift is often defined based on changes in allele frequency.

      We changed the word “simply” to “essentially”. This wording is supported by the fixation probability of advantageous mutations, 2s/V(K). We have shown in the text that N does not matter here because the fixation is nearly deterministic when the copy number reaches, say, 100, regardless of whether N is 10^4 or 10^8.
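      The 2s/V(K) approximation can be checked numerically in one simple case. The sketch below assumes a branching process with Poisson-distributed offspring number of mean 1 + s (so that V(K) = 1 + s) and solves for the extinction probability by fixed-point iteration on the probability generating function. The Poisson choice and the function name survival_probability are assumptions made for illustration; they are not the general K distribution considered in the manuscript.

```python
import math

def survival_probability(s, tol=1e-14, max_iter=1_000_000):
    """Survival probability of a single new mutation with Poisson(1 + s) offspring.

    The extinction probability q is the smallest root of q = exp((1 + s) * (q - 1)),
    reached by iterating the generating function from q = 0.
    """
    m = 1.0 + s
    q = 0.0
    q_next = q
    for _ in range(max_iter):
        q_next = math.exp(m * (q - 1.0))
        if abs(q_next - q) < tol:
            break
        q = q_next
    return 1.0 - q_next

s = 0.01
v_k = 1.0 + s  # the variance of a Poisson variable equals its mean
print(survival_probability(s), 2.0 * s / v_k)
```

      No population size enters this calculation: under these assumptions the fate of the mutation is decided while its copy number is still small.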

      Page 3 line 86. "sexes is a sufficient explanation."--> "sex could be a sufficient explanation"

      The strongest line of new results is about 2s/V(K). Perhaps, the paper could put more emphasis on this part and demonstrate the generality of this result with a different example.

      The math notations in the supplement are not intuitive. e.g., using i_k and j_k as probabilities. I also recommend using E[X] and V[X] for expectation and variance rather than \italic{E(X)} to improve the readability of many equations.

      Thank you for your careful reading. Regarding the use of i_k and j_k as probabilities, we initially considered using 𝑝 or 𝑞 to represent probabilities. However, since 𝑝 and 𝑞 are already used in the main text, we opted for 𝑖 and 𝑗 to avoid potential confusion. As for your recommendation to use E[X] and V[X] for expectation and variance, we would like to clarify that we follow the standard practice of italicizing these symbols to represent variables.

      Eq A6, A7. While I manage to follow, P_{10}(t) and P_{10} are not defined anywhere in the text.

      Supplement page 7, the term "probability of fixation" is confusing in a branching model.

      Thank you for your observation. We have carefully revised the supplement to provide clarity on these points.

      Revision - In population genetics, the fixation of the M allele means that the population consists entirely of the M allele, with no W alleles remaining. We define the fixation probability of the M allele by generation t as follows:

      Given that M and W allele reproduce independently, this can be factored as:

      As t approaches infinity, the ultimate fixation probability of M allele can be derived as follows:

      Eq. A28. It is unclear whether Eq. A1 can be used here directly. Some justification would be nice.

      We appreciate your careful review, and we will ensure this connection between the two equations is made clearer in the supplement.

      Revision - We would like to clarify that Eq. (A1) and Eq. (A28) are essentially the same; the only difference is the subscript 𝑡, which indicates the time dependence in the dynamic process.

      Supplement page 17. "the biological meaning of negative..". There is no clear justification for this claim. As a reader, I don't have any intuition as to why that is the case.

      Thank you for raising this concern. We have addressed this issue earlier.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Comments on revisions:

      This revision addressed all my previous comments.

      Reviewer #3 (Public Review):

      Comments on revisions:

      The authors addressed my comments and it is ready for publication.

      We are grateful for the reviewers’ effort and are encouraged by their generally positive assessment of our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      This revision addressed all my previous comments. The only new issue concerns the authors’ response to the following comment of reviewer 3:

      (2) Authors note “monovalent positive salt ions such as Na+ can be attracted, somewhat counterintuitively, into biomolecular condensates scaffolded by positively-charged polyelectrolytic IDRs in the presence of divalent counterions”. This may be due to the fact that the divalent negative counterions present in the dense phase (as seen in the ternary phase diagrams) also recruit a small amount of Na+.

      Author reply: The reviewer’s comment is valid, as a physical explanation for this prediction is called for. Accordingly, the following sentence is added to p. 10, lines 27-29: ...

      Here are my comments on this issue. Most IDPs with a net positive charge still have negatively charged residues, which in theory can bind cations. In fact, Caprin1 has 3 negatively charged residues (same as A1-LCD). All-atom simulations of MacAinsh et al (ref 72) have shown that these negatively charged residues bind Na+; I assume this effect can be captured by the coarse-grained models in the present study. Moreover, all-atom simulations showed that Na+ has a strong tendency to be coordinated by backbone carbonyls, which of course are present on all residues. Suggestions:

      (a) The authors may want to analyze the binding partners of Na+. Are they predominantly the 3 negatively charged residues, or divalent counterions, or both?

      (b) The authors may want to discuss the potential underestimation of Na+ inside Caprin1 condensates due to the lack of explicit backbone carbonyls that can coordinate Na+ in their models. A similar problem applies to backbone amides that can coordinate anions, but to a lesser extent (see Fig. 3A of ref 72).

      The reviewer’s comments are well taken. Regarding the statement in the revised manuscript “This phenomenon arises because the positively charge monovalent salt ions are attracted to the negatively charged divalent counterions in the protein-condensed phase.”, it should be first noted that the statement was inferred from the model observation that Na+ is depleted in condensed Caprin1 (Fig. 2a) when the counterion is monovalent (an observation that was stated almost immediately preceding the quoted statement). To make this logical connection clearer as well as to address the reviewer’s point about the presence of negatively charged residues in Caprin1, we have modified this statement in the Version of Record (VOR) as follows:

      “This phenomenon most likely arises from the attraction of the positively charge monovalent salt ions to the negatively charged divalent counterions in the protein-condensed phase because although the three negatively charged D residues in Caprin1 can attract Na+, it is notable that Na+ is depleted in condensed Caprin1 when the counterion is monovalent (Fig. 2a).”

      The reviewer’s suggestion (a) of collecting statistics of Na+ interactions in the Caprin1 condensate is valuable and should be attempted in future studies since it is beyond the scope of the present work. Thus far, our coarse-grained molecular dynamics has considered only monovalent Cl− counterions. We do not have simulation data for divalent counterions.

      Following the reviewer’s suggestion (b), we have now added the following sentence in Discussion under the subheading “Effects of salt on biomolecular LLPS”:

      “In this regard, it should be noted that positively and negatively charged salt ions can also coordinate with backbone carbonyls and amides, respectively, in addition to coordinating with charged amino acid sidechains (MacAinsh et al., eLife 2024). The impact of such effects, which are not considered in the present coarse-grained models, should be ascertained by further investigations using atomic simulations (MacAinsh et al., eLife 2024; Rauscher & Pomès, eLife 2017; Zheng et al., J Phys Chem B 2020).”

      Here we have added a reference to Rauscher & Pomès, eLife 2017 to more accurately reflect progress made in atomic simulations of biomolecular condensates.

      More generally, regarding the reviewer’s comments on the merits of coarse-grained versus atomic approaches, we re-emphasize, as stated in our paper, that these approaches are complementary. Atomic approaches undoubtedly afford structurally and energetically high-resolution information. However, as it stands, simulations of the assembly-disassembly process of biomolecular condensate are nonideal because of difficulties in achieving equilibration even for a small model system with < 10 protein chains (MacAinsh et al., eLife 2024) although well-equilibrated simulations are possible for a reasonably-sized system with ∼ 30 chains when the main focus is on the condensed phase (Rauscher & Pomès, eLife 2017). In this context, coarse-grained models are valuable for assessing the energetic role of salt ions in the thermodynamic stability of biomolecular condensates of physically reasonable sizes under equilibrium conditions.

      In addition to the above minor additions, we have also added citations in the VOR to two highly relevant recent papers: Posey et al., J Am Chem Soc 2024 for salt-dependent biomolecular condensation (mentioned in Discussion under subheadings “Tielines in protein-salt phase diagrams” and “Counterion valency” together with added references to Hribar et al., J Am Chem Soc 2002 and Nostro & Ninham, Chem Rev 2012 for the Hofmeister phenomena discussed by Posey et al.) and Zhu et al., J Mol Cell Biol 2024 for ATP-modulated reentrant behavior (mentioned in Introduction). We have also added back a reference to our previous work Lin et al., J Mol Liq 2017 to provide more background information for our formulation.

      Reviewer #2 (Recommendations For The Authors):

      The authors have done a great job addressing previous comments.

      We thank this reviewer for his/her effort and are encouraged by the positive assessment of our revised manuscript.

      ---

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors used multiple approaches to study salt effects in liquid-liquid phase separation (LLPS). Results on both wild-type Caprin1 and mutants and on different types of salts contribute to a comprehensive understanding.

      Strengths:

      The main strength of this work is the thoroughness of investigation. This aspect is highlighted by the multiple approaches used in the study, and reinforced by the multiple protein variants and different salts studied.

      We are encouraged by this positive overall assessment.

      Weaknesses:

      (1) The multiple computational approaches are a strength, but they’re cruder than explicit-solvent all-atom molecular dynamics (MD) simulations and may miss subtle effects of salts. In particular, all-atom MD simulations demonstrate that high salt strengthens pi-types of interactions (ref. 42 and MacAinsh et al, https://www.biorxiv.org/content/10.1101/2024.05.26.596000v3).

      The relative strengths and limitations of coarse-grained vs all-atom simulation are now more prominently discussed beginning at the bottom of p. 5 through the first 8 lines of p. 6 of the revised manuscript (page numbers throughout this letter refer to those in the submitted pdf file of the revised manuscript), with MacAinsh et al. included in this added discussion (cited as ref. 72 in the revised manuscript). The fact that coarse-grained simulation may not provide insights into more subtle structural and energetic effects afforded by all-atom simulations with regard to π-related interaction is now further emphasized on p. 11 (lines 23–30), with reference to MacAinsh et al. as well as original ref. 42 (Krainer et al., now ref. 50 in the revised manuscript).

      (2) The paper can be improved by distilling the various results into a simple set of conclusions. For example, based on salt effects revealed by all-atom MD simulations, MacAinsh et al. presented a sequence-based predictor for classes of salt dependence. Wild-type Caprin1 fits right into the “high net charge” class, with a high net charge and a high aromatic content, showing no LLPS at 0 NaCl and an increasing tendency of LLPS with increasing NaCl. In contrast, pY-Caprin1 belongs to the “screening” class, with a high level of charged residues and showing a decreasing tendency of LLPS.

      This is a helpful suggestion. We have now added a subsection with heading “Overview of key observations from complementary approaches” at the beginning of the “Results” section on p. 6 (lines 18–37) and the first line of p. 7. In the same vein, a few concise sentences to summarize our key results are added to the first paragraph of “Discussion” (p. 18, lines 23– 26). In particular, the relationship of Caprin1 and pY-Caprin1 with the recent classification by MacAinsh et al. (ref. 72) in terms of “high net charge” and “screening” classes is now also stated, as suggested by this reviewer, on p. 18 under “Discussion” (lines 26–30).

      (3) Mechanistic interpretations can be further simplified or clarified. (i) Reentrant salt effects (e.g., Fig. 4a) are reported but no simple explanation seems to have been provided. Fig. 4a,b look very similar to what has been reported as strong-attraction promotor and weak-attraction suppressor, respectively (ref. 50; see also PMC5928213 Fig. 2d,b). According to the latter two studies, the “reentrant” behavior of a strong-attraction promotor, Cl- in the present case, is due to Cl-mediated attraction at low to medium [NaCl] and repulsion between Cl- ions at high salt. Do the authors agree with this explanation? If not, could they provide another simple physical explanation? (ii) The authors attributed the promotional effect of Cl- to counterion-bridged interchain contacts, based on a single instance. There is another simple explanation, i.e., neutralization of the net charge on Caprin1. The authors should analyze their simulation results to distinguish net charge neutralization and interchain bridging; see MacAinsh et al.

      The relationship of Cl− in bridging and neutralizing configurations, respectively, with the classification of “strong-attraction promoter” and “weak-attraction suppressor” by Zhou and coworkers is now stated on p. 13 (lines 29–31), with reference to original ref. 50 by Ghosh, Mazarakos & Zhou (now ref. 59 in the revised manuscript) as well as the earlier patchy particle model study PMC5928213 by Nguemaha & Zhou, now cited as ref. 58 in the revised manuscript. After receiving this referee report, we have conducted an extensive survey of our coarse-grained MD data to provide a quantitative description of the prevalence of counterion (Cl−) bridging interactions linking positively charged arginines (Arg+s) on different Caprin1 chains in the condensed phase (using the [Na+] = 0 case as an example). The newly compiled data is reported under a new subsection heading “Explicit-ion MD offers insights into counterion-mediated interchain bridging interactions among condensed Caprin1 molecules” on p. 12 (last five lines)–p. 14 (first 10 lines) [∼1.5 additional pages] as well as a new Fig. 6 to depict the statistics of various Arg+–Cl−–Arg+ configurations, with the conclusion that a vast majority (at least 87%) of Cl− counterions in the Caprin1-condensed phase engage in favorable condensation-driving interchain bridging interactions.
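      The kind of bookkeeping behind such a survey can be sketched as follows. This is a hypothetical illustration only: the distance-cutoff criterion, the labels, the function names, and the toy coordinates are assumptions introduced here, not the definitions or data used in the manuscript, and periodic boundary conditions are ignored for brevity.

```python
import math

def classify_chloride(cl_pos, arg_beads, cutoff):
    """Label one Cl- bead by the number of distinct chains whose Arg+ beads it contacts.

    arg_beads is a list of (chain_id, (x, y, z)) tuples. A contact is assumed to be
    a centre-to-centre distance of at most `cutoff`.
    """
    chains = {chain for chain, pos in arg_beads if math.dist(cl_pos, pos) <= cutoff}
    if len(chains) >= 2:
        return "bridging"       # contacts Arg+ on at least two different chains
    if len(chains) == 1:
        return "neutralizing"   # contacts Arg+ on a single chain only
    return "free"

def bridging_fraction(cl_positions, arg_beads, cutoff):
    """Fraction of Cl- beads classified as bridging."""
    labels = [classify_chloride(p, arg_beads, cutoff) for p in cl_positions]
    return labels.count("bridging") / len(labels) if labels else 0.0

# Toy configuration in arbitrary length units (made up for illustration).
arg_beads = [
    ("chain_A", (0.0, 0.0, 0.0)),
    ("chain_B", (1.0, 0.0, 0.0)),
    ("chain_A", (5.0, 0.0, 0.0)),
]
cl_positions = [(0.5, 0.0, 0.0), (5.2, 0.0, 0.0), (9.0, 9.0, 9.0)]
labels = [classify_chloride(p, arg_beads, cutoff=0.7) for p in cl_positions]
print(labels, bridging_fraction(cl_positions, arg_beads, cutoff=0.7))
```

      Applied frame by frame to a trajectory, a tally of this kind would separate net charge neutralization from interchain bridging, which is the distinction the reviewer asked for.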

      (4) The authors presented ATP-Mg both as a single ion and as two separate ions; there is no explanation of which of the two versions reflects reality. When presenting ATP-Mg as a single ion, it’s as though it forms a salt with Na+. I assume NaCl, ATP, and MgCl2 were used in the experiment. Why is Cl- not considered? Related to this point, it looks like ATP is just another salt ion studied and much of the Results section is on NaCl, so the emphasis of ATP (“Diverse Roles of ATP” in the title) is somewhat misleading.

      We model ATP and ATP-Mg both as single-bead ions (in rG-RPA) and also as structurally more realistic short multiple-bead polymers (in field-theoretic simulation, FTS). We have now added discussions to clarify our modeling rationale in using and comparing different models for ATP and ATP-Mg, as follows:

      p. 8 (lines 19–36):

      “The complementary nature of our multiple methodologies allows us to focus sharply on the electrostatic aspects of hydrolysis-independent role of ATP in biomolecular condensation by comparing ATP’s effects with those of simple salt. Here, Caprin1 and pY-Caprin1 are modeled minimally as heteropolymers of charged and neutral beads in rG-RPA and FTS. ATP and ATP-Mg are modeled as simple salts (single-bead ions) in rG-RPA whereas they are modeled with more structural complexity as short charged polymers (multiple-bead chains) in FTS, though the latter models are still highly coarse-grained. Despite this modeling difference, rG-RPA and FTS both rationalize experimentally observed ATP- and NaCl-modulated reentrant LLPS of Caprin1 and a lack of a similar reentrance for pY-Caprin1 as well as a prominent colocalization of ATP with the Caprin1 condensate. Consistently, the same contrasting trends in the effect of NaCl on Caprin1 and pY-Caprin1 are also seen in our coarse-grained MD simulations, though polymer field theories tend to overestimate LLPS propensity [99]. The robustness of the theoretical trends across different modeling platforms underscores electrostatics as a significant component in the diverse roles of ATP in the context of its well-documented ability to modulate biomolecular LLPS via hydrophobic and π-related effects [63, 65, 67].”

      Here, the last sentence quoted above addresses this reviewer’s question about our intended meaning in referring to “diverse roles of ATP” in the title of our paper. To make this point even clearer, we have also added the following sentence to the Abstract (p. 2, lines 12–13):

      “... The electrostatic nature of these features complements ATP’s involvement in π-related interactions and as an amphiphilic hydrotrope, ...”

      Moreover, to enhance readability, we have now added pointers in the rG-RPA part of our paper to anticipate the structurally more complex ATP and ATP-Mg models to be introduced subsequently in the FTS part, as follows:

      p. 9 (lines 13–15):

      “As mentioned above, in the present rG-RPA formulation, (ATP-Mg)2− and ATP4− are modeled minimally as a single-bead ion. They are represented by charged polymer models with more structural complexity in the FTS models below.”

      p. 11 (lines 8–11):

      “These observations from analytical theory will be corroborated by FTS below with the introduction of structurally more realistic models of (ATP-Mg)2−, ATP4− together with the possibility of simultaneous inclusion of Na+, Cl−, and Mg2+ in the FTS models of Caprin1/pY-Caprin1 LLPS systems.”

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Lin and colleagues aim to understand the role of different salts on the phase behavior of a model protein of significant biological interest, Caprin1, and its phosphorylated variant, pY-Caprin1. To achieve this, the authors employed a variety of methods to complement experimental studies and obtain a molecular-level understanding of ion partitioning inside biomolecular condensates. A simple theory based on rG-RPA is shown to capture the different salt dependencies of Caprin1 and pY-Caprin1 phase separation, demonstrating excellent agreement with experimental results. The application of this theory to multivalent ions reveals many interesting features with the help of multicomponent phase diagrams. Additionally, the use of CG model-based MD simulations and FTS provides further clarity on how counterions can stabilize condensed phases.

      Strengths:

      The greatest strength of this study lies in the integration of various methods to obtain complementary information on thermodynamic phase diagrams and the molecular details of the phase separation process. The authors have also extended their previously proposed theoretical approaches, which should be of significant interest to other researchers. Some of the findings reported in this paper, such as bridging interactions, are likely to inspire new studies using higher-resolution atomistic MD simulations.

      Weaknesses:

      The paper does not have any major issues.

      We are very encouraged by this reviewer’s positive assessment of our work.

      Reviewer #3 (Public Review):

      Authors first use rG-RPA to reproduce two observed trends. Caprin1 does not phase separate at very low salt but then undergoes LLPS with added salt while further addition of salt reduces its propensity to LLPS. On the other hand pY-Caprin1 exhibits a monotonic trend where the propensity to phase separate decreases with the addition of salt. This distinction is captured by a two component model and also when salt ions are explicitly modeled as a separate species with a ternary phase diagram. The predicted ternary diagrams (when co and counter ions are explicitly accounted for) also predict the tendency of ions to co-condense or exclude proteins in the dense phase. Predicted trends are generally in line with the measurement for Cparin1 [sic]. Next, the authors seek to explain the observed difference in phase separation when Arginines are replaced by Lysines creating different variants. In the current rG-RPA type models both Arginine (R) and Lysine (K) are treated equally since non-electrostatic effects are only modeled in a meanfield manner that can be fitted but not predicted. For this reason, coarse grain MD simulation is suitable. Moreover, MD simulation affords structural features of the condensates. They used a force field that is capable of discriminating R and K. The MD predicted degrees of LLPS of these variants again is consistent with the measurement. One additional insight emerges from MD simulations that a negative ion can form a bridge between two positively charged residues on the chain. These insights are not possible to derive from rG-RPA. Both rG-RPA and MD simulation become cumbersome when considering multiple types of ions such as Na, Cl, [ATP] and [ATP-Mg] all present at the same time. FTS is well suited to handle this complexity. FTS also provides insights into the co-localization of ions and proteins that is consistent with NMR. By using different combinations of ions they confirm the robustness of the prediction that Caprin1 shows salt-dependent reentrant behavior, adding further support that the differential behavior of Caprin1, and pY-Caprin1 is likely to be mediated by charge-charge interactions.

      We are encouraged by this reviewer’s positive assessment of our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Analysis:

      Analyze the simulation results to distinguish net charge neutralization and interchain bridging; see MacAinsh et al.

      Please see response above to points (3) and (4) under “Weaknesses” in this reviewer’s public review. We have now added a 1.5-page subsection starting from the bottom of p. 12 to the top of p. 14 to discuss a new extensive analysis of Arg<sup>+</sup>–Cl<sup>−</sup>–Arg<sup>+</sup> configurations to identify bridging interactions, with key results reported in a new Fig. 6 (p. 42). Recent results from MacAinsh, Dey & Zhou (cited now as ref. 72) are included in the added discussion. Relevant advances made in MacAinsh et al., including clarification and classification of salt-mediated interactions in the phase separation of A1-LCD are now mentioned multiple times in the revised manuscript (p. 5, lines 19–20; p. 6, lines 2–5; p. 11, line 30; p. 14, line 10; p. 18, lines 28–29; and p. 20, line 4).

      Writing and presentation

      (1) Cite subtle effects that may be missed by the coarser approaches in this study

      Please see response above to point (1) under “Weaknesses” in this reviewer’s public review.

      (2) Try to distill the findings into a simple set of conclusions

      Please see response above to point (2) under “Weaknesses” in this reviewer’s public review.

      (3) Clarify and simplify physical interpretations

      Please see response above to point (2) under “Weaknesses” in this reviewer’s public review.

      (4) Explain the treatment of ATP-Mg as either a single ion or two separate ions; reconsider modifying the reference to ATP in the title

      Please see response above to point (4) under “Weaknesses” in this reviewer’s public review.

      (5) Minor points:

      p. 4, citation of ref 56: this work shows ATP is a driver of LLPS, not merely a regulator (promotor or suppressor)

      This citation to original ref. 56 (now ref. 63) on p. 4 is now corrected (bottom line of p. 4).

      p. 7 and throughout: “using bulk [Caprin1]” – I assume this is the initial overall Caprin1 concentration. It would avoid confusion to state such concentrations as “initial” or “initial overall”

      We have now added “initial overall concentration” in parentheses on p. 8 (line 4) to clarify the meaning of “bulk concentration”.

      p. 7 and throughout: both mM (also uM) and mg/ml have been used as units of protein concentration and that can cause confusion. Indeed, the authors seem to have confused themselves on p. 9, where 400 (750) mM is probably 400 (750) mg/ml. The same with the use of mM and M for salt concentrations (400 mM Mg2+ but 0.1 and 1.0 M Na+)

      Concentrations are now given in both molarity and mass density in Fig. 1 (p. 37), Fig. 2 (p. 38), Fig. 4 (p. 40), and Fig. 7 (p. 43), as noted in the text on p. 8 (lines 4–5). Inconsistencies and errors in quoting concentrations are now corrected (p. 10, line 18, and p. 11, line 2).
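
      As a minimal sketch of how the two unit systems mentioned above relate, the conversion between molarity and mass density can be written as below. The function names and the example molar mass are hypothetical and serve only as illustration; the molar mass is not taken from the manuscript.

```python
def millimolar_to_mg_per_ml(conc_mM: float, molar_mass_g_per_mol: float) -> float:
    """Convert a molar concentration in mM to a mass density in mg/ml."""
    # mM x (g/mol) gives mg/L; dividing by 1000 gives mg/ml.
    return conc_mM * molar_mass_g_per_mol / 1000.0


def mg_per_ml_to_millimolar(density_mg_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass density in mg/ml to a molar concentration in mM."""
    # mg/ml x 1000 gives mg/L; dividing by g/mol gives mM.
    return density_mg_per_ml * 1000.0 / molar_mass_g_per_mol


# Hypothetical molar mass used purely for illustration (an assumption,
# not the value for Caprin1 or pY-Caprin1).
EXAMPLE_MOLAR_MASS = 10_000.0  # g/mol

if __name__ == "__main__":
    density = millimolar_to_mg_per_ml(1.0, EXAMPLE_MOLAR_MASS)
    print(f"1 mM at {EXAMPLE_MOLAR_MASS:.0f} g/mol is {density:.1f} mg/ml")
```

      Because the conversion factor depends on molar mass, a figure such as 400 reads very differently in mM and in mg/ml, which is why both units are now reported side by side.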

      p. 7, “LCST-like”: isn’t this more like a case of a closed coexistence curve that contains both UCST and LCST?

      The discussion on p. 8 around this observation from Fig. 1d is now expanded, including alluding to the theoretical possibility of a closed co-existence curve mentioned by this reviewer, as follows:

      “Interestingly, the decrease in some of the condensed-phase [pY-Caprin1]s with decreasing T (orange and green symbols for ≲ 20°C in Fig. 1d trending toward slightly lower [pY-Caprin1]) may suggest a hydrophobicity-driven lower critical solution temperature (LCST)-like reduction of LLPS propensity as temperature approaches ∼ 0°C as in cold denaturation of globular proteins [7,23] though the hypothetical LCST is below 0°C and therefore not experimentally accessible. If that is the case, the LLPS region would resemble those with both an UCST and a LCST [4]. As far as simple modeling is concerned, such a feature may be captured by a FH model wherein interchain contacts are favored by entropy at intermediate to low temperatures and by enthalpy at high temperatures, thus entailing a heat capacity contribution in χ(T), with [7,109,110] beyond the temperature-independent ϵ<sub>h</sub> and ϵ<sub>s</sub> used in Fig. 1c,d and Fig. 2. Alternatively, a reduction in overall condensed-phase concentration can also be caused by formation of heterogeneous locally organized structures with large voids at low temperatures even when interchain interactions are purely enthalpic (Fig. 4 of ref. [111]).”

      p. 8 “Caprin1 can undergo LLPS without the monovalent salt (Na+) ions (LLPS regions extend to [Na+] = 0 in Fig. 2e,f”: I don’t quite understand what’s going on here. Is the effect caused by a small amount of counterion (ATP-Mg) that’s calculated according to eq 1 (with z s set to 0)?

      The discussion of this result in Fig. 2e,f is now clarified as follows (p. 10, lines 8–14 in the revised manuscript):

      “The corresponding rG-RPA results (Fig. 2e–h) indicate that, in the presence of divalent counterions (needed for overall electric neutrality of the Caprin1 solution), Caprin1 can undergo LLPS without the monovalent salt (Na+) ions (LLPS regions extend to [Na+] = 0 in Fig. 2e,f; i.e., ρs = 0, ρc > 0 in Eq. (1)), because the configurational entropic cost of concentrating counterions in the Caprin1 condensed phase is lesser for divalent (zc = 2) than for monovalent (zc = 1) counterions as only half of the former are needed for approximate electric neutrality in the condensed phase.”
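
      The counting argument in this passage can be sketched with an idealized estimate. This is only an ideal-solution illustration under stated assumptions, not the rG-RPA free energy: we assume that the neutrality condition reduces to a balance between polymer and counterion charge when ρs = 0, and the symbols Q_p (net positive charge per chain), ρ_p (chain density), and the superscripts "cond" and "dil" (condensed and dilute phases) are introduced here for illustration only.

```latex
% Assumed symbols (not from the manuscript): Q_p > 0 is the net charge per chain,
% \rho_p is the chain density; \rho_c and z_c follow the quoted text.
\begin{align}
  z_c\,\rho_c = Q_p\,\rho_p \quad (\rho_s = 0)
  \quad &\Longrightarrow \quad
  \rho_c = \frac{Q_p\,\rho_p}{z_c}, \\
  \frac{\Delta F_c}{V\,k_{\rm B}T}
  \approx \rho_c^{\rm cond}\,\ln\frac{\rho_c^{\rm cond}}{\rho_c^{\rm dil}}
  &= \frac{Q_p\,\rho_p^{\rm cond}}{z_c}\,
     \ln\frac{\rho_p^{\rm cond}}{\rho_p^{\rm dil}} .
\end{align}
```

      At fixed polymer concentrations in the two phases, this idealized translational-entropy cost scales as 1/zc, so it is halved for zc = 2 relative to zc = 1.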

      p. 9 “Despite the tendency for polymer field theories to overestimate LLPS propensity and condensed-phase concentrations”: these limitations should be mentioned earlier, along with the very high concentrations (e.g., 1200 mg/ml) in Fig. 2

      This sentence (now on p. 11, lines 11–18) is now modified to clarify the intended meaning as suggested by this reviewer:

      “Despite the tendency for polymer field theories to overestimate LLPS propensity and condensed-phase concentrations quantitatively because they do not account for ion condensation [99], which can be severe for small ions with more than ±1 charge valencies as in the case of condensed [Caprin1] ≳ 120 mM in Fig. 2i–l, our present rG-RPA-predicted semi-quantitative trends are consistent with experiments indicating …”

      In addition, this limitation of polymer field theories is also mentioned earlier in the text on p. 6, lines 30–31.

      Reviewer #2 (Recommendations For The Authors):

      (1) The current version of the paper goes through many different methodologies, but how these methods complement or overlap in terms of their applicability to the problem at hand may not be so clear. This can be especially difficult for readers not well-versed in these methods. I suggest the authors summarize this somewhere in the paper.

      As mentioned above in response to Reviewer #1, we have now added a subsection with heading “Overview of key observations from complementary approaches” at the beginning of the “Results” section on p. 6 (lines 18–37) and the first line of p. 7 to make our paper more accessible to readers who might not be well-versed in the various theoretical and computational techniques. A few sentences to summarize our key results are added as well to the first paragraph of “Discussion” (p. 18, lines 23–26).

      (2) It wasn’t clear if the authors obtained LCST-type behavior in Figure 1d or if another phenomenon is responsible for the non-monotonic change in dense phase concentrations. At the very least, the authors should comment on the possibility of observing LCST behavior using the rG-RPA model and if modifications are needed to make the theory more appropriate for capturing LCST.

      As mentioned above in response to Reviewer #1, the discussion regarding possible LCST-type behavior in Fig. 1d is now expanded to include two possible physical origins: (i) hydrophobicity-like temperature-dependent effective interactions, and (ii) formation of heterogeneous, more open structures in the condensed phase at low temperatures. Three additional references [109, 110, 111] (from the Dill, Chan, and Panagiotopoulos groups, respectively) are now included to support the expanded discussion. Again, the modified discussion is as follows:

      “Interestingly, the decrease in some of the condensed-phase [pY-Caprin1]s with decreasing T (orange and green symbols for ≲ 20°C in Fig. 1d trending toward slightly lower [pY-Caprin1]) may suggest a hydrophobicity-driven lower critical solution temperature (LCST)-like reduction of LLPS propensity as temperature approaches ∼ 0°C as in cold denaturation of globular proteins [7,23] though the hypothetical LCST is below 0°C and therefore not experimentally accessible. If that is the case, the LLPS region would resemble those with both an UCST and a LCST [4]. As far as simple modeling is concerned, such a feature may be captured by a FH model wherein interchain contacts are favored by entropy at intermediate to low temperatures and by enthalpy at high temperatures, thus entailing a heat capacity contribution in χ(T), with [7,109,110] beyond the temperature-independent ϵ<sub>h</sub> and ϵ<sub>s</sub> used in Fig. 1c,d and Fig. 2. Alternatively, a reduction in overall condensed-phase concentration can also be caused by formation of heterogeneous locally organized structures with large voids at low temperatures even when interchain interactions are purely enthalpic (Fig. 4 of ref. [111]).”

      (3) In Figures 4c and 4d, ionic density profiles could be shown as a separate zoomed-in version to make it easier to see the results.

      This is an excellent suggestion. Two such panels are now added to Fig. 4 (p. 40) as parts (g) and (h).

      Reviewer #3 (Recommendations For The Authors):

      I would suggest authors make some minor edits as noted here.

      (1) Please note down the chi values that were used when fitting experimental phase diagrams with rG-RPA theory in Figure 2a,b. At present there aren’t too many such values available in the literature and reporting these would help to get an estimate of effective chi values when electrostatics is appropriately modeled using rG-RPA.

      The χ(T) values and their enthalpic and entropic components ϵh and ϵs used to fit the experimental data in Fig. 1c,d are now stated in the caption for Fig. 1 (p. 37). The same fitted χ(T) values are used in Fig. 2 (p. 38), as is now stated in the revised caption for Fig. 2. Please note that for clarity we have now changed the notation from ∆h and ∆s in our originally submitted manuscript to ϵh and ϵs in the revised text (p. 7, last line) as well as in the revised figure captions to conform to the notation in our previous works [18, 71].

      (2) Authors note “monovalent positive salt ions such as Na+ can be attracted, somewhat counterintuitively, into biomolecular condensates scaffolded by positively-charged polyelectrolytic IDRs in the presence of divalent counterions”. This may be due to the fact that the divalent negative counterions present in the dense phase (as seen in the ternary phase diagrams) also recruit a small amount of Na+.

      The reviewer’s comment is valid, as a physical explanation for this prediction is called for. Accordingly, the following sentence is added to p. 10, lines 27–29:

      “This phenomenon arises because the positively charged monovalent salt ions are attracted to the negatively charged divalent counterions in the protein-condensed phase.”

      (3) In the discussion where authors contrast the LLPS propensity of Caprin1 against FUS, TDP43, Brd4, etc, they correctly note majority of these other proteins have low net charge and possibly higher non-electrostatic interaction that can promote LLPS at room temperature even in the absence of salt. It is also worth noting if some of these proteins were forced to undergo LLPS with crowding which is sometimes typical. A quick literature search will make this clear.

      A careful reading of the work in question (Krainer et al., ref. 50) does not suggest that crowders were used to promote LLPS for the proteins the authors studied. Nonetheless, the reviewer’s point regarding the potential importance of crowder effects is well taken. Accordingly, crowder effects are now mentioned briefly in the Introduction (p. 4, line 13), with three additional references on the impact of crowding on LLPS added [30–32] (from the Spruijt, Mukherjee, and Rakshit groups respectively). In this connection, to provide a broader historical context to the introductory discussion of electrostatics effects in biomolecular processes in general, two additional influential reviews (from the Honig and Zhou groups respectively) are now cited as well [15, 16].

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript explores the multiple cell types present in the wall of murine-collecting lymphatic vessels with the goal of identifying cells that initiate the autonomous action potentials and contractions needed to drive lymphatic pumping. Through the use of genetic models to delete individual genes or detect cytosolic calcium in specific cell types, the authors convincingly determine that lymphatic muscle cells are the origin of the action potential that triggers lymphatic contraction. 

      Strengths: 

      The experiments are rigorously performed, the data justify the conclusions, and the limitations of the study are appropriately discussed. 

      There is a need to identify therapeutic targets to improve lymphatic contraction and this work helps identify lymphatic muscle cells as potential cellular targets for intervention. 

      Weaknesses: 

      My only major comment would be that the manuscript provides a lot of rich information describing the cellular components of the muscular lymphatic vessel wall and that these data are not well represented by the title. The title (while currently accurate) could be tweaked to better represent all that is in this manuscript. Maybe something like

      "Characterization/Interrogation of the cellular components of murine collecting lymphatic vessels reveals that lymphatic muscle cells are the innate pacemaker cells regulating lymphatic contractions" or "Discovery/Confirmation of lymphatic muscle cells as innate pacemaker cells of lymphatic contraction through characterization of the cellular components of murine collecting lymphatic vessels". Potentially a cartoon summary figure of the components that make up the collecting lymphatic vessel wall could also be included. In my opinion, these changes will make this manuscript of more interest to a broader group of scientists. I have a few additional comments for consideration to improve the clarity and enhance the discussion of this work. 

      We agree with the reviewer that our original manuscript, and our resubmission even more so with the addition of the scRNAseq data, provides a significant amount of information regarding the composition of the lymphatic collecting vessel wall. We have changed our title to match one suggestion of the reviewer: “Characterization of the cellular components of murine collecting lymphatic vessels reveals that lymphatic muscle cells are the innate pacemaker cells regulating lymphatic contractions”.

      Reviewer #2 (Public Review): 

      Summary: 

      This is a well-written manuscript describing studies directed at identifying the cell type responsible for pacemaking in murine-collecting lymphatics. Using state-of-the-art approaches, the authors identified a number of different cell types in the wall of these lymphatics and then using targeted expression of Channel Rhodopsin and GCaMP, the authors convincingly demonstrate that only activation of lymphatic muscle cells produces coordinated lymphatic contraction and that only lymphatic muscle cells display pressure-dependent Ca2+ transients as would be expected of a pacemaker in these lymphatics. 

      Strengths: 

      The use of a targeted expression of channel rhodopsin and GCaMP to test the hypothesis that lymphatic muscle cells serve as the pacemakers in murine lymphatic collecting vessels.

      Weaknesses: 

      The only significant weakness was the lack of quantitative analysis of most of the imaging data shown in Figures 1-11. In particular, the colocalization analysis should be extended to show cells not expected to demonstrate colocalization as a negative control for the colocalization analysis that the authors present.

      We understand the reviewer’s concern regarding the lack of a control for the colocalization analysis and that the colocalization analysis was limited to just one set of cell markers. We have now provided a colocalization analysis of Myh11 and PDGFRα to serve as a colocalization negative control based on our RT-PCR and scRNAseq findings, which is incorporated into the current Supplemental figure 1. For the other marker combinations, the representative images often made it quite clear that two separate cell populations were being stained, as was the case when labeling endothelial cells with CD31, macrophages with the MacGreen mice, or hematopoietic cells with CD45.

      During our lengthy rebuttal process we completed a single-cell RNA sequencing analysis of our isolated and cleaned mouse inguinal axillary lymphatic collecting vessels to aid our characterization of the vessel wall and to answer these questions regarding colocalization more thoroughly and robustly. The scRNAseq dataset, derived from isolated and cleaned inguinal axillary collecting vessels from 10 mice (5 males and 5 females), allowed us to profile over 2200 of the adventitial fibroblast-like cells (AdvCs) we had identified in our original submission. Using this dataset, we were able to confirm co-expression of Cd34 and Pdgfrα in AdvCs and assess the co-expression of other genes of interest from our RT-PCR and immunofluorescence experiments. This approach will also allow other lymphatic investigators to assess their genes of interest, as our dataset is uploaded to the NIH Gene Expression Omnibus and will be uploaded to the Broad Institute Single Cell Portal upon publication.

      Here we show that the vast majority of non-muscle fibroblast-like cells referred to as AdvCs were double positive for both CD34 and PDGFRα. We also show that the AdvCs that express the commonly used pericyte markers Pdgfrb and Cspg4 also co-expressed Pdgfrα. Critically, these data also show that the AdvCs that express genes linked with lymphatic contractile dysfunction (Ano1, Gjc1 or connexin 45, and Cacna1c “Cav1.2”) co-express Pdgfrα, which would render these genes susceptible to Cre-mediated recombination using our Pdgfrα-CreER<sup>TM</sup> model.

      Reviewer #3 (Public Review): 

      Summary: 

      Zawieja et al. aimed to identify the pacemaker cells in the lymphatic collecting vessels. Authors have used various Cre-based expression systems and optogenetic tools to identify these cells. Their findings suggest these cells are lymphatic muscle cells that drive the pacemaker activity in the lymphatic collecting vessels. 

      Strengths: 

      The authors have used multiple approaches to test their hypothesis. Some findings are presented as qualitative images, while some quantitative measurements are provided.   

      Weaknesses: 

      -  More quantitative measurements. 

      -  Possible mechanisms associated with the pacemaker activity. 

      -  Membrane potential measurements. 

      We thank the reviewers for their concerns and have addressed them in the following manner. 

      - We added novel single cell RNA sequencing of isolated and cleaned inguinal axillary vessels from 10 mice (5 males and 5 females). This allowed us to quantify the number of AdvCs that coexpress CD34 and Pdgfrα as well as the number of cells co-expressing Pdgfrα and other markers.

      - We have added a negative control with quantification for the co-localization analysis assessing Myh11 and Pdgfrα. We have added a negative control with quantification for the ChR2-photo stimulated contraction experiments using Myh11CreERT2-ChR2 mice that were not injected with tamoxifen. 

      - We also used Biocytin-AF488 in our intracellular Vm electrodes to map the specific cells in which we recorded action potentials, as well as their neighboring cells, since Biocytin-AF488 is under 1 kDa and can pass through gap junctions. This approach independently labeled lymphatic muscle cells and their direct neighbors for 3 IALVs from 3 separate mice.

      - We performed membrane potential recordings in isolated, pressurized (under isobaric conditions), and spontaneously contracting inguinal axillary lymphatic collecting vessels at different pressures. 

      - We also show that the pressure-frequency relationship is dependent on the slope of the diastolic depolarization as no other parameter was significantly altered in our study and the diastolic depolarization slope was highly correlated with contraction frequency. 

      We believe the addition of these novel data, controls, experiments, and quantifications have improved the manuscript in line with the reviewers’ suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Lines 149-162: The authors rule out the methylene blue staining cells in the cLV wall as pacemakers because they don't form continuous longitudinal connections to drive propagation. Is it possible for a pacemaker cell to only initiate the contraction and then have the LMCs make the axial electrical connections to propagate the electrical wave? I am not trying to suggest the methylene blue cells are pacemakers, but I am not sure the lack of longitudinal (or radial) connectivity is sufficient evidence to rule out the possibility. This comment also is relevant to the 3 criteria for a pacemaker cell listed in the Discussion (Lines 413-417). 

      We agree with the reviewer’s broader point that a pacemaker cell may not require direct contact with other ‘pacemaker’ cells within the tissue as long as they are still within the same electrical syncytium. However, we do expect a continuous presence of a pacemaker cell type throughout the vessel wall length to account for the persistence of spontaneous contractile behavior despite vessel length, and the ability for contraction initiation to shift (Akl et al 2011, Castorena et al 2018 and Castorena et al 2022) and the occurrence of spontaneous action potentials. In Dirk van Helden’s seminal work in 1993 on lymphatic pacemaking, a major finding was that “SM of small lymphangions or that of short segments, cut from lymphangions of any length, behaved similarly”. We have adjusted our phrase regarding the requirement of a contiguous network and instead suggest a continuous presence along the vessel network and integrated into the electrical syncytium. 

      Methylene blue is an alkaline stain that will stain acidic structures, and historically methylene blue is noted to stain interstitial cells of Cajal (ICCs) in the gastrointestinal tract, which typically exist as a network of cells (Huizinga et al 1993 and Berezin 1988). No such network was readily apparent in our methylene blue staining, nor did the stained cells have a similar morphology to the ICCs of the gastrointestinal tract. Further, methylene blue staining is not limited to ICCs or pacemaker cells at large, as it has been used to kill cancer cells. Within the small intestine methylene blue was noted to also stain macrophage-like cells (Mikkelsen et al 1988), and we too draw parallels between the macrophage morphology observed with Macgreen mice and methylene-blue stained cells. The specific structure underlying the ICC affinity for methylene blue is not well described, and while the innate cytotoxicity of methylene blue and light has been used to kill ICCs and impair slow wave generation, the lack of specificity of this method leaves much to be desired. What is clear is that the ICC network highlighted by methylene blue in the gut is absent in lymphatic collecting vessels.

      In Figure 15/Video12, is it possible that the cells that are showing intracellular Ca2+ in diastole are the cells that reach a threshold membrane potential that then trigger the rest of the LMCs? As the authors have shown heterogeneity in the LMCs surface markers, is it possible that the cells with Ca2+ activity during diastole are identifiable by a distinct molecular phenotype? Or is the thought that these cells are randomly active in diastole? Some discussion/speculation about this seems appropriate. 

      We are in agreement with the reviewer’s conclusion that there is heterogeneity in the LMCs as it pertains to the calcium oscillations in diastole, either under normal buffer conditions or when L-type channels are inhibited with nifedipine. We also note significant heterogeneity in gene expression within the four LMC subclusters (0-3), though we did not see significant differences in either Ip3R1 or Ano1 expression. However, subcluster “0” had increased expression of Itprid2, also known as KRas-induced actin-interacting protein (KRAP), which is thought to tether, and thus immobilize, IP3 receptors to the actin cortex beneath the cell membrane. KRAP has recently been proposed to be a critical player in IP3 receptor “licensing”, which allows IP3 receptors to release calcium (Vorontsova et al., 2022). Whether IP3R licensing is similarly required in all cells or specifically in LMCs is unknown, although it is quite clear that there are specific release sites within the cell; this topic is currently under further investigation for a separate manuscript. We would like to note that there is not yet a clear consensus on whether IP3R licensing is required, as many of these studies were performed in cultured cells and this mechanism has only recently been described.

      Healthy lymphatic collecting vessels typically have a single pacemaker driving a coordinated propagated contraction in ex vivo isobaric myograph studies (Castorena-Gonzalez et al., 2018), which is typically at either end of the cannulated vessel. We believe that this is due to the lack of a bordering cell in one direction, which allows charge to accumulate and voltage to reach threshold at these sites preferentially. We have tried to image calcium at the pacemaking pole of the vessel to observe the specific Ca<sup>2+</sup> transients at these sites, though invariably the act of imaging GCaMP6f results in the pacemaker activity initiating from the other pole of the vessel. It is our opinion that the heterogeneity of LMCs in their Ca<sup>2+</sup> transients is a feature of the system, as it permits a wider range of depolarization signals and thus allows finer control of the pacing as different physical/pressure or signaling stimuli are encountered. Likely, the cells with the higher propensity for Ca<sup>2+</sup> transients act as the contraction initiation site in vivo, though it must also be noted that the LMC density decreases around lymphatic valve sites. In fact, in guinea pig collecting vessels there are very few LMCs at the valves, which can render them electrically uncoupled or poorly coupled (Van Helden, 1993). Thus, valve sites in which there is greater electrical resistance due to lower LMC-LMC coupling may allow for charge accumulation in the LMCs at the valve site, similar to the artificial condition achieved in our myograph preparations with two cut ends, and allow them to reach threshold first and drive coordination at the valve sites.

      An additional description of what the PTCL analysis is meant to represent physiologically would be helpful for readers. 

      We have better described the conversion of the calcium signals into “particles” for analysis at first mention in the methods and results section and have included the requisite reference to this specific methodology in Line 429-30. 

      A description of how DMAX is experimentally determined is needed. 

      We have adjusted our methods section to describe DMAX in lines 774-775.

      “with Ca<sup>2+</sup>-free Krebs buffer (3mM EGTA) and diameter at each pressure recorded under passive conditions (DMAX).”

      I think the vessels referred to as popliteal lymphatic vessels are actually saphenous lymphatic vessels (afferent to the popliteal lymph node). Please clarify. 

      Indeed, some of the vessels used in this study are the afferents to the single popliteal node. They travel with the caudal branch of the saphenous vein, but have routinely been described as popliteal vessels, as opposed to saphenous lymphatic vessels, by the lymphatic field at large (Tilney 1971 PMCID: PMC1270981, Liao 2015 PMID: 25512945). To move away from this nomenclature would likely add to confusion although we agree that the lymphatic field may need to improve or correct the vessel naming paradigm to match the vascular pairs they follow.

      Reviewer #2 (Recommendations For The Authors): 

      Lines 214-215 - can you cite a reference for the observation that rhythmic contractions don't require the presence of valves? 

      We have added the reference. In Dr. Van Helden’s seminal work on the topic in 1993, “Vessel segments were then cut from selected small lymphangions (length 300-500 um) by cutting at the valves.” Additionally, work by Dr Anatoliy Gashev utilized sections of lymphatic vessels that lacked valves to study orthograde and retrograde shear sensitivity (Gashev et al., 2002).

      Lines 224-230 - It would have been nice to see colocalization analysis for all cell types so that "negative" results could be compared with the "positives" that you report. This would help bolster evidence of your ability to separate cell types. 

      We understand the reviewer’s sentiment and agree. We have now added a “negative control” colocalization staining and analysis for PDGFRα and Myh11, which has been added to the current SuppFigure 1. We stained 3 IALVs from 3 separate mice with PDGFRα and Myh11 and performed confocal microscopy. We ran the FIJI BIOP-JACOP colocalization plugin as before and observed very little colocalization of the two signals. Additionally, we have also added a coexpression assessment for CD34 and PDGFRα and other genes using our scRNAseq dataset.
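
      To illustrate what a pixel-wise colocalization measure reports for such a negative control, the following is a minimal sketch of one commonly used metric, Pearson's correlation coefficient between two channels. It is not the implementation used by the plugin named above, and the function name and toy intensity values are hypothetical.

```python
import math
from typing import Sequence


def pearson_colocalization(channel_a: Sequence[float], channel_b: Sequence[float]) -> float:
    """Pearson correlation between two flattened, equally sized intensity channels."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    # Sums of products of deviations from the channel means.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Toy pixel intensities (hypothetical): the second channel is bright
    # where the first is dim, as expected for two separate cell populations.
    marker_1 = [10.0, 200.0, 15.0, 180.0, 12.0, 190.0]
    marker_2 = [150.0, 20.0, 170.0, 25.0, 160.0, 18.0]
    print(f"Pearson coefficient: {pearson_colocalization(marker_1, marker_2):.2f}")
```

      A coefficient near 1 indicates that the two signals rise and fall together across pixels, whereas values near zero or below are what one would expect when the markers label distinct cells.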

      line 293 - Should read "Cx45 in..." 

      This has been corrected. 

      “The expression of the genes critically involved in cLV function—Cav1.2, Ano1, and Cx45—in the PdgfrαCreER<sup>TM</sup>-ROSA26mTmG purified cells and scRNAseq data prompted us to generate PdgfrαCreER<sup>TM</sup>-Ano1<sup>fl/fl</sup>, PdgfrαCreER<sup>TM</sup>-Cx45<sup>fl/fl</sup>, and PdgfrαCreER<sup>TM</sup>-Cav1.2<sup>fl/fl</sup> mice for contractile tests.”

      lines 470-473 - A reference for this statement should be cited. 

      We have added the reference. In Dr. Van Helden’s seminal work on the topic in 1993, “Vessel segments were then cut from selected small lymphangions (length 300-500 um) by cutting at the valves.” Additionally, work by Dr Anatoliy Gashev utilized sections of lymphatic vessels that lacked valves to study orthograde and retrograde shear sensitivity (Gashev et al., 2002).

      Lines 483-487 - References should be cited for these statements. 

      We have narrowed and clarified this statement and supported it with the necessary citations. 

      “Of course, mesenchymal stromal cells (Andrzejewska et al., 2019) and fibroblasts (Muhl et al., 2020; Buechler et al., 2021; Forte et al., 2022) are present, and it remains controversial to what extent telocytes are distinct from or are components/subtypes of either cell type (Clayton et al., 2022). Telocytes are not monolithic in their expression patterns, displaying both organ directed transcriptional patterns as well as intra-organ heterogeneity (Lendahl et al., 2022) as readily demonstrated by recent single cell RNA sequencing studies that provided immense detail about the subtypes and activation spectrum within these cells and their plasticity (Luo et al., 2022).”

      Lines 584-585 - Missing a reference citation. 

      Thank you for catching this error. The correct citation is Boedtkjer et al., 2013, and it is now properly cited.

      Line 638 - "these this" should read "this" 

      Thank you for catching this error. This particular sentence was removed in light of the addition of the scRNAseq data.

      Reviewer #3 (Recommendations For The Authors): 

      This manuscript from Zawieja et al. explored an interesting hypothesis about the pacemaker cells in lymphatic collecting vessels. Many aspects of lymphatic collecting vessels are still under investigation; hence this work provides timely knowledge about the lymphatic muscle cells as a pacemaker. Although it is an important topic of the investigation, the data provided do not support the overall goal of the manuscript. Many figures (Figure 1-5) provide quantitative estimation and the description provided in the results section might only be useful for a restricted audience, but not to the broader audience. Some of the figures are very condensed with multiple imaging panels and it is hard to follow the differences in qualitative analysis. Overall, this manuscript can be improved by more streamlined description/writing and figure arrangements (some of the figures/panels can be moved to the supplementary figures). 

      We disagree with the notion that the original data did not support the goal of the manuscript, which was to identify and test putative pacemaker cell types. Nonetheless, we have added ample novel data to the manuscript, including membrane potential recordings and scRNAseq, to further support our conclusion that the pacemaker cell is an LMC. We believe the scRNAseq data will also greatly enhance the appeal of the manuscript to a broader audience, and we have renamed the manuscript in line with the wealth of data we have collected on the components of the vessel wall as we tested for putative pacemaker cells.

      As requested, we have moved many figures to the supplement to allow readers to focus on the more critical experiments.

      A few other points that need to be addressed: 

      (1) Authors used immunofluorescence-based differences in various cell types in the collecting vessels. Initially, they chose ICLC, pericytes, and lymphatic muscle cells. But then they started following adventitial cells and endothelial cells. It is not clear from the description, why these other cells could be possibly involved in the pacemaker activity. It will be easier to follow if authors provide a graphical abstract or summary figure about their hypothesis and what is known from their and others' work. 

      We would like to clarify that we used the endothelial cells as controls to ensure that what we observed via immunofluorescence and FACS-RT-PCR was a cell type separate from both lymphatic muscle cells and lymphatic endothelial cells on the vessel wall. Staining for the endothelium also allowed us to assess where these PDGFRα+CD34+ cells reside in the vessel wall. We started with a wide range of markers that are conventionally used for targeting specific cell types, but, as expected, those markers are not always 100% specific. Specifically, we focused on CD34, Kit, and Vimentin, as those were the markers previously reported for the non-muscle cells in the lymphatic collecting vessel wall. What we found was that CD34 and PDGFRα labeled the same cell type. As there was no CD34Cre mouse available at the time, we instead utilized the inducible PDGFRαCreER<sup>TM</sup>. We are unsure how well an abstract figure will condense the conclusions from the experiments listed here, but if absolutely required for publication we can attempt to highlight the representative cell populations identified on the vessel wall.

      (2) Authors used many acronyms in the manuscript without defining them (when they appeared for the first time). Please follow the convention. 

      We have checked the manuscript and made several corrections regarding the use of abbreviations.

      (3) How specific PDGFR-alpha as a marker of the pericytes? It can also label the mesenchymal cells. Why did the author choose PDGFR-alpha over beta for their Cre-based expression approach? 

      We tried to assess whether a pericyte-like cell was present in or along the wall using PDGFRbeta (Pdgfrβ). Pdgfrβ is commonly used to identify pericytes (Winkler et al., 2010), while in contrast Pdgfrα is a known fibroblast marker (Lendahl et al., 2022). PdgfrβCreER<sup>T2</sup> resulted in recombination in both LMCs and AdvCs, preventing it from being a discriminating marker for our study, whereas Myh11CreER<sup>T2</sup> and PDGFRαCreER<sup>TM</sup> were specific, at least at the level of cell type, based on our FACS-RT-PCR and staining. As the scRNAseq data in Figure 5 show, there was no cell cluster for which Pdgfrβ was specific, in contrast to PDGFRα and Myh11. In Figure 6 we show the expression of another commonly used pericyte marker, NG2 (Cspg4), in our scRNAseq dataset, which was likewise observed in both LMCs and AdvCs. Lastly, MCAM (Figure 6) can also be a marker for pericytes, though we see expression of this marker only in the LMCs and LECs. Notably, almost all of the AdvCs express PDGFRα, rendering the PDGFRαCreER<sup>TM</sup> a powerful tool to study this population of cells on the vessel wall, including those that were PDGFRα+Cspg4+ or PDGFRα+Pdgfrβ+.

      We were reliant on PDGFRαCreER<sup>TM</sup> as that was the only available PDGFRα Cre model at the time. Note that we also used PdgfrβCreER<sup>T2</sup> and Ng2Cre in our study but found that both Cre models recombined both LMCs and AdvCs.

      (4) Please include appropriate references for all the labeling markers (PDGFR-alpha, beta, and myc11 etc.) that are used in this manuscript. 

      We have added multiple references to the manuscript to support the use of these common cell “specific” markers, although each marker is limited in its capacity to fully or specifically label a single population of cells (Muhl et al., 2020).

      (5) One of the criteria for the pacemaker cells is depolarization-induced propagated contractions. Authors have used optogenetics-induced depolarization to test this phenomenon. Please include negative controls for these experiments. 

      We have now added negative controls to this experiment, which were non-induced (no tamoxifen) Myh11CreER<sup>T2</sup>-Chr2 popliteal vessels. These data have been added to Figure 8.

      (6) What are the resting membrane potentials of Lymphatic muscle cells? The authors should provide some details about this in the manuscript. 

      We agree with the reviewer and have added membrane potential recordings (Figure 13) at different pressures. We filled our recording electrode with the cell labeling molecule BiocytinAF488 to highlight the cells exhibiting action potentials, which were the LMCs. Lymphatic resting membrane potential is dynamic in pressurized vessels, which appears to be a critical difference between this approach and pinned-out vessels or those on wire myographs, likely due to improper stretch or damage to the vessel wall in the latter. In mesenteric lymphatic vessels isolated from rats, the minimum membrane potential achieved during repolarization typically ranges from -45 to -50 mV, while IALVs from mice are typically around -40 mV, though IALVs have a notably higher contraction frequency. Critically, these novel membrane potential recordings in IALVs at different pressures show that the diastolic depolarization rate is the critical factor driving the pressure-dependent frequency.

      (7) In the discussion, the authors discussed SR Ca2+ cycling in Pacemaking, but the relevant data are not included in this manuscript, but a manuscript from JGP (in revision) is cross-referenced. 

      As discussed above, we have recently published our work in which we studied IALVs from Myh11CreERT2-Ip3r1fl/fl (Ip3r1ismKO) and Myh11CreERT2-Ip3r1fl/fl-Ip3r2fl/fl-Ip3r3fl/fl mice (Zawieja et al., 2023). Deletion of Ip3r1 from LMCs recapitulated the dramatic reduction in frequency we previously published in Myh11CreERT2-Ano1fl/fl mice and the loss of pressure-dependent chronotropy. Furthermore, in that study we also showed that the diastolic calcium transients are nearly completely lost in IALVs from Myh11CreERT2-Ip3r1fl/fl knockout mice. There was no difference in contractile function between IALVs from single Ip3r1 knockout and triple Ip3r1-3 knockout mice, suggesting that it is Ip3r1 that is required for the diastolic calcium oscillations. Further, in the presence of 1 μM nifedipine there were still no calcium oscillations in the Myh11CreERT2-Ip3r1fl/fl LMCs. These findings provide further support for our interpretation that the pacemaking is of myogenic origin.

      Andrzejewska, A., B. Lukomska, and M. Janowski. 2019. Concise Review: Mesenchymal Stem Cells: From Roots to Boost. Stem Cells. 37:855-864.

      Buechler, M.B., R.N. Pradhan, A.T. Krishnamurty, C. Cox, A.K. Calviello, A.W. Wang, Y.A. Yang, L. Tam, R. Caothien, M. Roose-Girma, Z. Modrusan, J.R. Arron, R. Bourgon, S. Muller, and S.J. Turley. 2021. Cross-tissue organization of the fibroblast lineage. Nature. 593:575-579.

      Castorena-Gonzalez, J.A., S.D. Zawieja, M. Li, R.S. Srinivasan, A.M. Simon, C. de Wit, R. de la Torre, L.A. Martinez-Lemus, G.W. Hennig, and M.J. Davis. 2018. Mechanisms of Connexin-Related Lymphedema. Circ Res. 123:964-985.

      Clayton, D.R., W.G. Ruiz, M.G. Dalghi, N. Montalbetti, M.D. Carattino, and G. Apodaca. 2022. Studies of ultrastructure, gene expression, and marker analysis reveal that mouse bladder PDGFRA(+) interstitial cells are fibroblasts. Am J Physiol Renal Physiol. 323:F299-F321.

      Forte, E., M. Ramialison, H.T. Nim, M. Mara, J.Y. Li, R. Cohn, S.L. Daigle, S. Boyd, E.G. Stanley, A.G. Elefanty, J.T. Hinson, M.W. Costa, N.A. Rosenthal, and M.B. Furtado. 2022. Adult mouse fibroblasts retain organ-specific transcriptomic identity. Elife. 11.

      Gashev, A.A., M.J. Davis, and D.C. Zawieja. 2002. Inhibition of the active lymph pump by flow in rat mesenteric lymphatics and thoracic duct. J Physiol. 540:1023-1037.

      Lendahl, U., L. Muhl, and C. Betsholtz. 2022. Identification, discrimination and heterogeneity of fibroblasts. Nat Commun. 13:3409.

      Luo, H., X. Xia, L.B. Huang, H. An, M. Cao, G.D. Kim, H.N. Chen, W.H. Zhang, Y. Shu, X. Kong, Z. Ren, P.H. Li, Y. Liu, H. Tang, R. Sun, C. Li, B. Bai, W. Jia, Y. Liu, W. Zhang, L. Yang, Y. Peng, L. Dai, H. Hu, Y. Jiang, Y. Hu, J. Zhu, H. Jiang, Z. Li, C. Caulin, J. Park, and H. Xu. 2022. Pan-cancer single-cell analysis reveals the heterogeneity and plasticity of cancer-associated fibroblasts in the tumor microenvironment. Nat Commun. 13:6619.

      Muhl, L., G. Genove, S. Leptidis, J. Liu, L. He, G. Mocci, Y. Sun, S. Gustafsson, B. Buyandelger, I.V. Chivukula, A. Segerstolpe, E. Raschperger, E.M. Hansson, J.L.M. Bjorkegren, X.R. Peng, M. Vanlandewijck, U. Lendahl, and C. Betsholtz. 2020. Single-cell analysis uncovers fibroblast heterogeneity and criteria for fibroblast and mural cell identification and discrimination. Nat Commun. 11:3953.

      Van Helden, D.F. 1993. Pacemaker potentials in lymphatic smooth muscle of the guinea-pig mesentery. J Physiol. 471:465-479.

      Vorontsova, I., J.T. Lock, and I. Parker. 2022. KRAP is required for diffuse and punctate IP(3)-mediated Ca(2+) liberation and determines the number of functional IP(3)R channels within clusters. Cell Calcium. 107:102638.

      Winkler, E.A., R.D. Bell, and B.V. Zlokovic. 2010. Pericyte-specific expression of PDGF beta receptor in mouse models with normal and deficient PDGF beta receptor signaling. Mol Neurodegener. 5:32.

      Zawieja, S.D., G.A. Pea, S.E. Broyhill, A. Patro, K.H. Bromert, M. Li, C.E. Norton, J.A. Castorena-Gonzalez, E.J. Hancock, C.D. Bertram, and M.J. Davis. 2023. IP3R1 underlies diastolic ANO1 activation and pressure-dependent chronotropy in lymphatic collecting vessels. J Gen Physiol. 155.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This study presents valuable observations of white matter organisation from diffusion MRI and two types of synchrotron imaging in both monkeys and mice. Cross-modality comparisons are interesting as the different methods are able to probe anatomical structures at different length scales, from single axons in high-resolution synchrotron (ESRF) imaging, to clusters of axons in lower-resolution synchrotron (DESY) data, to axon populations at the mm-scale in diffusion MRI. By acquiring all modalities in monkey and mouse ex vivo samples, the authors can observe principles of fibre organisation, and characterise how fibre characteristics, such as tortuosity and micro-dispersion, vary across select brain regions and in healthy tissue versus a demyelination model. The results are solid, though some statements (in the abstract/discussion) do not appear to be fully supported, and statistical tests would help confirm whether tissue characteristics are similar/different between different conditions.

      R1.1: Thank you for the kind feedback. We have included statistical tests in the paper for tissue characteristics where appropriate.

      Due to the very high number of sample points (one per voxel) within the 3D synchrotron volumes, testing for statistical significance is challenging for the structure tensor-based tissue fractional anisotropy (FA) metric. This causes any standard statistical test to have sufficient power to evaluate even minute differences between the volumes as statistically significant with high confidence. In other words, the null hypothesis (H0) will always be rejected with p = 0, regardless of the practical significance of the difference. Therefore, we have not added statistical analysis for FA results.
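      The effect described above can be illustrated with a minimal sketch. It assumes a simple two-sample z-test on two equally sized groups with a known standard deviation of 1, which is neither the test nor the data used in the study; the function name and the mean difference of 0.01 are hypothetical. For a fixed, practically negligible difference, the p-value shrinks towards zero as the number of samples per group grows.

```python
import math

def two_sample_z_pvalue(mean_diff, sigma, n_per_group):
    """Two-sided p-value of a two-sample z-test with equal group sizes."""
    standard_error = sigma * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical, practically negligible difference: 1% of one standard deviation.
for n in (10**3, 10**5, 10**7, 10**9):
    p = two_sample_z_pvalue(mean_diff=0.01, sigma=1.0, n_per_group=n)
    print(f"n = {n:>10d}  p = {p:.3g}")
```

      With a thousand samples per group the difference is far from significant, whereas at voxel-scale sample counts the p-value becomes numerically indistinguishable from zero even though the difference itself has not changed.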

      For the tractography-based metrics, the number of sample points (one per streamline) is not as high as that for the structure tensor FA, which makes it more reasonable to test for statistical significance. The statistical analyses performed included tests for equality of distributions (two-sample Kolmogorov-Smirnov tests), equality of medians (two-sided Wilcoxon rank sum tests), and equality of variance (Brown-Forsythe tests). The results are described in relation to Figure 5(B, D) and Figure 8(C-F), and are detailed in the Methods section.

      One very interesting result is the observation of apparent laminar organisation of fibres in ex vivo monkey white matter samples. DESY data from the corpus callosum shows fibres with two dominant orientations (one L-R, one slightly inclined), clustered in laminar structures within this major fibre bundle. Thanks to the authors providing open data, I was able to look through the raw DESY volume and observe regions with different "textures" (different orientations) in the described laminar arrangement. That this organisation can be observed by eye, as well as by structure tensor, is fairly convincing. As not all readers will download the data themselves, the manuscript could benefit from additional figures/videos to demonstrate (1) the quality of the DESY data and (2) a more 3D visualisation of the laminar structures (where the coronal plane shows convincing columnar structure or stripes). Similarly in Figure 5A, though this nicely depicts two populations with different orientations, it is somewhat difficult to see the laminar structure in the current image.

      ESRF data of the centrum semiovale (CS) contributes evidence for similar laminar structures in a crossing fibre region, where primarily AP fibres are shown to cluster in 3 laminar structures. As above, further visualisations of the ESRF volume in the CS (as shown in Figure 4E) would be of value (e.g. showing consistency across the 4 volumes, 2D images showing stripey/columnar patterns along different axes, etc).

      R1.2: Conveying complex 3D geometry through 2D still images is indeed challenging, and we greatly appreciate the reviewer’s comments and suggestions. To better communicate the understanding of the 3D anatomical environments, we have taken the following actions:

      (1) To enhance insights into the tractography results in Figures 5A and 5D, we have rendered and added animations of the tractography scenes as supplemental material.

      (2) To visually support 3D insights concerning the consistency of the laminar organisation of the callosal fibres, we have replaced the 2D slice views in Figures 3A and 3B with 3D renderings similar to the one in Figure 4E.

      (3) An animation of Figure 4E was created to display the colour-coded structure tensor directions of all four stacked scans. This animation visually supports the complexity of the fibre orientation and the layered structural laminar organisation of the CS sample.

      A key limitation of this result is that, though the DESY data from the CC seems convincing, the same structures were not observed in high-resolution synchrotron (ESRF) data of the same tissue sample in the corpus callosum. This seems surprising and the manuscript does not provide a convincing explanation for this inconsistency. The authors argue that this is due to the limited FOV of the ESRF data (~200x200x800 microns). However, the observed laminar structures in DESY are ~40 microns thick, and ESRF data from the CST suggests laminar thicknesses in the range of 5-40 microns with a similar FOV. This suggests the ESRF FOV would be sufficient to capture at least a partial description of the laminar organisation. Further, the DESY data from the CC shows columnar variations along the LR axis, which we might expect to be observed along the long axis of the ESRF volume of the same sample. Additional analyses or explanations to reconcile these apparently conflicting observations would be of value. For example, the authors could consider down-sampling the ESRF data in an appropriate manner to make it more similar to the DESY data, and running the same analysis, to see if the observed differences are related to resolution (i.e. the thinner laminar structures cluster in ways that they look like a thicker laminar structure at lower resolution), or crop the DESY data to the size of the ESRF volume, to test whether the observed differences can be explained by differences in FOV. Laminar structures were not observed in mouse data, though it is unclear if this is due to anatomical differences or somewhat related to differences in data quality across species.

      R1.3: We have clarified and expanded upon the results regarding the laminar organisation observed in the monkey CC DESY data. As noted in R1.2, we replaced the 2D images in Figures 3A (DESY) and 3B (ESRF) with 3D renderings to better display the spatial outline of the laminar organisation in the volumes. The reviewer is correct that, although the smaller field of view (FOV) of the ESRF data should allow us to at least partially capture parts of the laminar organisation observed in the larger FOV of the DESY data, this is not guaranteed. It depends on how the smaller FOV is positioned relative to the structural organisation, and since we lack co-registration, we do not know this. It should now be visually evident that the ESRF FOV can be placed such that it does not cover the observed laminae, a point which is now also emphasised in the Discussion. 

      Secondly, it is important to emphasise that the voxel colouring using the primary structure tensor direction is just a visualisation technique, which has limitations when it comes to assessing laminar organisation. Mapping 3D directions to RGB colours is inherently difficult and will always have ambiguities. If we had used the standard R-G-B to LR-AP-IS colouring in Figure 3, the laminar organisation would not be evident. Additionally, the laminae will only be visible when there are clear angular differences. There can still be a layered organisation even if we don’t observe it, which is the case for the mouse. The primary direction differences of these layers could be very low (i.e., parallel layers), and consequently not visually evident. This point has been clarified in both the Results and Discussion sections.
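      One such ambiguity can be sketched in a few lines. The example assumes the common convention of mapping the absolute value of each normalised direction component to a colour channel (LR to red, AP to green, IS to blue); the function name and the two fibre directions are hypothetical and are not taken from the study's data or code. Two directions that cross at 90 degrees within the LR-AP plane receive exactly the same colour, so a clear angular difference leaves no visual contrast.

```python
import math

def direction_to_rgb(direction):
    """Map a 3D direction to (R, G, B) from the absolute value of each component.

    Assumed convention: x = left-right (red), y = anterior-posterior (green),
    z = inferior-superior (blue).
    """
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    return tuple(round(abs(c) / norm, 6) for c in (x, y, z))

def angle_between_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))

# Two hypothetical fibre directions in the LR-AP plane.
fibre_a = (1.0, 1.0, 0.0)
fibre_b = (1.0, -1.0, 0.0)

print("angle between fibres:", angle_between_deg(fibre_a, fibre_b))
print("colour of fibre_a:", direction_to_rgb(fibre_a))
print("colour of fibre_b:", direction_to_rgb(fibre_b))
```

      The absolute value is what makes the colour independent of the sign of the direction, which is desirable for fibres, but it also folds mirror-symmetric directions onto the same colour.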

      Finally, in response to R1.6, we have added analyses regarding the shape of the FOD, specifically estimating the Orientation Dispersion Index (ODI) and Dispersion Anisotropy (DA). This provides further context to the reviewer’s comments about the discrepancies in laminar organisation. We have reflected on the relationship between DA and the visually observed laminar organisation, and this has been integrated into the relevant parts of the Results and Discussion sections.

      The changes to the manuscript reflecting the statements above are listed here:

      The Discussion section (page 21): “In the monkey CC DESY data, which has a field of view (FOV) comparable to a dMRI voxel, a columnar laminar organisation at a macroscopic level was visually revealed from the structure tensor (ST) direction colouring. However, this laminar organisation was not visible in the higher-resolution ESRF data for the same tissue sample. Although the two samples were not co-registered, the size of a single ESRF FOV within the DESY sample is illustrated in Fig. 3A. This demonstrates the possibility of placing the ESRF sample where the observed laminar structure is absent. Consequently, knowledge of the tissue structural organisation and its orientation is important to fully benefit from the stacked FOV of the ESRF sample and when choosing appropriate minimal FOV sizes in future experiments.

      Interestingly, when characterising FODs with measures like ODI and DA as indicators of fibre organisation, rather than relying on visualisation, results from large- and small-FOV data show no discrepancies. This statistical approach discards the spatial context (visually perceived as laminae), highlighting the need to combine both methods.” 

      The Results section (page 8): “The mid-level DA values suggest some anisotropic spread of the directions, reflecting the angled laminar organisation observed in the DESY sample. Interestingly, the DA value for the ESRF sample is almost identical, despite the laminar bands being less visually apparent.”

      The Results section (page 17): “Nevertheless, visualisation of orientations did not reveal any axonal organisation in the mouse CC due to the lack of local angular contrast, unlike the clear laminar structures seen in the monkey sample (Fig. 3A). Any parallel organisation in tissue remains undetectable because our visual contrast relies on angular differences.”

      The Discussion section (page 22): “In the monkey CC (mid-body), we observed laminar organisation indicated by clear spatial angular differences in the ST directions in the sample (Fig. 3A). Quantifications of the FOD shape showed DA indices of 0.55 and 0.59 for the DESY and ESRF samples, respectively. In contrast, the mouse CC (splenium) did not visually reveal a similar angled laminar organisation (Fig. 7C), and the DA indices were lower, at 0.49 and 0.32, respectively. Two possible explanations exist. First, the within-pathway laminar organisation may not be identical across the entire CC. Consequently, more scans from other CC regions would be required to confirm. Second, the different species might account for the differences. Larger brains like the monkey might foster a different level of within-pathway axon organisation compared to the smaller mouse. Although we could not visually detect laminar organisation from the colour coding of the ST direction in the mouse, the non-zero DA values suggest some level of organisation. This is supported by our streamline tractography, which indicates a vertical layered organisation (Fig. 8A, B). It further aligns with studies using histological tracer mapping that shows a stacked parallel organisation of callosal projections in mice, between cortex regions M1 and S1 (Zhou et al. 2013). Nevertheless, we cannot rely solely on voxel-wise ST directions to fully describe axonal organisation, as this method does not contrast almost parallel fasciculi (inclination angles approaching 0 degrees). Analysing patterns in tractography streamlines would be an interesting future direction for this purpose.”

      The authors further quantify various other characteristics of the white matter, such as micro-dispersion, tortuosity, and maximum displacement. Notably, the microscopic FA calculated via structure tensor is fairly consistent across regions, though not modalities. When fibre orientations are combined across the sample, they are shown to produce similar FODs to dMRI acquired in the same tissue, which is reassuring. As noted in the text, the estimates of tortuosity and max displacement are dependent on the FOV over which they are calculated. Calculating these metrics over the same FOV, or making them otherwise invariant to FOV, could facilitate more meaningful comparisons across samples and/or modalities.

      R1.4: This raises an interesting point about the necessity of normalising the FOV to obtain invariant, tractography-based metrics of tortuosity and maximum deviation across different samples and modalities. In general, achieving this is challenging, and in this study, it is practically not possible. Between species, we encounter significant differences in brain volume ratios, which complicates the establishment of a common reference FOV due to the distinct anatomical organisation of monkey and mouse brains (see our response to R1.8). Within species, we would encounter challenges due to missing contrast—such as issues with staining—and the lack of perfect co-registration.

      The Discussion section (page 28) has been extended to reflect this: ”Within the same species, assuming perfect co-registration of samples, it would be possible to perform correlative imaging and analysis. This would allow validation of whether tractography streamlines could be reproduced at different image resolutions within the same normalised FOV. Although this was not possible with the current data and experimental setup, it would be an interesting point to pursue in future work.”
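      The FOV dependence of these metrics can be illustrated with a minimal sketch. It assumes that tortuosity is the path length divided by the straight-line distance between the streamline endpoints, and that maximum deviation is the largest perpendicular distance from that straight line; these definitions, the function names, and the synthetic circular-arc streamline (radius 1000, arbitrary units) are assumptions for illustration and may differ from the implementation used in the study.

```python
import math

def path_length(points):
    """Sum of the segment lengths along a polyline."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def tortuosity(points):
    """Path length divided by the straight-line distance between the endpoints."""
    return path_length(points) / math.dist(points[0], points[-1])

def max_deviation(points):
    """Largest perpendicular distance from the line joining the endpoints."""
    start, end = points[0], points[-1]
    chord_length = math.dist(start, end)
    unit = [(e - s) / chord_length for s, e in zip(start, end)]
    worst = 0.0
    for p in points:
        rel = [a - s for s, a in zip(start, p)]
        along = sum(r * u for r, u in zip(rel, unit))
        perp_sq = sum(r * r for r in rel) - along * along
        worst = max(worst, math.sqrt(max(perp_sq, 0.0)))
    return worst

def arc_streamline(radius, arc_length, n_points=200):
    """Synthetic streamline: a circular arc of given length, sampled uniformly."""
    steps = (arc_length * i / (n_points - 1) for i in range(n_points))
    return [(radius * math.sin(s / radius),
             radius * (1.0 - math.cos(s / radius)),
             0.0) for s in steps]

# The same gently curving fibre, observed over increasingly long stretches.
for observed_length in (50.0, 200.0, 800.0):
    pts = arc_streamline(radius=1000.0, arc_length=observed_length)
    print(observed_length, tortuosity(pts), max_deviation(pts))
```

      Although the underlying curvature is identical in all three cases, both metrics grow with the observed length, which is why values computed over different FOVs are not directly comparable.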

      Though the results seem solid, some statements, particularly in the abstract and discussion, do not seem to be fully supported by the data. For example, the abstract states "Our findings revealed common principles of fibre organisation in the two species; small axonal fasciculi and major bundles formed laminar structures with varying angles, according to the characteristics of major pathways.", though the results show "no strong indication within the mouse CC of the axonal laminar organisation observed in the monkey". Similarly, the introduction states: "By these means, we demonstrated a new organisational principle of white matter that persists across anatomical length scales and species, which governs the arrangement of axons and axonal fasciculi into sheet-like laminar structures." Further comments on the text are provided below.

      R1.5: We recognise that the text could be read as implying that the laminar organisation is identical in monkeys and mice, which is not the case. For example, we show that in the corpus callosum, pathways are parallel in the mouse but not in the monkey. We have clarified that while the principle of layered laminar organisation of pathways is shared between monkeys and mice, species-specific differences do exist.

      We have made the following clarifying changes to the manuscript:

      The Abstract (page 2): “Our findings revealed common principles of fibre organisation that apply despite the varying patterns observed across species” 

      The Introduction (page 4-5): “Through these methods, we demonstrated organisational principles of white matter that persists across anatomical length scales and species. These principles govern the organisation of axonal fasciculi into sheet-like laminar shapes (structures with a predominant planar arrangement). Interestingly, while these principles remain consistent, they result in varied structural organisations in different species.” 

      The Discussion (page 21): “despite species differences”.

      One observation not notably discussed in the paper is that the spherical histograms of Figure 3E/H appear to have an anisotropic spread of the white points about 0,0. It would be interesting if the authors could comment on whether this could be interpreted as the FOD having asymmetric dispersion and if so, whether the axis of dispersion relates to the fibre orientations of the laminar structures.

      R1.6: That is a good point, and to address it, we have fitted spherical Bingham distributions to the FODs, allowing us to quantify their shapes. From each Bingham distribution, we derived two well-known indices from the diffusion MRI community: the Orientation Dispersion Index (ODI) and Dispersion Anisotropy (DA) index. The ODI explains the dispersion of fibres for a single bundle FOD, whereas DA expresses the shape of the FOD on the unit sphere surface, i.e., the degree of anisotropy. We have integrated the Bingham-based analysis into the Methods, Discussion, and Results sections concerning Figures 3 and 7, but not Figure 4, which contains multiple fibre bundles that we cannot separate on a voxel level. The analysis does not impact the overall message and conclusion but adds interesting context to the discussion around laminar organisation.

      A limitation of the study is that it considers only small ex vivo tissue samples from two locations in a single postmortem monkey brain and slightly larger regions of mouse brain tissue. Consequently, further evidence from additional brain regions and subjects would be required to support more generalised statements about white matter organisation across the brain.

      R1.7: Collecting more samples from various locations in the brain would provide valuable insights into the consistency of white matter organisation across anatomical length scales, as well as the structure tensor-based anisotropy and tortuosity metrics. However, being awarded beamtime at two different synchrotron facilities to scan the same sample with different imaging setups is practically challenging. At the ESRF, we have gathered additional image volumes from other white matter regions of the monkey brain that support all our findings, which will be published separately. X-ray synchrotron imaging technology is advancing rapidly, with faster acquisition times enabling more image volumes to be stitched together. This extends the FOV and allows for a more robust statistical description of the anatomy. Consequently, future studies with an extended FOV and varying image resolutions could utilise a single synchrotron facility to collect additional samples, further supporting our findings.

      The Discussion section (page 27) has been extended to reflect this: “Increasing the number of samples across both species and examining laminar organisation at various length scales in more regions would strengthen our findings. However, securing beamtime at two different synchrotron facilities to scan the same sample with varying image resolutions is a limiting factor. Beamline development for multiresolution experimental setups, along with faster acquisition methods, is a rapidly advancing field. For instance, the Hierarchical Phase-Contrast Tomography (HiP-CT) imaging beamline at ID-18 at the ESRF, enables multi-resolution imaging within a single session to address this challenge, though it is currently limited to a resolution of 2.5 μm (Walsh et al. 2021).”

      Given the monkey results, the mouse study (section 2.5 onwards) lacks some motivation. In particular, it is unclear why a demyelination model was studied and if/how this would link to the laminar structure observed in the monkey data. Further, it is unclear how comparable tortuosity/max deviation values are across species, considering the differences in data quality and relative resolution, given that the presented results show these values are very modality-dependent.

      R1.8: We have clarified the motivation for including the mouse part of the study in both the Introduction and the Results sections.

      The Introduction section (page 5): “Furthermore, using a mouse model of focal demyelination induced by cuprizone (CPZ) treatment, we investigate the inflammation-related influence on axonal organisation. This is achieved through the same structure tensor-derived micro-anisotropy and tractography streamline metrics.”

      The Results section (page 15): “Finally, we investigated the organisation of fasciculi in both healthy mouse brains and a murine model of focal demyelination induced by five weeks of cuprizone (CPZ) treatment. This allowed for the exploration of the disease-related influence on axonal organisation, particularly under inflammation-like conditions with high glial cell density at the demyelination site (He et al. 2021). The experimental setup for DESY and ESRF is similar to that described for the monkey, with the exception that we did not perform dMRI and synchrotron imaging on the same brains, and only collected MRI data for healthy mouse brains. This approach allowed us to apply the same structure tensor and tractography streamline analysis used previously, but in a healthy versus disease comparison, demonstrating the methodology’s ability to provide insights into pathological conditions.”

      Across species, the comparison of tortuosity and maximum deviation must be approached with caution. On one hand, we observe a comparable influence of the extra-axonal environment in both the monkey and mice, as discussed in the section “Sources to the non-straight trajectories of axon fasciculi.” On the other hand, the anatomical scale and relative image resolution are significant factors, as correctly pointed out. In the mouse, for instance, the measures are influenced by white matter pathway macroscopic effects, making cross-species comparison challenging to perform in a normalised way.

      The limitations section of the Discussion (page 28) has been updated to reflect this: ”A limiting consequence of having samples imaged at differing anatomical scales is that certain measures become inherently hard to compare in a normalised way. The tractography-based metrics—tortuosity and maximum deviation—serve as good examples of this resolution and FOV dependence. In the ESRF samples, the anatomical scale was at the level of individual axons, and the streamline metrics primarily reflect micro-scale effects from the extra-axonal environment, such as the influence of cells and blood vessels. In comparison, the larger anatomical scale in the DESY samples represents the level of fasciculi and above, with metrics influenced by macroscopic effects, such as the bending of the CC pathway. Both scales are interesting and can provide valuable insights in their own right, but caution is required when comparing the numbers, especially for cross-species studies where there is a significant difference in brain volume ratios.”
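
      To make the two streamline metrics discussed above concrete, the following is a minimal sketch of how they could be computed for a single streamline. It assumes the common definitions (tortuosity as path length divided by the straight-line distance between the endpoints, and maximum deviation as the largest perpendicular distance from that straight line); the response does not restate the manuscript's exact definitions, and the function names are hypothetical. Both values depend on the physical units and length of the streamline, which is why the FOV and resolution matter when comparing samples.

```python
import numpy as np


def tortuosity(points):
    """Path length divided by the endpoint-to-endpoint (chord) distance.

    Assumed definition; a perfectly straight streamline gives 1.0.
    """
    points = np.asarray(points, dtype=float)
    segment_lengths = np.linalg.norm(np.diff(points, axis=0), axis=1)
    chord_length = np.linalg.norm(points[-1] - points[0])
    return segment_lengths.sum() / chord_length


def max_deviation(points):
    """Largest perpendicular distance from the chord joining the endpoints.

    Assumed definition; the result is in the same units as the coordinates.
    """
    points = np.asarray(points, dtype=float)
    start = points[0]
    chord = points[-1] - start
    unit = chord / np.linalg.norm(chord)
    relative = points - start
    # Remove the component along the chord, keeping only the perpendicular part.
    perpendicular = relative - np.outer(relative @ unit, unit)
    return np.linalg.norm(perpendicular, axis=1).max()


# Hypothetical streamline (coordinates in micrometres) bending around an obstacle.
streamline = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [6.0, 0.0, 0.0]])
print(tortuosity(streamline), max_deviation(streamline))
```

      In this toy case the streamline deviates 4 units from the chord, so a maximum deviation of 5-10 microns would correspond to an obstacle of roughly that size only if the coordinates are expressed in microns.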

      The paper introduces a new method of "scale-space" parameters for structure tensors. Since, to my understanding, this is the first description of the method, some simple validation of the method would be welcomed. Further, the same scale parameters are not used across monkeys and mice, with a larger kernel used in mice (Table 2) which is surprising given their smaller brain size. Some explanation would be helpful.

      R1.9: We have expanded the description of the scale-space structure tensor approach in the Methods section. Specifically, we have elaborated on the empirical process used to select the scale-space parameters shown in Table 2 and explained why multiple scales were applied only to the monkey samples scanned at ESRF (see Table 2, sample IDs 2 and 3) but not to the other datasets. Additionally, we have added a supplementary figure to assist in illustrating the concept.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors combine diffusion MRI and high-resolution x-ray synchrotron phase-contrast imaging in monkey and mouse brains to investigate the 3D organization of brain white matter across different scales and species. The work is at the forefront of the anatomical investigation of the human connectome and aligns with several current efforts to bridge the resolution gap between what we can see in vivo at the millimeter scale and the complexity of the human brain at the sub-micron scale. The authors compare the 3D white matter organization across modalities within 2 small regions in one monkey brain (body of the corpus callosum, centrum semiovale) and within one region (splenium of the corpus callosum) in healthy mice and in one murine model of focal demyelination. The study compares measures of tissue anisotropy and fiber orientations across modalities, performs a qualitative comparison of fasciculi trajectories across brain regions and tissue conditions using streamlined tractography based on the structure tensor, and attempts to quantify the shape of fasciculi trajectories by measuring the tortuosity index and the maximum deviation for each reconstructed streamline. Results show measures of anisotropy and fiber orientations largely agree across modalities, especially for larger FOV data. The high-resolution data allows us to explore the fiber trajectories in relation to tissue complexity and pathology. The authors claim the study reveals new common organization principles of white matter fibers across species and scales, for which axonal fasciculi arrange into sheet-like laminar structures.

      Strengths:

      The aim of the study is of central importance within present efforts to bridge the gap between macroscopic structures observable in vivo in humans using conventional diffusion MRI and the microscopic organization of white matter tissue. Results obtained from this type of study are important to interpret data obtained in vivo, inform the development of novel methodologies, and expand our knowledge of the structural and thus functional organization of brain circuits.

      Multi-scale data acquired across modalities within the same sample constitute extremely valuable data that is often hard to acquire and represent a precious resource for validation of both diffusion MRI tractography and microstructure methods.

      The inclusion of multi-species data adds value to the study, allowing the exploration of common organization principles across species.

      The addition of data from a murine cuprizone model of focal demyelination adds interesting opportunities to study the underlying biological changes that follow demyelination and how these impact tissue anisotropy and fiber trajectories. These data can inform the interpretation and development of diffusion MRI microstructure models.

      Weaknesses:

      The main claim of a newly discovered laminar organization principle that is consistent across scales and species is not supported strongly enough by the data. The main evidence in support of the claim comes from the larger FOV data obtained from the body of the corpus callosum in the monkey brain. A laminar organization principle is partially shown in the centrum semiovale in the monkey brain and it is not shown in mice data. Additionally, the methods lack details to help the correct interpretation of these findings (e.g., how were these fasciculi defined?; how well do they represent different axonal populations?; what is the effect of blood vessels on the structure tensor reconstruction?; how was laminar separation quantified?) and the discussion does not provide a biological background for this organization. The corpus callosum sample suggests axons within a bundle of fibers are organized in a sheet-like fashion, while data from the centrum semiovale suggest fibers belonging to different fiber bundles are organized in a sheet-like arrangement. While I acknowledge the challenges in acquiring such high-resolution data, additional samples from different regions in the same animals and from different animals would help strengthen this claim.

      R2.1 

      -  how were these fasciculi defined?

      In the introduction (page 3), we have clarified our definition of an axon fasciculus: “A fasciculus is a bundle of axons that travel together over short or long distances. Its size and shape can vary depending on its internal organisation and its relationship to neighbouring fasciculi.”

      Additionally, we emphasise in the Results section (page 12) that the centroid streamlines are not guaranteed to be actual fasciculi, but rather representations of them. The paragraph now states: “To ease visualisation and quantification, we used QuickBundle clustering (Garyfallidis et al. 2012) to group neighbouring streamlines with similar trajectories into a centroid streamline. This centroid streamline serves as an approximation of the actual trajectory of a fasciculus.”

      - what is the effect of blood vessels on the structure tensor reconstruction?

      Fair point, that was not clear from our description. The clarification contains two parts. First, the estimation of the structure tensor occurs in all voxels, and in that sense, the blood vessels respond very similarly to axons. Second, when it comes to sample statistics derived from the structure tensor analysis (FA histograms and the FODs), they will have an influence, albeit a small one, given the low volume percentage of the blood vessels within the FOVs. In the monkey samples, segmenting the blood vessels was achievable with little effort, allowing us to exclude their contribution from FA statistics and FODs. To make this clear, we have added a paragraph to the Methods section (page 34) titled “Structure tensor-based quantifications,” reflecting this clarification. Additionally, we have restructured the entire structure tensor methods description (starting on page 32) as part of the reviewer comments in R1.6 and R1.9.

      - how was laminar separation quantified?

      We have added a clarification in the Results section (page 7): “The laminar thickness was determined by manual measurements on laminae visually identified in the 3D volume”.

      - discussion does not provide a biological background for this organization.

      A good point. Including the biological background is relevant as it supports the laminar organisation of white matter pathways observed in our findings and those of others.

      We have added a section on this background in the Discussion (page 24): “We believe our observed topological rule of white matter laminar organisation can be explained by a biological principle known from studies of nervous tissue development. The first axons to reach their destination, guided by their growth cones, are known as “pioneering” axons. “Follower” axons use the shaft of the pioneering axon for guidance to efficiently reach the target region (Breau and Trembleau 2023). Axons can form a fasciculus by fasciculating or defasciculating along their trajectory through a zippering or unzipping mechanism, controlled by chemical, mechanical, and geometrical parameters. Zippering “glues” the axons together, while unzipping allows them to defasciculate at a low angle (Šmít et al. 2017). Although speculative, the zippering mechanism may be responsible for forming the laminar topology observed across length scales. The defasciculation effect can explain our results in the corpus callosum (CC) of monkeys, with laminar structures at low angles (~35 degrees) also observed by (Innocenti et al. 2019; Caminiti et al. 2009), as well as in other major pathways (Sarubbo et al. 2019). In contrast, a fasciculation mechanism may be observed in the mouse CC (0 degrees). If the geometrical angle between two axons is high, i.e., toward 90 degrees, the zippering mechanism will not occur, and the two axons (fasciculi) will cross (Šmít et al. 2017). This supports our and other findings that crossing fasciculi or pathways occur at high angles toward 90 degrees in the fully matured brain (Wedeen et al. 2012). Once myelination begins, the zippering mechanism is lost (Šmít et al. 2017), suggesting that laminar topology is established at the earliest stages of brain maturation.”

      - additional samples from different regions in the same animals and from different animals would help strengthen this claim

      Reviewer #1 also pointed to the inclusion of additional samples, and this is now discussed as part of the study limitations on page 27 (see also R1.7).
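
      The exclusion of segmented blood vessels from the FA statistics, as described in R2.1, could be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes that FA is computed from the three structure tensor eigenvalues using the standard formula known from diffusion tensors, and the names `fractional_anisotropy`, `fa_statistics`, and `vessel_mask` are hypothetical.

```python
import numpy as np


def fractional_anisotropy(eigvals):
    """Standard FA formula applied to eigenvalues stored along the last axis."""
    ev = np.asarray(eigvals, dtype=float)
    l1, l2, l3 = ev[..., 0], ev[..., 1], ev[..., 2]
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    den = 2.0 * (l1 ** 2 + l2 ** 2 + l3 ** 2)
    # Avoid division by zero for empty voxels; num is 0 there, so FA becomes 0.
    den_safe = np.where(den > 0, den, 1.0)
    return np.sqrt(num / den_safe)


def fa_statistics(eigvals, vessel_mask, bins=10):
    """Mean FA and FA histogram over voxels that are NOT labelled as vessel."""
    fa = fractional_anisotropy(eigvals)
    keep = ~np.asarray(vessel_mask, dtype=bool)
    hist, edges = np.histogram(fa[keep], bins=bins, range=(0.0, 1.0))
    return fa[keep].mean(), hist, edges


# Hypothetical example: three voxels, the middle one labelled as blood vessel.
eigvals = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, 0.0, 0.0]])
vessel_mask = np.array([False, True, False])
mean_fa, hist, edges = fa_statistics(eigvals, vessel_mask)
print(mean_fa)
```

      Because the vessel voxels occupy a small fraction of the volume, masking them is expected to shift the statistics only slightly, but it removes a known non-axonal contribution.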

      The main goal of the study is to bridge the organization of white matter across anatomical length scales and species. However, given the substantial difference in FOVs between the two imaging modalities used, and the absence of intermediate-resolution data, it remains difficult to effectively understand how these results can be used to inform conventional diffusion MRI. In this sense, the introduction does not do a good enough job of building a strong motivation for the scientific questions the authors are trying to answer with these experiments and for the specific methodology used.

      R2.2: Indeed, this is an essential point now emphasised in the introduction, page 3, which now states: “Despite the limited resolution of dMRI, the water diffusion process can reveal microstructural geometrical features, such as axons and cell bodies, though these features are compounded at the voxel level. Consequently, estimating microstructural characteristics depends on biophysical modelling assumptions, which can often be simplistic due to limited knowledge of the 3D morphology of cells and axons and their intermediate-level topological organisation within a voxel. Thus, complementary high-resolution imaging techniques that directly capture axon morphology and fasciculi organisation in 3D across different length scales within an MRI voxel are essential for understanding anatomy and improving the accuracy of dMRI-based models (Alexander et al. 2019).”

      Additionally, in the introduction, page 4, we have made the following changes to strengthen the link across modalities, such that it now states: “In the x-ray synchrotron data, we applied a scale-space structure tensor analysis, which allowed for the quantification of structure tensor-derived tissue anisotropy and FOD in the same anatomical regime indirectly detected by dMRI.”

      The cuprizone data represent a unique opportunity to explore the effect of demyelination on white matter tissue. However, this specific part of the study is not well motivated in the introduction and seems to represent a missed opportunity for further exploration of the qualitative and quantitative relationship between diffusion MRI and sub-micron tissue information (although unfortunately not within the same brain sample). This is especially true considering the diffusion MRI protocol for mice would allow extrapolation of advanced measures from different tissue compartments.

      R2.3: A similar point was raised by Reviewer 1 (R1.8), and we have clarified the motivation for including the healthy mice and the demyelination samples.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Many thanks to the authors for providing open data. This was very helpful when reviewing the manuscript and is a valuable resource for the community.

      R1.10: We are happy to share our data with the community. Understanding anatomy in 3D is hard to achieve through still images and animations, so the ability to explore it on your own is quite important. The link to the data repository has been added in the Methods section in the following paragraph: “Due to the size of the data selected, processed image volumes, masks and results are available at https://zenodo.org/records/10458911. Other datasets can be shared on request.”

      One confusing element of the paper is that orientations (or axes) do not seem to be consistent across samples/modalities. For example, the green tensors in Figures 3 C and D are tilted up/down in opposite directions and the streamlines in Figure 5A seem opposite (SL) from what we would expect from Figure 2A (SR). Having consistent orientations across modalities and images would help the reader. When colouring tensors (e.g. in Figure 3), the authors could consider a 3D colour scheme (similar to that used by diffusion MRI) rather than colouring by only inclination, as this would provide useful information on whether different laminae have similar orientations, as implied by the tractography in Figure 4.

      R1.11: Thank you for spotting the suboptimal consistency between Figures 2, 3, and 5. Figure 2 has been corrected and updated. The left-right direction in the coronal views was not correctly displayed. Additionally, the glyph directions have been updated in Figures 2 and 3.

      By default, we use the “standard” RGB colour scheme used in dMRI. However, for the monkey CC (essentially Figure 3), this did not effectively illustrate our findings. We therefore decided to use a different directional colour encoding scheme, which captures the angular deviation from the L-R axis, to assist in visualising the inclination angle between the laminae. We have used the same colour scheme for the tensors in Figure 3 to avoid confusion.

      On a general note, the standard colour scheme has uniform “colour contrast” in all directions, but when there is only a single dominant direction in the sample, it can make sense to concentrate the colour contrast in that axis.

      Results: "even higher FA anisotropy in the micro-tensor domain of 0.997, i.e., the micro (μ)FA (20, 21)." I understand these references lead to a definition of μFA that is based on multiple diffusion tensor encodings which is quite different from that suggested by Kaden. It may be preferable to reference Kaden directly (since I understand this is the method used) to avoid confusion.

      R1.12: Correctly spotted, and we now reference the method from Kaden et al. and use the other references elsewhere when relevant.

      "and scanned the mouse brain in a whole." - typo?

      R1.13: Thank you for spotting the typo. The mouse brain was kept in the skull during MRI scanning, which has been clarified in the Methods section.

      The crossing fibre region appears to be sometimes referred to as the centrum semiovale, and other times as the CST. CS seems the better description and keeping this naming consistent would avoid confusion to the reader.

      R1.14: Well spotted, thank you. We have replaced the usage of Corticospinal Tract (CST) with centrum semiovale (CS) where relevant.

      Direct comments on the text:

      Abstract: "Individual axon fasciculi exhibited tortuous paths .... in a manner independent of fibre complexity and demyelination"

      Do statistical comparisons of the various distributions support this? The data shows somewhat increased tortuosity in the CST compared to the CC, and somewhat lower tortuosity in CPZ tissue.

      R1.15: The intention of the text was not to point to the comparison of tortuosity, but rather to highlight the maximum deviation. We observe a high probability density of maximum deviations at approximately 5-10 microns in all samples, which corresponds to the size of structures in the extra-axonal environment, such as blood vessels and cells.

      Additionally, we understand that the original statement might imply an expectation of a statistical analysis demonstrating independence, which is not the case. To clarify, we have reformulated the sentence in the Abstract (page 2) to address these points: “Fasciculi exhibited non-straight paths around obstacles like blood vessels, comparable across the samples of varying fibre complexity and demyelination.”

      Abstract: "A quantitative analysis of tissue anisotropies and fibre orientation distributions gave consistent results for different anatomical length scales and modalities, while being dependent on the field-of-view."

      To my understanding, the FODs here from different modalities are calculated over different FOVs (in monkeys at least), and FODs are only presented for a single FOV for each modality, meaning it is difficult to separate the effects of modality from FOV. The microscopic anisotropy is also noticeably different across modalities (DESY < ESRF < dMRI).

      R1.16: That is a fair point. Our statement was trying to capture too much condensed content to be correctly interpretable. We have reformulated the sentence to state: “Quantifications of fibre orientation distributions were consistent across anatomical length scales and modalities, whereas tissue anisotropy had a more complex relationship, both dependent on the field-of-view”.

      While it is true that we only present the ST-derived quantifications (FOD and FA statistics) for a single FOV per modality and sample, the results shown for the ESRF monkey samples (Figures 3 and 4) are a merge of four individually processed volumes. The quantifications of each individual sub-FOV have now been added as a supplementary figure (Figure S3) to highlight the consistency of the methodology and the effect of shifting the FOV position. In the case of the mouse, we have two volumes from different mice, which also display similar FOD and FA statistics.

      Abstract: "Our study emphasises the need to balance field-of-view and voxel size when characterising white matter features across anatomical length scales."

      This point does not seem very well explored in the paper, rather it is an observation of the limitations of the different imaging modalities. For example, there aren't analyses to compare metrics from high-resolution data at different FOVs (i.e. by taking neighbourhoods of different sizes), nor are metrics compared from data at different resolutions and the same FOV.

      R1.17: The question is related to R1.16, R1.4, and R1.8, and we have addressed this point in our responses to those comments.

      Figure 7 - Taking into account the eigenvalues can be helpful when interpreting the secondary and tertiary eigenvectors of tensors (V2 and V3). It would be interesting to know whether the eigenvalues L2 ~= L3 are approximately equal (suggesting isotropic diffusion about V1, where the definition of V2 versus V3 isn't very meaningful), or if L2 is noticeably larger than L3 (suggesting anisotropic diffusion about V1, potentially similar to the anisotropic dispersion discussed above).

      R1.18: It would be interesting to explore the eigenvalues of the structure tensor in more detail, as has been done for the diffusion tensor. However, we believe this belongs to future work, as such additional detailed methodological analysis would complicate the already complex story. As mentioned in response to R1.10, most processed data has been made publicly available, and the rest can be requested (due to the storage size of the data sets) to perform such additional analysis.
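
      For readers who wish to perform such a follow-up analysis on the shared data, the check suggested by the reviewer could be sketched roughly as below. This is only an illustrative assumption of how one might compare the second and third eigenvalues; the index `(L2 - L3) / (L2 + L3)` and the function name are hypothetical and are not taken from the manuscript. The eigenvalues are sorted in descending order before comparison.

```python
import numpy as np


def secondary_anisotropy(eigvals):
    """Relative difference between the 2nd and 3rd largest eigenvalues.

    Values near 0 mean L2 ~= L3, so V2 and V3 are not meaningfully
    distinguished; larger values indicate anisotropy about V1.
    """
    ev = np.sort(np.asarray(eigvals, dtype=float), axis=-1)[..., ::-1]
    l2, l3 = ev[..., 1], ev[..., 2]
    total = l2 + l3
    # Guard against empty voxels where both eigenvalues are zero.
    total_safe = np.where(total > 0, total, 1.0)
    return (l2 - l3) / total_safe


# Hypothetical eigenvalue triplets, one per voxel.
voxels = np.array([[3.0, 1.0, 1.0], [3.0, 2.0, 1.0]])
print(secondary_anisotropy(voxels))
```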

      Discussion: "Importantly, our findings revealed common principles of fibre organisation in both monkeys and mice; small axonal fasciculi and major bundles formed sheet-like laminar structures," See above regarding the lack of evidence for laminar structures in mouse data.

      R1.19: We have reformulated the text for clarification as part of R1.3. Additionally, we added FOD quantifications to support why we do not observe an apparent laminar organisation in the mouse CC; please see our response to R1.6.

      Discussion: "Interestingly, the dispersion magnitude is indicative of fasciculi that skirt around obstacles in the white matter such as cells and blood vessels, and the results are largely independent of both white matter complexity (straight vs crossing fibre region) and pathology." Again, do statistical tests of the various distributions support this?

      R1.20: As part of R1.1, we have added statistical tests of significance for the quantifications of how max deviation changes when bending around objects. Indeed, the distributions are not statistically the same, and we do not wish to convey that sentiment, but they are comparable in the object sizes that they detect. As done in the abstract, we have reformulated the sentence to avoid misunderstanding and have replaced “largely independent” with “observed across.”

      Discussion: "Tax et al. have demonstrated the calculation of a sheet probability index from diffusion MRI data, which suggested the presence of sheet-like features in the CC"

      My understanding was that this was observed in crossing fibre regions, such as where fibres projecting with the CC cross the CST, but not the main body of the CC itself. Tax defines sheet structure as "composed of two tracts that cross each other on the same surface in certain regions along their trajectories." Is this a different phenomenon to the laminar structures observed here (where we observe fibres within a single tract being locally organised into laminar structures)?

      R1.21: Thank you for pointing our attention to this. We have corrected the section in the Discussion (page 23), so it now states: “Additionally, Tax et al. have demonstrated the calculation of a grid-crossing sheet probability index from diffusion MRI data, which suggested the presence of sheet-like features in a crossing fibre region (Tax et al. 2016), which is in line with our findings in the synchrotron data. Note that the method by Tax et al. only detects sheet-like structures crossing on a grid and does not reveal laminar structures with lower inclination angles, as we observed in the monkey CC.”

      Discussion: "We found that FODs were consistent across image resolutions and modalities, but only given that the FOV is the same." See above.

      R1.22: As part of our response to R1.6, we quantified the FODs using the ODI and DA indices, which should help support our statement. Nevertheless, we have toned down the statement and reformulated the text as follows: “We found that FODs were comparable across image resolutions and modalities. The observed discrepancies can be attributed to the fact that the FOVs are not exactly matched.”

      Discussion: "microscopic FA were highly correlated across modalities."

      The data shows FA is considerably lower in DESY to ESRF; within modality FA is quite consistent irrespective of tissue region; and differences between the CC and CG shown in ESRF data in mice are not repeated in DESY. It is unclear from the current data if this would lead to a high correlation across modalities. Some evidence would be helpful.

      R1.23: This is a fair point; we have not performed a correlation analysis. However, the pattern we observe for the synchrotron samples is as follows: When the anatomical length scale increases (becomes more macroscopic), the FA distribution shifts to lower values. This reflects the scale of information captured with the ST analysis (see also R1.9). Therefore, the most interesting comparison of FA statistics occurs when the resolution and anatomical length scale are approximately the same. The sentence in question has been reformulated to the following: “Estimates of structure tensor-derived microscopic FA show a clear pattern across modalities.”

      Discussion: "If so, the (inclination angle) information might serve to form rules for low-resolution diffusion MRI based tractography about how best to project through bottleneck regions, which is currently a source of false-positives trajectories (6)."

      This is an interesting idea but it is unclear to me how this inclination information would help track through bottlenecks where, by definition, fibres are passing through with the same orientation. Some further explanation would be helpful.

      R1.24: We have elaborated on the section in the Discussion (page 23), explaining how this can be used to improve tractography tracing through complex regions: “The reason is that standard tractography methods do not "remember" or follow anatomical organisation rules as they trace through complex regions. Our findings on pathway lamination and inclination angles—low for parallel-like trajectories and high for crossing-like trajectories—can help incorporate trajectory memory into these methods, reducing the risk of false trajectories”.

      Reviewer #2 (Recommendations For The Authors):

      Below I report comments that if addressed I believe would improve the clarity and readability of the manuscript.

      -  Figures 1 and 2 would be more meaningful if combined into one figure. This would allow for a direct visual comparison of the two modalities. If space is needed, I believe the second row of Figure 1 (coronal views of CC) does not add much information. It is often hard to navigate the different orientations of the tissue in the images; thus any effort in trying to help the reader visually clarify would improve readability.

      R2.4: We considered the reviewer’s suggestion to merge Figures 2 and 3. However, this made both the figures and the main text more complex, so we chose to retain the original figure layout. In addition, Figure 3 utilises a non-standard directional colormap, and keeping the colormap consistent within each figure is a feature we wish to preserve. In response to R1.11, the figures have been updated to have more consistent orientations for the monkey samples.

      In Figure 2, the second row, showing a coronal view of the CC, is essential for comparison with human data in Figure S1. It highlights where we observed the columnar laminar organisation and their inclination angle, as also detected by DTI.

      -  Figure 4 shows synchrotron data revealing an anterior-posterior component within the centrum semiovale that is not necessarily seen in the dMRI data. Could the authors comment on this?

      R2.5: Thank you for pointing this out. We have now addressed this in the Results section (page 10), where we describe the observation in detail: “Interestingly, visual inspection of the colour-coded structure tensor directions in Fig. 4E shows the existence of voxels whose primary direction is along the A-P axis. However, this represents a small enough portion of the volume that it does not appear as a distinct peak on the FOD.”

      -  The authors claim they observed several purple axons crossing orthogonally in Figure 5c. However, that is not necessarily clear in the figure.

      R2.6: We appreciate the feedback. We have now coloured the streamlines of the crossing fasciculi in Figure 5C in red.

      -  Figure 5 would benefit from adding the color encoding scheme for Figure 5d, as sometimes this is not necessarily consistent.

      R2.7: We appreciate the feedback. We have added an indication of the standard directional colour coding to Figure 5D.

      -  Figure 5d shows interesting data from the complex region. However, it is hard to visualize and it looks like there are not many streamlines traveling entirely I-S? Maybe a different orientation of the sample would help visualization.

      R2.8: A similar point was raised by Reviewer 1 (see R1.2). We have added an animation of the scene to assist in the interpretation of the 3D organisation within this complex sample.

      -  The concept of axon fasciculi is not necessarily immediately clear. Adding an explanation for what the authors refer to when using this term would improve clarity.

      R2.9: In the introduction, we now state our conceptual definition of an axon fasciculus as a number of axons that follow each other (see also R2.1).

      -  The methods do not provide details on how structure tensor FA is measured.

      R2.10: Thank you for pointing this out. We have restructured and expanded the structure tensor description in the Methods section (see also R1.9 and R2.1), which now includes the definition of FA.
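
      For readers without the revised Methods at hand, the following is a minimal sketch of how FA is commonly computed from the three eigenvalues of a 3D tensor. It assumes the standard DTI-style formula; the exact definition used in the manuscript may differ, and the function name `structure_tensor_fa` is hypothetical.

```python
import numpy as np

def structure_tensor_fa(eigenvalues):
    """Fractional anisotropy from three tensor eigenvalues.

    Assumption: the standard DTI-style FA formula is applied to the
    structure tensor eigenvalues. The formula is symmetric in the
    eigenvalues, so their ordering does not matter.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    # Sum of squared pairwise differences between eigenvalues
    num = (lam[0] - lam[1]) ** 2 + (lam[1] - lam[2]) ** 2 + (lam[2] - lam[0]) ** 2
    # Normalisation by twice the sum of squared eigenvalues
    den = 2.0 * np.sum(lam ** 2)
    if den == 0.0:
        return 0.0  # degenerate (all-zero) tensor: treat as isotropic
    return float(np.sqrt(num / den))
```

      Under this definition, FA is 0 for a perfectly isotropic neighbourhood (three equal eigenvalues) and 1 when only one eigenvalue is non-zero.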

      -  Why didn't the authors select the same cc region for both mice and monkeys? It seems this would have increased the strength of the comparison.

      R2.11: We agree. The reason lies in the chronology of experiments and the fact that we cannot control where demyelination takes place. We have added a clarifying description in the Methods section (page 31): “Note that several separate beamline experiments were conducted to collect the volumes listed in Table 1. In the first two experiments, samples from the monkey brain were scanned at ESRF and DESY, respectively. The samples from the mouse brain were imaged in two subsequent experiments. Consequently, the location of the identified demyelinating lesion in the cuprizone mice, which cannot be precisely controlled, did not match the location of the CC biopsies in the monkey.”

      -  While it is mentioned in the results, the methods do not explain how vessel segmentations or cell segmentation in mice was performed and for which datasets it was performed.

      R2.12: For the small ROI shown in Figure 6, the labelling was a manual process using the software ITK-SNAP, which has now been clarified in the corresponding figure caption. The generation of ROI masks and blood vessel segmentations involved a combination of intensity thresholding, morphological operations, and manual labelling in ITK-SNAP. This has been clarified in the restructured and expanded description of structure tensor analysis in the Methods section (starting on page 32).

      -  From the methods it is hard to understand (1) how many mice were used; (2) why dMRI was done on a different sample; (3) whether the same splenium region was selected for both healthy and CPZ animals; (4) how the registration across samples was performed.

      R2.13: We appreciate the feedback and have inserted clarifying statements in the relevant parts of the Methods section. (1) The total number of mice included was three: one normal, one cuprizone, and one normal for MRI scanning. (2) The quality of the collected dMRI on the mouse was too poor to use, and it could not be redone as the brain had already been sliced and prepared for synchrotron experiments. (3) The same splenium section was selected for both healthy and cuprizone mice. (4) A paragraph on image registration has been added.

      -  Diffusion MRI method sections would benefit from additional details on the protocols used.

      R2.14: Thank you for pointing this out. We have added more details about the diffusion MRI protocols, including the b-value, gradient strength, and other relevant parameters.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study extends the previous interesting work of this group to address the potentially differential control of movement and posture. Their earlier work explored a broad range of data to make the case for a downstream neural integrator hypothesized to convert descending velocity movement commands into postural holding commands. Included in that data were observations from people with hemiparesis due to stroke. The current study uses similar data but pushes into a different, but closely related direction, suggesting that these data may address the independence of these two fundamental components of motor control. I find the logic laid out in the second sentence of the abstract ("The paretic arm after stroke is notable for abnormalities both at rest and during movement, thus it provides an opportunity to address the relationships between control of reaching, stopping, and stabilizing") less than compelling, but the study does make some interesting observations. Foremost among them is the relation between the resting force postural bias and the effect of force perturbations during the target hold periods, but not during movement. While this interesting observation is consistent with the central mechanism the authors suggest, it seems hard to me to rule out other mechanisms, including peripheral ones.

      Response 1.1. Thank you for your comments, which we address in detail below and in our response to Recommendations to the authors (see pp. 15-19 of this letter). We would first like to clarify the motivation behind our use of a stroke population to understand the interactions between the control of reaching and holding. We agree that this idea can be laid out in a more compelling way.

      The fact that stroke patients usually display issues with their control of both reaching and holding allows for within-individual comparisons of those two modes of control. Further, the magnitude of abnormalities is relatively large, making it easier to measure, compare and investigate effects. And, importantly, these two modes of control can be differentially affected after stroke (also pointed out by Reviewer 2, point 4 in Comments to the Authors). Finally, this kind of work – examining interactions between positive signs of stroke (such as abnormal posture or synergy) vs. negative signs (such as loss of motor control) – needs to be done in humans, as positive signs are relatively absent even in primates (Tower, 1940).

      We have changed our abstract (changes shown below in red), and our intro (expanding the second paragraph, lines 75-76), to lay out our motivation more clearly.

      From the abstract:

      “The paretic arm after stroke exhibits different abnormalities during rest vs. movement, providing an opportunity to ask whether control of these behaviors is independently affected in stroke.”

      On the other hand, the relation between force bias and the well-recognized flexor synergy seems rather self-evident, and I don't see that these results add much to that story.

      Response 1.2. While it seems natural that these biases would be the resting expression of abnormal flexor synergies (given their directionality towards the body, as shown in Figures 2-3, and the other similarities we demonstrate in Figure 8), we do not believe it is self-evident. These biases are measured at rest, with the patient passively moved and held still, whereas abnormal synergies emerge when the patient actively tries to move. The lack of relationship we find between these resting force biases and active movement underlines that the relation between force bias and flexor synergy should not be taken as self-evident, making it worthwhile to examine it (as we motivate in lines 589-596 and show in Figure 8).

      The paradox here is that, in spite of a relationship between force bias and flexor synergy (itself manifesting during attempted movement), there seems to be no relationship between force bias and direct measures of active movement (Figures 5,6). This is the paradox that inspired our conceptual model (Figure 9) and inspires us to further investigate the conditions under which these two systems are intermingled or kept separate. We thus find it to be a helpful element in the story.

      I am also struck by what seems to be a contradiction between the conclusions of the current and former studies: "These findings in stroke suggest that moving and holding still are functionally separable modes of control" and "the commands that hold the arm and finger at a target location depend on the mathematical integration of the commands that moved the limb to that location." The former study is mentioned here only in passing, in a single phrase in the discussion, with no consideration of the relation between the two studies. This is odd and should be addressed. 

      Response 1.3. While these two sets of findings are not contradictory, we understand how they can appear as such without providing context. We now discuss the relationship between our present study and the previous one more directly (lines 66-70 and 663-669 of the revised manuscript).

      The previous study examined how the control of movement informs the control of holding after the movement was over; the current study examines whether abnormalities in holding, measured at rest with the movement leading to the rest position being passive, affect the control of movement. There are thus two important distinctions:

      First, directionality of potential effects: here we examine the effect of (abnormalities in) holding control upon movement, but the 2020 study (Albert et al., 2020) examines the effects of movement upon holding control. Stroke patient data in the 2020 study showed that, under CST damage, while the reach controller is disrupted, the hold controller can continue to integrate the malformed reach commands faithfully. In line with this, we proposed a model where the postural controller system sits downstream of the moving controller (Figure 7G in the 2020 paper). We thus did not claim, in 2020, that integration of movement commands is the only way to determine posture control, as we stated explicitly back then, e.g. (emphasis ours):

      “Equations (1) and (2) describe how the integration of move activity may relate to changes in hold commands, but does not specify the hold command at the target.”

      In short, finding no effect of holding abnormalities upon movement (present finding) does not mean there is no potential effect of movement upon holding (2020 finding). This is something we had alluded to in the Discussion but not clarified, which we do now (see edits at the end of our response to this point).

      Second, active vs. passive movement: here, we measure holding control at rest (Experiment 1). The 2020 study shows that endpoint forces reflect the integration of learned dynamics exerted during active movement that led to the endpoint position. However, in Experiment 1, there is no active reaching to integrate, as the robot passively moves the arm to the held position. Thus, resting postural forces measured in Experiment 1 could not reflect the integration of reach commands that led to each rest position.  

      Thus, the two sets of findings are not contradictory. Taking our current and 2020 findings together suggests that active holding control would reflect both the integration of movement control that led to assuming the held position and the force biases measured at rest.

      Hence our decision to describe these two systems as functionally separable: while these systems can interact, the effects of post-stroke malfunctions in each can be independent depending on the function and conditions at hand. This does not make this a limited finding: being able to dissociate post-stroke impairment based on each of these two modes of control may inform rehabilitation, and also importantly, understanding the conditions in which these two modes of control become separable can substantially advance our understanding of both how different stroke signs interact with each other and how motor control is assembled in the healthy motor system. Figure 9 illustrates our conceptual model behind this and may serve as a blueprint to further dissect these circuits in the future.

      We discuss these issues briefly in lines 663-669 in our Discussion section, reproduced below for convenience:

      “It should be noted, however, that having distinct neural circuits for reaching and holding does not rule out interactions between them. For example, we recently demonstrated how arm holding control reflects the integration of motor commands driving the preceding active movement that led to the hold position, in both healthy participants and patients with hemiparesis (Albert et al., 2020). However, in that paper, we did not claim that this integration is the only source of holding control. Indeed, in Experiment 1 of the current study, we used passive movement to bring the arm to each probed position, which means that the postural biases could not be the result of integration of motor commands.” 

      And, we have adjusted our Introduction to provide pertinent context regarding our 2020 work (first paragraph, lines 66-70 of the updated manuscript).

      A minor wording concern I had is that the term "holding still" is frequently hard to parse. A couple of examples: "These findings in stroke suggest that moving and holding still are functionally separable modes of control." This example is easily read, "moving and holding [continue to be] functionally separable". Another: "...active reaching and holding still in the same workspace, " could be "...active reaching and holding [are] still in the same workspace." Simply "holding", "posture" or "posture maintenance" would all be better options.

      Response 1.4. Thank you for your suggestion. Following your comment, we have abbreviated this term to simply “holding”, both in the title and throughout the text.

      Reviewer #2 (Public Review):

      Summary: 

      Here the authors address the idea that postural and movement control are differentially impacted with stroke. Specifically, they examined whether resting postural forces influenced several metrics of sensorimotor control (e.g., initial reach angle, maximum lateral hand deviation following a perturbation, etc.) during movement or posture. The authors found that resting postural forces influenced control only following the posture perturbation for the paretic arm of stroke patients, but not during movement. They also found that resting postural forces were greater when the arm was unsupported, which correlated with abnormal synergies (as assessed by the Fugl-Meyer). The authors suggest that these findings can be explained by the idea that the neural circuitry associated with posture is relatively more impacted by stroke than the neural circuitry associated with movement. They also propose a conceptual model that differentially weights the reticulospinal tract (RST) and corticospinal tract (CST) to explain greater relative impairments with posture control relative to movement control, due to abnormal synergies, in those with stroke.

      Strengths: 

      The strength of the paper is that they clearly demonstrate with the posture task (i.e., active holding against a load) that the resting postural forces influence subsequent control (i.e., the path to stabilize, time to stabilize, max. deviation) following a sudden perturbation (i.e., sudden removal of the load). Further, they can explain their findings with a conceptual model, which is depicted in Figure 9.

      Weaknesses: 

      Current weaknesses and potential concerns relate to i) not displaying or reporting the results of healthy controls and non-paretic arm in Experiment 2 and ii) large differences in force perturbation waveforms between movement (sudden onset) and posture (sudden release), which could potentially influence the results and/or interpretation.

      Response 2.0. Thank you for your assessment, and for pointing out ways to improve our paper. We address the weaknesses and potential concerns in detail below.

      Larger concerns

      (1) Additional analyses to further support the interpretation. In Experiment 1 the authors present the results for the paretic arm, non-paretic arm, and controls. However, in Experiment 2 for several key analyses, they only report summary statistics for the paretic arm (Figure 5D-I; Figure 6D-E; Figure 7F). It is understood that the controls have much smaller resting postural force biases, but they are still present (Figure 3B). It would strengthen the position of the paper to show that controls and the non-paretic arm are not influenced by resting postural force biases during movement and particularly during posture, while acknowledging the caveat that the resting positional forces are smaller in these groups. It is recommended that the authors report and display the results shown in Figure 5D-I; Figure 6D-E; Figure 7F for the controls and non-paretic arm. If these results are all null, the authors could alternatively place these results in an additional supplementary. 

      Response 2.1a. Thank you for your recommendations. We agree both on the value of these analyses and the caveat associated with them: these resting postural force biases are substantially smaller for the non-paretic and control data (for example, the magnitude of resting biases in the supported condition is 2.8±0.4N for the paretic data, but only 1.8±0.4N and 1.3±0.2N for the non-paretic and control data, respectively; the difference is even greater in the unsupported condition, though this is not the one being compared to Experiment 2).

      We now conduct a comprehensive series of supplementary analyses, including the examination of non-paretic and control data for all three components of Experiment 2 (unperturbed reaches; pulse perturbations; and active holding control). These are mentioned in the Results (lines 422-424, 512-513, and 574-574 of the revised manuscript) and illustrated in the supplementary materials: Supplementary Figures S5-1, S6-1, and S7-1 contain the main analyses (comparisons of instances with the most extreme resting biases for each individual) for the unperturbed reach analysis, pulse perturbation analysis, and active holding control analysis, respectively.

      We find that non-paretic and control data do not display effects of resting biases upon unperturbed reaching control (Figure S5-1) or control against a pulse perturbation early during movement (Figure S6-1) – as is the case with the paretic data. Non-paretic and control data do not display evidence of influence of their resting force biases upon active holding control either (Figure S7-1), unlike the paretic data. For the non-paretic data, however, these influences are nominally towards the same direction as in the paretic data. Given that resting biases are substantially weaker for the non-paretic case, it is possible a similar relationship exists but requires increased statistical power to discern. Moreover, it is possible that the effect of resting biases is non-linear, with small biases effectively kept under check so that their impact upon active holding control is even less than a linearly scaled version of the impact of the stronger, paretic-side biases. This can be the subject of future work.

      Please also note that, following your recommendation (Recommendations to the Authors, point 2.1), we have conducted secondary analyses which estimate sensitivity to resting bias using all datapoints, validating our main analyses; these analyses were also performed for control and non-paretic data, with similar results (Response 2.A.1).

      Further, the results could be further boosted by reporting/displaying additional analyses. In Figure 6D the authors performed a correlation analysis. Can they also display the same analysis for initial deviation and endpoint deviation for the data shown in Figure 5D-F & 5G-I, as well for 7F for the path to stabilization, time to stabilization, and max deviation? This will also create consistency in the analyses performed for each dependent variable across the paper.

      Response 2.1b. Here, we set out to test whether resting biases affect movement. It is best to do this using a within-individual comparison design, rather than using across-individual correlations: while correlation analyses can in general be informative, they obscure within-individual effects which are the main comparisons of interest in our study. Consider a participant with strong resting bias towards one direction, tested on opposing perturbations; averaging these responses for each individual would mostly cancel out any effects of resting biases. Even if we were to align responses to the direction of the perturbation before averaging, the power of correlation analyses may be diluted by inter-individual differences in other factors, such as overall stiffness.

      Thus, our analysis design was instead focused on examining the differential effects of resting posture biases within each individual’s data. We compared the most extreme opposing/aligned or clockwise/counter-clockwise instances within each individual, specifically to assess these differential effects. In our revised version, we have further reinforced these analyses to include all data rather than the most extreme instances (see response 2.A.1.a to the Reviewer’s recommendation to the authors) where we performed correlations of within-individual resting posture vs. the corresponding dependent variables and compared the resulting slopes. 
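
      The contrast between the two designs can be illustrated with a minimal sketch. It is not our analysis code: the function names, the use of ordinary least-squares slopes, and the toy data layout (one bias value and one response value per trial, with a subject label) are assumptions made for illustration only.

```python
import numpy as np

def within_individual_slopes(bias, response, subject):
    """Fit one least-squares slope of response vs. resting bias per subject.

    Returns a dict mapping subject id -> slope. Subject-specific offsets
    (e.g. overall stiffness) are absorbed by each fit's intercept.
    """
    bias = np.asarray(bias, dtype=float)
    response = np.asarray(response, dtype=float)
    subject = np.asarray(subject)
    slopes = {}
    for s in np.unique(subject):
        mask = subject == s
        slope, _intercept = np.polyfit(bias[mask], response[mask], 1)
        slopes[int(s)] = float(slope)
    return slopes

def across_individual_r(bias, response, subject):
    """Pearson correlation between per-subject mean bias and mean response."""
    bias = np.asarray(bias, dtype=float)
    response = np.asarray(response, dtype=float)
    subject = np.asarray(subject)
    ids = np.unique(subject)
    mean_bias = np.array([bias[subject == s].mean() for s in ids])
    mean_resp = np.array([response[subject == s].mean() for s in ids])
    return float(np.corrcoef(mean_bias, mean_resp)[0, 1])
```

      In such a sketch, the per-subject slopes could then be compared against zero across the group (for example, with a one-sample test), whereas the across-individual correlation can be dominated by subject-specific offsets even when every individual shows the same within-individual dependence on resting bias.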

      The across-individual correlation analyses add little to that for the reasons we outlined above. At the same time, it is possible they can be helpful in e.g. illustrating across-individual variability. We thus now include across-individual correlation analyses for all dependent variables, but, given their limited value, only in the supplementary material. This also means that, for consistency, we moved the correlation analysis in Figure 6 to the corresponding supplementary figure as well (Figure S6-3).

      In addition, following the Reviewer’s comment about consistency in the analyses performed for each dependent variable across the paper, we added within-individual comparisons for settling time following the pulse perturbations (Figure 6D, right).

      (2) Inconsistency in perturbations that would differentially impact muscle and limb states during movement and posture. It is well known that differences in muscle state (activation / preloaded, muscle fiber length and velocity) and limb state (position and velocity) impact sensorimotor control (Pruszynski, J. A., & Scott, S. H. (2012). Experimental brain research, 218, 341-359.). Of course, it is appreciated that it is not possible to completely control all states when comparing movement and posture (i.e., muscle and limb velocity). However, using different perturbations differentially impacts muscle and limb states. Within this paper, the authors used very different force waveforms for movement perturbations (i.e., 12 N peak, bell-shaped, 0.7ms duration -> sudden force onset to push the limb; Figure 6A) and posture perturbations (i.e., 6N, 2s ramp up -> 3s hold -> sudden force release that resulted in limb movement; Figure 4) that would differentially impact muscle (and limb) states. Preloaded muscle (as in the posture perturbation) has a very different response compared to muscle that has little preload (as in the movement perturbations, where muscles that would resist a sudden lateral perturbation would likely be less activated since they are not contributing to the forward movement). Would the results hold if the same perturbation had been used for both posture and movement (e.g., 12 N pulse for both experiments)? It is recommended that the authors comment and discuss in the paper why they chose different perturbations and how that might impact the results. 

      Response 2.2a. We agree that it can be impossible to completely control all states when comparing movement and posture. We would also like to stress that these perturbations were not designed so that responses are directly compared to each other (though of course there is an indirect comparison in the sense that we show influence of biases in one type of perturbation but not the other). Instead, Experiment 2 tried to implement a probe optimized for each motor control modality (moving vs. holding). However, the Reviewer has a point that the potential impact of differences between the perturbations is important to discuss in the paper.

      The Reviewer points out two potentially interesting differences between the two perturbations. First, the magnitude (6N for the posture perturbation vs. 12N for the pulse perturbation); second, the presence of background load in the posture perturbation, in contrast to the pulse perturbation.

      For the movement perturbation, we used a 12-N, 70ms pulse. This perturbation and scaled versions have been tested before in both control and patient populations (Smith et al., 2000; Fine and Thoroughman, 2006). For the holding perturbation, we used a background load to ensure that active holding control is engaged, and the duration of the probe (holding for about 5s) made using a stronger perturbation impractical, as maintaining a background load at, say, 12N for that long could lead to increased fatigue.

      The question raised by the Reviewer, whether the findings would be the same if the same 12-N pulse were used to probe both moving and holding control, is interesting to investigate. We would expect the same qualitative findings (i.e. there would still be a connection between resting posture and active holding control when the latter were probed with a 12N pulse). Recent work provides more specific insight into what to expect. Our posture perturbation task is similar to the Unload Task in (Lowrey et al., 2019), whereby a background torque is released, whereas our pulse perturbation is more similar to their Load Task, whereby a torque is imposed against no background load (though it is a step perturbation rather than a pulse). Lowrey et al. (2019) find that their Unload task is harder than the Load task, with 2x the fraction of patient trials classified as failed (with failure defined as task performance being outside of the 95% confidence interval for controls), though there are still clear effects for the Load task.

      This suggests that the potential effects of using a pulse-like perturbation to probe posture control would likely be weaker in magnitude, all other things being equal. At the same time, however, the Load and Unload tasks in Lowrey et al., 2019 were perturbations of the same magnitude; it is thus also likely that the reduction in effect would be mitigated, or reversed, by the fact that we would be using a 12N instead of a 6N perturbation.

      A relevant consequence of the Lowrey et al., 2019 findings is that the Unload paradigm is superior in its ability to detect impairment in static, posture perturbations, and thus provides a better signal to detect potential relationships with resting posture biases. This is not surprising, as a background load further engages the control of active holding, which is what we were trying to probe in the first place.

      But then why not use the same paradigm (preloading and release) for movement? There are two main reasons. First, requiring a background load throughout the experiment is unfeasible due to fatigue. Second, for the holding perturbation, we wanted to ensure that the postural control system is meaningfully engaged when the perturbation hits, hence we picked the background load. Were we to impose the same during moving (i.e. impose a lateral background load on the movement), we could be engaging posture control on top of movement control. This preloading would reduce the degree to which the pulse probe isolates movement control, and lead to intrusion of the posture control system in the movement task by design. This relates to what the Reviewer proposes in the comment below: preloading may result in postural biases, i.e. engage posture control; see below, where we argue this interpretation is within the scope of our conceptual model rather than a counter to it.

      We now explain the rationale behind our perturbation design in the Methods section (lines 211-220).

      Relatedly, an alternative interpretation of the results is that preloading muscle for stroke patients, whether by supporting the weight of one's arm (experiment 1) or statically resisting a load prior to force release (experiment 2), leads to a greater postural force bias that can subsequently influence control. It is recommended that the authors comment on this. 

      Response 2.2b. We find this interpretation valid, but we do not see how it meaningfully differs from the framework we propose. We already state that the RST may be tailored for both posture/holding control and the production of large forces (which would include muscle preloading):

      “Thus, the accumulated evidence suggests that the RST could control posture and large force production in the upper limb.” (lines 698-699 in the current version)

      “the RST, in contrast, is weighted more towards slower postural control and generation of large isometric forces” (lines 724-726 in the current version)

      And, we discuss other conditions where the RST is involved in large force production, such as power grip, and how these interact with the role of the RST in posture/holding control (lines 758-768 in the current version).

      To better explain our model, we now provide the two examples mentioned by the reviewer along with our description of the proposed role for the RST (lines 726-727):

      “…the RST, in contrast, is weighted more towards slower postural control and generation of large isometric forces (such as vertical forces for arm support, or horizontal forces for holding the arm still against a background load like in our posture/release perturbation trials).”

      We note, however, that we find resting posture abnormalities even in the presence of arm support, suggesting the involvement of the RST in holding control even when the forces involved (and the need to preload the muscle) are small.

      Reviewer #3 (Public Review): 

      The authors attempt to dissociate differences in resting vs active vs perturbed movement biases in people with motor deficits resulting from stroke. The analysis of movement utilizes techniques that are similar to those of previous motor control studies in both humans and non-human primates, to assess impairments related to sensorimotor injuries. In this regard, the authors provide additional support to the extensive literature describing movement abnormalities in patients with hemiparesis both at rest and during active movement. The authors describe their intention to separate out the contribution of holding still at a position vs active movement as a demonstration that these two aspects of motor control are controlled by two separate control regimes.

      Strengths: 

      (1) The authors utilize a device that is the same or similar to devices previously used to investigate motor control of movement in normal and impaired conditions in humans and non-human primates. This allows comparisons to existing motor control studies. 

      (2) Experiment 1 demonstrates resting flexion biases both in supported and unsupported forelimb conditions. These biases show a correlated relationship with FM-UE scores, suggesting that the degree of motor impairment and the degree of resting bias are related.

      (3) The stroke patient participant population had a wide range of both levels of impairment and time since stroke, including both sub-acute and chronic cases allowing the results to be compared across impairment levels.

      The authors describe several results from their study: 1. Postural biases were systematically toward the body (flexion) and increased with distance from the body (when the arm was more extended) and were stronger when the arm was unsupported. 2. These postural biases were correlated with FM-UE score. 3. They found no evidence of postural biases impacting movement, even when that movement was perturbed. 4. When holding a position at the end of a movement, if the position was perturbed opposite of the direction of bias, movement back to the target was improved compared to the perturbation in the direction of bias. Taken together, the authors suggest that there are at least two separate motor controls for tasks at rest versus with motion. Further, the authors propose that these results indicate that there is an imbalance between cortical control of movement (through the corticospinal tracts) and postural control (through the reticulospinal tract).

      Response 3.1. Thank you for pointing out some of the strengths of our work and summarizing our findings. A minor clarification we would like to make, related to (3), is that, while our study did enroll two patients towards the end of the subacute stage (2-3 months), the rest of the population were at the chronic stage, at one year and beyond. We thus find it very unlikely that time after stroke was the primary driver of differences in impairment in the population we studied.

      There are several weaknesses related to the interpretation of the results:

      In Experiment 1, the participants are instructed to keep their limbs in a passive position after being moved. The authors show that, in the impaired limb, these resting biases are significantly higher when the limb is unsupported and increase when the arm is moved to a more extended position.

      When supported by the air sled, the arm is in a purely passive position, not requiring the same antigravity response so will have less RST but also less CST involvement. While the unsupported task invokes more involvement of the reticulospinal tract (RST), it likely also has significantly higher CST involvement due to the increased difficulty and novelty of the task.

      If there were an imbalance in CST regulating RST as proposed by the authors, the bias should be higher in the supported condition as there should be relatively less CST activation/involvement/ modulation leading to less moderating input onto the RST and introducing postural biases. In the unsupported condition, there is likely more CST involvement, potentially leading to an increased modulatory effect on RST. If the proportion of CST involvement significantly outweighs the RST activation in the unsupported task, then it isn't obvious that there is a clear differentiation of motor control. As the degree of resting force bias and FM-UE score are correlated, an argument could be made that they are both measuring the impairment of the CST unrelated to any RST output. If it is purely the balance of CST integrity compared to RST, then the degree of bias should have been the same in both conditions. In this idea of controller vs modulator, it is unclear when this switch occurs or how to weigh individual contributions of CST vs. extrapyramidal tracts. Further, it isn't clear why less modulation on the RST would lead only to abnormal flexion.

      Response 3.2. Our model posits two mechanisms by which CST impairment would lead to increased RST involvement. The first – which is the one discussed by the Reviewer here - is a direct one, whereby weaker modulation of the RST by the CST leads to increased RST involvement. The second is an indirect one, whereby the incapacity of CST to drive sufficient motor output to deal with tasks eventually leads to increased RST drive.

      The reviewer suggests it is likely that the unsupported task demands increased activation through both the CST and the RST. If that were the case, however, it would exaggerate the effects of CST/RST imbalance after stroke compared to healthy motor control: if task conditions (lack of support) required higher CST involvement, then CST damage would have an even larger effect. In turn, this would lead to even higher RST involvement and further diminish the ability of the CST to moderate the RST. Thus, RST-driven biases would be higher in the unsupported condition.

      And, given that the CST itself is damaged and has to deal with a further increased RST activation, we would not expect the proportion of CST involvement to outweigh RST activation, but the opposite. In fact, a series of relatively recent findings suggest just this. For example,

      • Zaaimi et al. showed that unilateral CST lesions in monkeys lead to significant increases in the excitability of the contralesional RST (Zaaimi et al., 2012). Interestingly, this effect was present in flexors but not extensors, potentially explaining why less modulation and/or overactivation of the RST would primarily lead to abnormal flexion.

      • McPherson et al. (further discussed in point 2.A.23, by Reviewer 2 – Recommendations to the Authors) showed that, after stroke, contralesional activity (which would include the ipsilateral RST) increases relative to ipsilesional activity (which would include the contralateral CST) (McPherson et al., 2018). The same study also provides evidence that FM-UE may primarily reflect RST-driven impairment. The ipsilateral(RST)/contralateral(CST) balance, expressed as a laterality index, correlated with FM-UE, with lower FM-UE for indices indicating higher RST involvement. (Interestingly, the slope of this relationship was steeper when the laterality of brain activation patterns was examined under tasks with less arm support, mirroring the steeper FM-UE vs resting bias slope when arm support is absent, as shown in our Figure 8).

      • Wilkins et al. found that providing less support (i.e. requiring increased shoulder abduction) increases ipsilateral activation (representing RST) relative to contralateral activation (representing CST) (Wilkins et al., 2020).

      This resting bias could be explained by an imbalance in the activation of flexors vs extensors which follows the results that this bias is larger as the arm is extended further, and/or in a disconnect in sensory integration that is overcome during active movement. Neither would necessitate separate motor control for holding vs active movement. 

      Response 3.3. We do not think that either of these points necessarily argues against our model. First, the resting biases we observe are clearly pointed towards increased flexion, and can thus be seen as the outcome of an imbalance in the activation of flexors vs. extensors at rest. This imbalance between flexors/extensors can also be explained by the CST/RST imbalance posited by our conceptual model: in their study of CST lesions in the monkey, Zaaimi et al. found increased RST activation for flexors but not extensors, suggesting that RST over-involvement may specifically lead to flexor abnormalities (Zaaimi et al., 2012). Second, overcoming a disconnect in sensory integration may be one way the motor system switches between separate controllers; how this switch happens is not examined by our conceptual model.

      In Experiment 2, the participants are actively moving to and holding at targets for all trials while being supported by the air sled. Even with the support, the paretic participants all showed start- and endpoint force biases around the movement despite not showing systematic deviations in force direction during active movement start or stop. There could be several factors that limit systematic deviations in force direction. The most obvious is that the measured biases are significantly higher when the limb is unsupported and by testing with a supported limb the authors are artificially limiting any effect of the bias.

      Response 3.4. We do expect, in line with what the reviewer suggests, that any potential effects would be stronger in the unsupported condition. We chose to test active motor control with arm support because running Experiment 2 without arm support would pose challenges, particularly with our most impaired patients, given the duration of Experiment 2 (~2 hours, about 1 hour with each arm) and the expected fatigue that would ensue.

      However, a key characteristic of our comparisons is that we are comparing Experiment 2 active control data under arm support, against Experiment 1 resting bias data also under arm support. While Experiment 1 measured biases without arm support as well, these are not used for this comparison. And, while resting biases are weaker with arm support, they are still clear and significant; yet they do not lead to detectable changes in active movement.

      At the same time, we do not rule out that, if we were to repeat Experiment 2 without arm support, we could find some systematic deviation in the direction of resting bias in movement control. Our conceptual model, in fact, suggests that this may be the case, as we described in lines 618-620 of our original manuscript. The idea here is that, when arm support is not provided, the increased strength requirements lead to increased drive through the RST, to the point that posture control (and its abnormalities) spills into movement control (Figure 9). We now better clarify this position in our Discussion (lines 744-750):

      “The interesting implication of this conceptual model is that synergies are in fact postural abnormalities that spill over into active movement when the CST can no longer modulate the increased RST activation that occurs when weight support is removed (i.e. resting biases may influence active reaching in absence of weight support). Supporting this idea, a study found increased ipsilateral activity (which primarily represents activation via the descending ipsilateral RST (Zaaimi et al., 2012)) when the paretic arm had reduced support compared to full support (McPherson et al., 2018).”

      It is also possible that significant adaptation or plasticity with the CST or rubrospinal tracts could give rise to motor output that already accounts for any intrinsic resting bias.  

      Response 3.5. This kind of adaptation – regardless of the tracts potentially involved – is an issue we examined in our experiment. As we talk about in our Results (lines 458-460 in the updated manuscript), with most of our patient population in the chronic stage, it could be likely that their motor system adapted to those biases to the point that movement planning took them into account, thereby limiting their effect. This motivated us to examine responses to unpredictable perturbations during movement (Figure 6) where we still find lack of an obvious effect of resting biases upon reaching control. We thus believe that our findings are not explained by this kind of adaptation, though we agree it would be of great interest for future work to compare resting biases and reaching control in acute vs. chronic stroke populations to examine the degree to which stroke patients adapt to these biases as they recover.

      In any case, the results from the reaching phase of Experiment 2 do not definitively show that directional biases are not present during active reaching, just that the authors were unable to detect them with their design. The authors do acknowledge the limitations in this design (a 2D constrained task) in explaining motor impairment in 3D unconstrained tasks. 

      Response 3.6. It is, of course, an inherent limitation of a negative finding that it cannot be proven. What we show here is that there is no hint of intrusion of resting posture abnormalities upon active movement, in spite of these resting posture abnormalities being substantial and clearly demonstrated even under arm support. To allow for the maximum bandwidth to detect any such effects, we specifically chose to compare the most extreme instances (resting bias-wise) for each individual, and yet we did not find any relationship between biases and active reaching.

      This suggests that, even if these biases could be in some form present during active movement, their effect would be minimal and thus limited in meaningfully explaining post-stroke impairment in active movement under arm support.

      Note that, as we already discuss, our conceptual model (Figure 9) suggests that the degree to which directional biases would be present in active reaching may be influenced by arm support (or the specific movements examined – hence our limitation in not examining 3D movement). Thus we do not claim that this independence is absolute. Examples include the last line of the passage quoted right above, and the summary statement of our Discussion quoted below (lines 639-641):

      “…which raises the possibility that the observed dissociation of movement and posture control for planar weight-supported movements may break down for unsupported 3D arm movements.”

      Finally, we now more explicitly acknowledge that abnormal resting biases may influence active movement in the absence of arm support (see Response 3.4).

      It would have been useful, in Experiment 2, to use FM-UE scores (and time from injury) as a factor to determine the relationship between movement and rest biases. Using a GLMM would have allowed a similar comparison to Experiment 1 of how impairment level is related to static perturbation responses. While not a surrogate for imaging tractography data showing a degree of CST involvement in stroke, FM-UE may serve as an appropriate proxy so that this perturbation at hold responses may be put into context relative to impairment.

      Response 3.7. Here the Reviewer suggests we use FM-UE scores as a proxy for CST integrity. We do not think this analysis would be particularly helpful in our case for a number of reasons:

      First, while FM-UE is a general measure of post-stroke impairment, it was designed to track - among other things - the emergence and resolution of abnormal synergies, a sign assumed to result from abnormally high RST outflow (McPherson et al., 2018; McPherson and Dewald, 2022). In line with this, the FM-UE scales with EMG-based measures of synergy abnormality (Bourbonnais et al., 1989). Impairments in dexterity, a sign associated with damage to the CST (Lawrence and Kuypers, 1968; Porter and Lemon, 1995; Duque et al., 2003), dissociate from synergy abnormalities when compared under arm support as we do here (Levin, 1996; Hadjiosif et al., 2022). This means that FM-UE would be a stronger proxy for RST activity and thus not a direct proxy for CST integrity, particularly when one wants to dissociate RST-specific vs. CST-specific abnormalities. In fact, as we discuss in Response 3.2 above, there are a number of studies supporting this idea: for example, Zaaimi et al., 2012 show that relative RST activation – the balance between ipsilateral excitability, primarily reflecting RST, and contralateral excitability, primarily reflecting the CST – scales with FM-UE (Zaaimi et al., 2012).

      Second, this kind of analysis would obscure within-individual effects, since FM-UE scores are, of course, assigned to each individual. This is the same issue as doing across-individual correlation analyses in general (see response 2.1b). A strong resting force bias would have opposite effects on opposing perturbations, so averaging across subjects would occlude these effects.

      Third, while FM-UE is a good measure of synergy abnormality, weakness alone could also give an abnormal FM-UE (Avni et al., 2024).

      The Reviewer also suggests we use time from injury for this analysis. Time from injury can indeed potentially be an important factor. However, this analysis would not be appropriate for our dataset, since the effective variation in recovery stage within our population is limited: our sample is essentially chronic (only two patients were examined within the subacute stage – at 2 and 3 months after stroke - with everybody else examined more than a year after stroke) with the “positive” elements of their phenotype (and FM-UE itself) essentially plateaued (Twitchell, 1951; Cortes et al., 2017). We thus would not expect to see any meaningful effects of time from injury within our population. It would be an excellent question for future work to investigate both resting biases and their relationship to reaching in acute/subacute patients, and examine whether the trajectory of resting biases (both emergence and abatement due to recovery) follows the one for abnormal synergies.

      It is not clear that even in the static perturbation trials that the hold (and subsequent move from perturbation) is being driven by reticulospinal projections. Given a task where ~20% of the trials are going to be perturbed, there is likely a significant amount of anticipatory or preparatory signaling from the CST. How does this balance with any proposed contribution that the RST may have with increased grip?

      Response 3.8. We included our response to this as part of Response 3.2. In brief, while we cannot rule out that these tasks may recruit increased CST signaling, this would tend to increase, rather than reduce, the effects of post-stroke impairment: the requirement for increased signaling from a CST that is damaged would magnify the effects of this damage, in turn leading to increased recruitment of other tracts, such as the RST.

      In general, the weakness of the interpretation of the results with respect to the CST/RST framework is that it is necessary to ascribe relative contributions of different tracts to different phases of movement and hold using limited or indirect measures. Barring any quantification of this data during these tasks, different investigators are likely to assess these contributions in different ways and proportions limiting the framework's utility.

      Response 3.9. We believe that our Responses 3.2-3.6 put our findings in fair perspective, and the edits undertaken based on the Reviewer’s comments have clarified our position as to how the dissociation between holding and moving control may break down. We do agree, however, that our framework would be strengthened by the use of direct measures of CST/RST connectivity in future research. We present our conceptual model as a comprehensive explanation of our findings and how they blend with current hypotheses regarding the role of these two tracts in motor control after stroke. As such, it provides a blueprint towards future research that more directly measures or modulates CST and RST involvement, using tools such as tractography or non-invasive brain stimulation.

      Recommendations for the authors:   

      Reviewer #1 (Recommendations For The Authors):

      L226 “…of this issue, we repeated the analysis of Figure 7F (a) by excluding these four patients…”.  Should this be three, based on the previous sentence? 

      Response 1.A.1. Thank you for pointing out this typo, which is now corrected. The analysis in question (Figure S1 in the original submission, now re-numbered as Figure S7-4) excluded the three patients mentioned in the previous sentence.

      L254 “…the hand was held in a more distal position. The postural force biases were strongest when…”  Could this be "extended" rather than distal? See my later comment about the inadequate description of targets.

      Response 1.A.2. The reviewer is correct that the arm will tend to be more extended in the distal targets. However, since these positions were defined in extrinsic coordinates, we think the terms distal/proximal are also appropriate. In either case, we now clarify these definitions in the text (see Response 1.A.3 below).

      L263 “…contained both distal and proximal targets, and, importantly, they were also the movement…”.  Distal/proximal targets were never described as part of the task. 

      Response 1.A.3. We improved our description by (i) changing the wording above to “represented positions both distal and proximal to the body,”, (ii) doing the same in our Methods (line 175) and (iii) indicating distal/proximal targets in Figure 3A (bottom right of panel A).

      L378 “…the pulse perturbation. We hypothesized that, should resting postural forces play a role, they…”  L379 “…would tend to reduce the effect of the pulse if they were in the opposite direction, and…”  Not really obvious why. A reduction in the displacement caused by a force pulse might be caused by different stiffness or viscosity, but not by a linear, time-invariant force bias. This situation is different from that of "moving the arm through a high-postural bias area vs. a low-postural bias area" where it would encounter time- (actually spatially) varying forces and varying amounts of displacement. Clarify the logic if this is a critical point.

      Response 1.A.4. We thank the Reviewer for highlighting this point of potential confusion. We now clarify that these postural bias forces are neuromuscular in origin (Kanade-Mehta et al., 2023), and likely result from an expression of abnormal synergy, at least under static conditions. In this case, we hypothesized that force pulses acting against the gradient of the postural bias field would act to stretch the already active muscles, which would lead to a further increase in postural resistance due to inherent length-tension properties of active muscle. By contrast, force pulses acting along the gradient of the postural bias field would act to shorten the same active muscles, which would lead to a reduction in postural resistance. The data did not support this in the case of force pulses imposed during movement. We note, however, that similar effects would affect responses to static perturbations as well, wherein we do find an effect of resting biases. We now better explain this reasoning (lines 479-482).
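
      The against/along distinction in this response can be illustrated with a minimal sketch. It assumes that the resting bias at a location is summarized as a 2D force vector (the force the arm exerts at rest) and that a pulse counts as "against" the bias when its projection onto that vector is negative. The function name, argument names, and tolerance are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def classify_pulse(bias_force, pulse_force, tol=1e-9):
    """Label a force pulse relative to the resting bias at one location.

    bias_force:  2D resting bias force vector (assumed: force exerted by the arm).
    pulse_force: 2D force pulse applied to the hand, in the same planar frame.
    Returns 'along', 'against', or 'orthogonal'.
    """
    bias = np.asarray(bias_force, dtype=float)
    pulse = np.asarray(pulse_force, dtype=float)
    # Sign of the projection of the pulse onto the bias direction.
    projection = float(np.dot(pulse, bias))
    if projection > tol:
        return "along"      # pulse pushes the same way the arm already pulls
    if projection < -tol:
        return "against"    # pulse opposes the bias, stretching active muscles
    return "orthogonal"     # no component along the bias direction

# Example: a bias pulling toward the body (-y) and a pulse pushing away (+y).
label = classify_pulse([0.0, -2.0], [0.0, 5.0])
```

      Under these assumptions, the hypothesis described above predicts greater resistance for pulses labeled "against" than for pulses labeled "along".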

      L466-468 “resting postural force). In short, our perturbations revealed that resting flexor biases switched on after movement was over, providing evidence for separate control between moving and holding still.”

      I do not think the authors have presented clear evidence that forces, "switch on", implying the switch to a different controller which they posit. This could as easily be a nonlinear or time-varying property of a single controller (admittedly, the latter possibility overlaps broadly with their idea of distinct, interacting controllers). An example that the authors are certainly aware of is that of muscle "thixotropy" a purely peripheral mechanism due to the dynamics of crossbridge cycling that causes resting muscle to be stiffer than moving muscle, changing with a time constant of ~1-2 seconds. Neither this particular example nor changing levels of contraction (more likely during the unpredictable force perturbations) would be in the direction to explain the main observation here -- a point perhaps worth making, together with the stretch reflex comments. 

      Response 1.A.5. Thank you for this perspective. Indeed, it might be that “switching on” represents a shift along a nonlinear property of the same controller: in the extreme, if this nonlinearity is a step (on/off) function, this single controller would be functionally identical to two separate controllers. We thus cannot tell if these controllers are distinct in the strict sense. What we argue here is that, no matter the underlying controller architecture (two distinct controllers or two distinct modes of the same controller), the control of reaching vs. holding can be functionally separable even after stroke. In line with this idea, we used a more nuanced phrasing (e.g. “separable functional modes for moving vs. holding”) throughout our manuscript, and we have now edited out a mention of “separate controllers” to be consistent with this.

      Moreover, thank you for pointing out the example of thixotropy, showing how peripheral mechanisms could interact with central control. As you point out, this effect would not explain the main observation here: in fact, if stiffness were substantially higher during rest or holding (instead of moving) that would reduce the impact of the static perturbation, making it harder to detect any effects of resting biases compared to the moving perturbation case.

      L480 “…during movement (Sukal et al., 2007). Yet, Experiment 2 found no relationship between resting…” L481 “…postural force biases and active movement control. To further investigate this apparent…”  The methods of the two studies seem fairly similar, but this question warrants a more careful comparison. How did the size of the two workspaces compare? What about the magnitude of the exerted forces? The movement condition in this study was done with the limb entirely supported. Under that condition, the Sukal study also found fairly small effects of the range of motion.

      Response 1.A.6. Sukal et al., 2007 did not directly measure exerted forces, but instead compared the active range of motion under different loading conditions. They used the extent of reach area to quantify the effect of abnormal synergies, with a more extended active range of motion signifying reduced effect of abnormal synergies. As the Reviewer points out, Sukal et al. found fairly small effects of synergies upon the range of motion when arm support was provided (the reach area for the paretic side was found to be about 85% of the nonparetic side under full arm support, though they were statistically significantly different, Figure 5 of their paper). They found increasing effect of synergies as arm support was reduced: on average, the reach area when participants had to fully support the arm was less than 50% the reach area when full arm support was given (comparing the 0% vs. 100% active support conditions [i.e. 100% vs. 0% external support] in their Figure 5). As we discuss in our paper, this effect of arm support upon synergy mirrors the one we found for resting postures.

      To compare our workspace with the one in Sukal et al., we overlaid our workspace (the array of positions for which the posture biases were measured, for a typical participant from Experiment 1) on the one they used as shown in their Figure 4. Note that their figure only shows an example participant, and thus our ability to compare is limited by the fact that each participant can vary widely in terms of their impairment, and assumptions had to be made to prepare this overlay (e.g. that (0,0) represents the position of the right acromion point). 

      For this example, and our assumptions, our workspace was smaller, with the main points of interest (red dots, the movement start/end points used for Experiment 2) within the Sukal et al. workspace. That our workspace is smaller is not surprising, given that the area in Sukal et al. represents the limit of what can be reached, and thus motor control *has* to be examined in a subset of that area.

      Author response image 1.

      Comparing the two study methodologies, however, suggests an advantage of measuring resting biases in terms of sensitivity and granularity: first, resting biases can be clearly detected even under arm support (something we point out in our Discussion, lines 715-717); second, they can measure abnormalities at any point in the workspace, rather than a binary within/without the reach area. The resting bias approach may thus be a more potent tool to probe the shared bias/synergy mechanisms we propose here.

      Figure 2 

      Needs color code. 

      The red dots could be bigger.

      Response 1.A.7. We have increased the size of the red dots and added a color code to explain the levels illustrated by the contours. We also expanded our caption to better explain this illustration.

      Figure 3

      Labeling is confusing. Drop the colored words (from both A and B), and stick to the color legend. Consider using open and filled symbols (and bars) to represent arm support or lack thereof. The different colored ovals are very hard to distinguish.

      Response 1.A.8. We find these recommendations improve the readability of Figure 3 and we have thus adopted them - see updated Figure 3.

      Figure 4

      Not terribly necessary.  

      Response 1.A.9. While this figure is indeed redundant based on our descriptions in the text, we kept it as we believe it can be useful in clarifying the different stages of movement we examine.

      Figure 5 

      Tiny blue and green arrows are impossible to distinguish. 

      Although the general idea is clear, E and H are not terribly intuitive.  Add distance scale bars for D-I. 

      Response 1.A.10. For improved contrast, we now use red and blue (also in line with comment below regarding Figure 7), and switched to brighter colors in general. To make E and H more intuitive and easier to follow, we expanded the on-panel legend. Thank you for pointing out that distance scale bars are missing; we have now added them (panels EFHI).

      Figure 6 

      Panel E inset is too small. 

      Response 1.A.11. We have now moved the inset to the right and enlarged it.

      Figure 7 

      Green and blue colors are not good. 

      Response 1.A.12. For improved contrast, we now use red and blue.

      Figure 8 

      Delete or move to supplement? 

      Response 1.A.13. We respectfully disagree. While the relationships in these data are also captured by the ANOVA, we believe these scatter plots offer a better overview of the relationships between force biases and FM-UE across different conditions.

      Really minor

      L113 “…participants' lower arm was supported using a custom-made air-sled (Figure 1C). Above the  participant's…” 

      Response 1.A.14. We put the apostrophe after the s so as to refer to participants in general (plural).

      L117 ”…subject-produced forces on the handle were recorder using a 6-axis force transducer.”  recorded 

      Response 1.A.14. Thank you for pointing out this error which we have now corrected.

      L136 “…2013), Experiment 1 assessed resting postural forces by passively moving participants to…”  The experiment did not move the participant. 

      Response 1.A.15. We now fix this issue: “by having the robot passively move…”

      L248 “…experiment blocks: two with each arm, with or without arm weight support (provided by an air experimental…”

      Response 1.A.16. We have now corrected this.

      L364 “…responses to mid-movement perturbations. In 1/3 of randomly selected reaching movements…”  Obviously, you mean 1/3 of all movements: "One-third of the reaching movements were chosen randomly"  

      Response 1.A.17. We now clarify: “In 1/3 of reaching movements in Experiment 2, chosen randomly”. Also please note our response to Reviewer 2, point 10: we now report the exact number of trials for which each kind of perturbation was present.

      L609 “Damage to the CST after stroke reduces its moderating influence upon the RST (Figure 9,…”  "its" refers to the subject, "Damage", not "CST".

      Response 1.A.18. We have changed this to “Post-stroke damage to the CST reduces the moderating influence the CST has upon the RST”.

      Reviewer #2 (Recommendations For The Authors):

      (1) Throughout, the authors cleverly selected the most opposed and most aligned resting postural force biases to perform a within-subject analysis. However, this approach excludes a lot of data. The authors could perform an additional within-subject analysis. For each participant they could correlate lateral resting posture force bias to each dependent variable, utilizing all the trials of a participant. 

      Response 2.A.1a. Thank you for appreciating our analysis design and suggesting additional analyses. We focused our within-subject analysis design on the most extreme instances, as we believe that this approach would offer the best opportunity to detect any potential effects of resting biases. We reasoned that, since resting biases tend to be relatively small for most locations in the workspace, taking all biases into account would inject a disproportionate amount of noise in our analysis, which would in turn diminish our ability to detect any potential relationships. This could be because small biases lead to small effects, but also because small biases may themselves be more likely to reflect measurement noise in the first place. Note that our study talks about separability of active reaching from resting abnormalities based on lack of relationships between the two. While one cannot definitively prove a negative, it is also important to take the approach that maximizes the ability to detect any such relationship if there were one. We believe taking the most extreme instances fulfills that role.

      However, as the Reviewer points out, this approach also excludes a substantial amount of data. We agree that our findings could be further strengthened by exploring additional within-subject analyses that utilize all trials. Thus, following the reviewer’s suggestion, we estimated the sensitivity of each dependent variable to lateral resting posture force bias. Specifically, we estimated the slope of this relationship for each individual (separately for paretic and non-paretic data) using linear regression, and assessed whether the average slope is significant for each group (paretic data, non-paretic data, and control data).
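
      The two-stage procedure described above (a regression slope per individual, followed by a test of whether the group-average slope differs from zero) can be sketched as follows. This is a minimal illustration and not our analysis code: the function names are hypothetical, each individual's data are assumed to be paired arrays of lateral resting bias and the dependent variable, and the group-level test is assumed here to be a one-sample t-test against zero.

```python
import numpy as np

def per_subject_slopes(bias_by_subject, outcome_by_subject):
    """Fit outcome = slope * bias + intercept separately for each subject.

    Each argument is a sequence with one array per subject; the arrays for a
    given subject must have equal length (one entry per trial or location).
    Returns an array with one slope per subject.
    """
    slopes = []
    for bias, outcome in zip(bias_by_subject, outcome_by_subject):
        slope, _intercept = np.polyfit(
            np.asarray(bias, dtype=float),
            np.asarray(outcome, dtype=float),
            deg=1,
        )
        slopes.append(slope)
    return np.array(slopes)

def one_sample_t(values):
    """t statistic and degrees of freedom for H0: mean(values) == 0."""
    values = np.asarray(values, dtype=float)
    n = values.size
    standard_error = values.std(ddof=1) / np.sqrt(n)
    return values.mean() / standard_error, n - 1
```

      The resulting t statistic would then be compared against the t distribution with the returned degrees of freedom (for instance via `scipy.stats.ttest_1samp`, which performs the same test and also returns a p-value), separately for the paretic, non-paretic, and control groups.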

      This secondary analysis replicated our main findings: lack of relationship between posture biases and active reaching control (both for unperturbed and perturbed movement), and a significant relationship between posture biases and active holding control. In addition, in line with main point 2.1 by the reviewer, we performed the same analyses for non-paretic and control data. While there are no definitive conclusions to be made for these cases (as was likely, given that the resting force biases are smaller, as also pointed out by the Reviewer in 2.1) these data are worthy of discussion, with potentially interesting insights (for example, there are hints that the connection between resting biases and active holding control is present in the non-paretic arm as well, and may be explored in future research).

      We have included these analyses in the supplementary materials, and we point to them in the main text. Specifically:

      First, in line with our main analyses in Figure 5, we find no effect (the average slope is insignificant) for start and endpoint biases upon the corresponding reaching angles. This is now mentioned in lines 425-434 of the Results, and illustrated in Figure S5-2. There was a lack of effect for the non-paretic and control data as well.

      Second, in line with our main analyses in Figure 6, we find no effect of start biases upon responses to the pulse (Figure S6-2, mentioned in lines 513-517 of the Results). As above, there was no effect of non-paretic or control data either.

      And, finally, in line with our main analysis in Figure 7, we find an effect of resting biases upon performance for the static perturbation (Figure S7-2, mentioned in lines 578-586 of the Results). Interestingly, there is a suggestion that resting biases may affect static perturbation responses in the non-paretic data as well, based on the relationship between posture bias and maximum deviation, but not the other two metrics. Given the lack of consistency of resting bias effects across the three dependent variables examined, however, our current data are unable to give a definite answer as to whether the connection between resting biases and active holding control is also present in the non-paretic side. Our hypothesis is that, since resting abnormalities and their effects are the pathological over-manifestations of mechanisms inherent in the motor system in general, such a relationship would exist. Answering this question, however, would require an experiment design better tailored to detect relationships in the non-paretic arm, where resting biases are weaker.

      We thank the Reviewer for their suggestions and believe that these additional analyses provide a more complete picture of the data, and their consistency with our main results reinforces the message of the paper.

      Then, they can report the percentage of participants that display significant correlations separately for the paretic, nonparetic, and control arms. 

      Response 2.A.1b. We note that, even in cases where the average slope (across individuals) is significant, the individual slopes themselves are usually not significant, likely due to the large amount of noise for datapoints corresponding to weak resting biases. To further examine this, we performed additional analyses whereby we examined slopes by (a) pooling all participant data together (centered separately for each individual), and (b) taking a further step to normalize each participant’s data not only by centering but also by adjusting for each individual’s variability along each axis (i.e. assessing the slope between z-scores of resting bias vs. z-scores of each dependent variable). These two analyses confirmed our finding that resting biases interacted with active holding control, with significant slopes between resting biases and outcome variables. (a) Pooling all data together: path to stabilization: p = 0.032; time to stabilization: p = 1.4×10⁻⁵; maximum deviation: p = 0.021. (b) Pooling and normalizing: path to stabilization: p = 0.0013; time to stabilization: p = 8.6×10⁻⁶; maximum deviation: p = 0.00056. The latter analysis showed an even stronger connection between resting bias and active holding control (probably due to better accounting for differences in the range of resting biases across participants). For simplicity, however, we only provide the across-individual slope comparisons in the paper.
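      A minimal sketch of the two pooling schemes described in this response, assuming each participant contributes paired lists of resting bias and outcome values. The helper names are hypothetical, and the sketch only computes the pooled slopes, not their p-values.

```python
from statistics import mean, stdev

def center(values):
    """Subtract the participant's own mean (scheme a)."""
    m = mean(values)
    return [v - m for v in values]

def zscore(values):
    """Center and scale by the participant's own SD (scheme b)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pooled_slope(participants, transform):
    """Pool transformed (bias, outcome) data across participants and fit one slope.

    Each participant's transformed data has zero mean, so the pooled data
    does too, and the least-squares slope reduces to sum(x*y) / sum(x*x).
    """
    xs, ys = [], []
    for bias, outcome in participants:
        xs.extend(transform(bias))
        ys.extend(transform(outcome))
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical toy data: two participants with different offsets, same slope.
toy = [([0.0, 1.0, 2.0], [10.0, 12.0, 14.0]),
       ([5.0, 6.0, 7.0], [0.0, 2.0, 4.0])]
slope_centered = pooled_slope(toy, center)   # in outcome units per unit bias
slope_zscored = pooled_slope(toy, zscore)    # unitless, in SD per SD
```

      Centering removes between-participant offsets, while z-scoring additionally equalizes the range of biases each participant contributes to the pooled fit.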

      (2) An important aspect of all the analyses is that they rely heavily on estimates of the resting postural force bias. How stable are these resting postural force biases at the individual level? The authors could assess this by reporting within-subject variance for both the magnitude and direction of the resting postural force bias.

      Response 2.A.2. Thank you for your suggestion. We now assess the individual-level variance in error across measurements for patients’ paretic data using an ANOVA: the variance that remains after all other factors (same probe location; same arm support condition; same participant) are taken into account. We found that individual level measurement variance explained a mere 9.0% of total variance for resting bias magnitude. (We note that the same figure was 20.2% for the non-paretic data, in line with the weaker average biases which would be more susceptible to noise). We now note this in the Methods, as part of the new subsection “Stability of resting posture bias measurements in Experiment 1” (lines 266-273).
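      One way to picture the quantity reported above is as the share of total variance left within cells that share the same participant, arm support condition, and probe location. The sketch below assumes a simple cell-means decomposition, which may differ from the exact ANOVA model used for the paper; the function name and the toy measurements are hypothetical.

```python
from statistics import mean

def residual_variance_fraction(cells):
    """cells maps (participant, support, location) -> repeated measurements.

    Returns the within-cell sum of squares divided by the total sum of
    squares, i.e. the fraction of variance not explained by the cell.
    """
    all_values = [v for values in cells.values() for v in values]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_residual = sum((v - mean(values)) ** 2
                      for values in cells.values() for v in values)
    return ss_residual / ss_total

# Hypothetical toy data: bias magnitudes (N) measured twice at two locations.
toy_cells = {
    ("P01", "support", 1): [1.0, 3.0],
    ("P01", "support", 2): [10.0, 12.0],
}
fraction = residual_variance_fraction(toy_cells)
```

      A small fraction indicates that repeated measurements under the same conditions agree closely relative to the spread across conditions and participants.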

      (3) Does resting postural force bias influence hand movement immediately following force release from the postural perturbation? This could be assessed before any volitional responses by examining the velocity of the hand during the first 50 ms following the postural perturbation.

      Response 2.A.3. The influence seems fairly rapid, within the first 100ms as shown to the right. Here we plot hand deviation in the direction of the perturbation for the most-opposed (red) vs. most-aligned (blue) instances to examine when these curves become different. The bottom plots show the difference between these two, whereas shading indicates SEM (note that these curves are referenced to the average deviation in the last 0.5 s before force release). The rightmost plots zoom in to make it easier to see how responses to the most opposed vs. most aligned instances diverge.

      To detect the earliest post-perturbation timepoint for which this effect was significant, we performed paired t-tests at each timestep, and found that the two responses were systematically statistically different 95ms after perturbation onset onwards. For reference, the same method detected a response at 25ms for the most aligned instances and 40ms for the most opposed instances.
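      The timepoint-by-timepoint comparison described above could be sketched as follows. This is an illustration under stated assumptions: "systematically different" is interpreted here as the paired t statistic exceeding a critical value at every later timestep, the critical value is supplied by the caller, and the function names and toy data are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic for two equal-length lists of per-subject values."""
    diffs = [x - y for x, y in zip(a, b)]
    m, s = mean(diffs), stdev(diffs)
    if s == 0:
        return math.copysign(math.inf, m) if m != 0 else 0.0
    return m / (s / math.sqrt(len(diffs)))

def earliest_sustained_onset(opposed, aligned, t_crit):
    """opposed, aligned: one list of samples over time per subject.

    Returns the index of the first timestep from which |t| > t_crit holds
    at that timestep and all later ones, or None if the last one fails.
    """
    n_steps = len(opposed[0])
    significant = []
    for k in range(n_steps):
        t = paired_t([s[k] for s in opposed], [s[k] for s in aligned])
        significant.append(abs(t) > t_crit)
    onset = None
    for k in range(n_steps - 1, -1, -1):  # walk back from the final timestep
        if not significant[k]:
            break
        onset = k
    return onset

# Hypothetical toy data: 3 subjects, 5 timesteps, aligned response held at zero.
opposed = [[0.1, 1.0, 0.0, 2.0, 3.0],
           [-0.1, 1.1, 0.1, 2.1, 3.2],
           [0.0, 0.9, -0.1, 1.9, 2.8]]
aligned = [[0.0] * 5 for _ in range(3)]
onset_index = earliest_sustained_onset(opposed, aligned, t_crit=4.303)
```

      In the toy data the second timestep is significant in isolation but is not counted, because the difference disappears again at the third; multiplying the returned index by the sampling interval would give the onset time.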

      We have now added Supplementary Figure S7-4 with short commentary in the Supplementary Materials.

      (4) Abstract, lines 7-9. At a glance (and when reading the manuscript linearly) this sentence is unclear. If the paretic arm is compromised across rest and movement, how does that afford the opportunity to address the relationship between reaching, stopping, and stabilizing when all could be impacted? It might be useful to specify that these factors may be impacted differently relative to one another with stroke, providing an opportunity to better understand the differences between movement and postural control. 

      Response 2.A.4. Thank you for pointing out this issue (also related to Reviewer 1’s point – Response 1.1). We have changed this to more clearly reflect our reasoning and highlight that the issue is that stroke can differentially impact reaching vs. holding, copied below:

      “The paretic arm after stroke exhibits different abnormalities during rest vs. movement, providing an opportunity to ask whether control of these behaviors is independently affected in stroke.”

      (5) Line 27. It is perhaps more appropriate to say conceptual model than simply 'model'.  

      Response 2.A.5. Thank you for your suggestion, which we have adopted throughout the manuscript.

      (6) Line 122-125. Figure 1A caption. The authors should specify that resting posture force biases occur when the limb or hand is physically constrained in a specific position. 

      Response 2.A.6. Thank you for pointing this out – we have clarified the caption:

      “If one were to physically constrain the hand in a position away from the resting posture, the torques involved in each component of the abnormal resting posture translate to a force on the hand (blue arrow);”

      (7) Line 147. Why was the order not randomized or counterbalanced? 

      Response 2.A.7. We prioritized paretic data, as the primary analyses and comparisons in our paper involved resting posture biases and active movement with the paretic arm. We note that our primary analyses, which rely on paretic-paretic comparisons, would not be affected by paretic vs. non-paretic ordering effects. However, ordering effects could potentially affect comparisons between paretic and non-paretic data. We now note the reasoning behind the absence of counterbalancing, and mention the potential limitation in interpreting paretic to non-paretic comparisons in lines 124-129 of the Methods.

      (8) Line 172. 12N is the peak force of the pulse?

      Response 2.A.8. The reviewer is correct; we have clarified our description (line 463 in the updated manuscript):

      “a 70 ms bell-shaped force pulse which was 12N at its peak”

      (9) Line 175. What is a clockwise pulse? Was the force vector rotating in direction over time so that it was always acting orthogonally to the movement, or did it always act leftwards or rightwards?

      Response 2.A.9. The force vector was not rotating in direction over time. Here, we used clockwise/counterclockwise to indicate rightwards/leftwards with respect to the ideal movement direction – the line from start position to target (which is what we understand the Reviewer means by “always act rightwards or leftwards”). We have clarified the text to indicate this (lines 193-195):

      …was applied by the robot lateral to the ideal movement direction (i.e. the direction formed between the center of the start position and the center of the target) after participants reached 2cm away from the starting position (Smith and Shadmehr, 2005; Fine and Thoroughman, 2006).

      (10) Lines 177-182. It might be useful to explicitly mention the frequency of each of the perturbations, just for ease of the reader. 

      Response 2.A.10. We have added this information to our Methods (lines 206-210):

      Thus, in summary, each 96-movement block consisted of 64 unperturbed movements and 32 movements perturbed with a force pulse (16 clockwise, and 16 counter-clockwise). For 20 out of the 96 movements in each block, the hold period was extended to test the hold perturbation (4 trials for each of the 5 target locations, each one of the 4 trials testing one perturbation direction as shown in Figure 7C).

      (11) Line 191. Lines 188-190. It would be useful to see a sample of several of these force traces over time (0-5s) that were used to make the average for a position. That would give insight into the stability of the forces of a participant for one of the postures. These traces could be shown in Figure 2.

      Response 2.A.11. Thank you for your suggestion. We have added these panels to Figure 1 (as Figure 2 was already large). Each panel illustrates the three measurements taken at similar positions (closest to midline, distal from the body) and the same condition (paretic arm, with arm support given) for one participant (same participants as in Figure 2). Solid lines indicate the force on the x-axis (positive values indicate forces towards the left), whereas dashed lines indicate the force on the y-axis (positive values indicate forces towards the body). The shaded area indicates the part averaged in order to estimate the resting bias, illustrating how resting biases were relatively stable by the 2s mark. Note that these examples include one trial (blue traces in the third panel) which was rejected following visual inspection as described in Materials and Methods – Data Exclusion Criteria (“trials where forces appeared unstable and/or there was movement during the robot hold period”). We find this helpful, as it illustrates (and motivates) one component of our methodology. 

      (12) Line 196. Figure 1D (not 1E).  

      Response 2.A.12. Thank you for catching this error, which we have now corrected.

      (13) Line 215: The authors mentioned similar results. Were there any different results that impacted interpretation? Some evidence of this, similar to and in addition to Supplementary 1, would be helpful. 

      Response 2.A.13. We repeated our analyses without these exclusion criteria, with no impact on the interpretation. We now include versions of the main outcome panels from Figures 5, 6, and 7 in the supplementary materials calculated without this outlier exclusion (Figures S5-E, S6-E, and S7-E, respectively). 

      (14) Line 231: Perhaps better to explicitly state the furthest three positions are being across as the distal targets for the ANOVA. 

      Response 2.A.14. Thank you for your suggestion. We now explicitly clarify this in line 276:

      “distal targets [furthest three positions] vs. proximal targets [closest two positions]”

      (15) Figure 3B, lines 265. Clearly, these are different, but the authors should report statistics. 

      Response 2.A.15. We now report these numbers (lines 339-346 of the revised manuscript, which also include statistics related to bias direction as described in 2.A.17 below).

      (16) Figure 2 should have a heat map scale.  

      Response 2.A.16. We have now added this (also Response 1.A.7), including an explanation of what the heat map represents in the caption.

      (17) Figure 3C: It would be useful to quantify and plot the direction of the resting force bias vector. 

      Response 2.A.17. Thank you for your suggestion. We have expanded Figure 3 to include the average direction of the resting force bias vector (note the readjustment of colors following Reviewer 1’s comment: striped bars indicate No Support data, and full bars indicate Support data, with the colors being the same). The direction of the force bias vector, however, may not be very informative in cases where the magnitude is small (and the signal-to-noise ratio is small), whereas averaging the direction of the force bias vector across different positions for one participant may average out systematic variations in this direction across different locations. Nevertheless, the average direction appears generally towards the body (around -90°, or 6 o’clock) even in the non-paretic and control data (though the noise – as suggested by the size of the errorbars – is much higher in the latter cases, especially when the arm is supported). This is a (weak) suggestion that these resting biases may be present, though much subdued, in the nonparetic limb and healthy individuals; further work will be needed to elucidate this.

      (18) Line 428. It is not significantly longer compared to controls. Can the authors slightly revise this sentence?

      Response 2.A.18. We have revised this sentence (lines 529-532):

      Patients showed impaired capacity to resist and recover from this perturbation (the abrupt release of the imposed force). The time to stabilization for the paretic side (0.94±0.05s) was longer compared to the non-paretic side (0.79±0.03s, p = 0.024) and controls (0.78±0.06s, though this was statistically marginal, p = 0.061) as shown in Figure 7E, left.

      (19) Line 541. It is unclear how these data support the idea of three distinct controllers. Can the authors please clarify? 

      Response 2.A.19. Here, we compared our findings to previous ideas about distinct controllers, and discuss a potential fusion of these ideas with ours. Specifically, we find that holding is distinct from both initial reaching and coming to a stop. Previous work argues that initial reaching and coming to a stop are themselves distinct (Ghez et al., 2007; Jayasinghe et al., 2022). Combining these two sets of arguments, we arrive at the possibility of three distinct controllers. 

      (20) It would be useful if the authors provided a definition of synergy, as well as distinguishing between muscle and movement synergies. 

      Response 2.A.20. We now provide this in lines 591-594:

      Here, “synergies” refer to abnormal co-activation patterns across joints that manifest as the patient tries to move – for example, the elbow involuntarily flexing as the patient tries to abduct their shoulder (Twitchell, 1951; Brunnstrom, 1966). 

      (21) Line 592-593. The wording of this sentence could be improved. 

      Response 2.A.21. We have switched this sentence to active voice for more clarity:

      Thus, while full weight support reduces both resting flexor biases and movement-related flexor synergies, this reduction seems more complete for synergies rather than resting biases.

      (22) Figure 9. In the left column, it should read normal synergies and normal resting posture.  

      Response 2.A.22. We intentionally used the same terminology, as the idea behind our conceptual model is that these patterns, which manifest as well-recognized abnormal synergies and abnormal resting postures in stroke, may be present in the healthy motor system as well, but kept in check by CST moderating the RST. At the same time, we recognize that, by definition, synergies and posture in controls are the “normal” reference point against which “abnormal” synergies and posture are defined after stroke. To clarify this issue, we thus decided to forgo the use of the term “abnormal” in the figure, and instead refer to “synergistic movement” and “synergistic resting posture”.

      (23) Figure 9. With stroke, is RST upregulated, a decreased influence of CST, or both? All seem plausible.

      Response 2.A.23a. We believe both can be happening. From previous work (e.g. McPherson et al., 2018) it seems safe to say that RST upregulation is the case, whereas one would also expect a decreased CST influence because the CST is damaged by the stroke. The relative weight of these influences would be interesting to elucidate in future work.

      I have not read the paper, but did McPherson et al., 2018 test these different hypotheses?  

      Response 2.A.23b. The main point of McPherson et al., 2018 is that increased synergy expression is due to increased RST involvement, rather than reduced CST influence. However, McPherson et al. do not show separate increases/reductions in RST/CST activity; they show that contralesional activity relative to ipsilesional activity is increased (using a laterality index). While it does seem that RST is upregulated in this case, this does not exclude the possibility that CST influence is reduced as well.

      We also noticed that the citation itself, while mentioned in the text, was missing from the bibliography. This is now fixed.

      For Figure 9, McPherson is cited as they provide evidence for the idea that RST involvement increases when arm support is decreased. This evidence is both direct (e.g. in their Figure 3 where they show that “Stroke participants exhibited increased activity in the contralesional (R) hemisphere as SABD loading increased” [i.e. arm support was reduced]) and indirect: they connect synergies to RST involvement, and also show increased synergies with reduced arm support (also shown multiple times previously). Both these arguments suggest that arm support reduces RST involvement. We have clarified the relevant sentence:

      The interesting implication of this conceptual model is that synergies are in fact postural abnormalities that spill over into active movement when the CST can no longer modulate the increased RST activation that occurs when weight support is removed. Supporting this idea, McPherson et al. found increased ipsilateral activity (which primarily represents activation via the descending RST (Zaaimi et al., 2012)) when the paretic arm had reduced support compared to full support (McPherson et al., 2018).

      Reviewer #3 (Recommendations For The Authors):

      For Experiment 2, it is not immediately clear how the within-subject values are being pooled and compared across the different conditions. For instance, in the static perturbation trials, there are four blocks with 20 perturbation trials per block per arm (80 total per arm) with each location and direction once per block. For each participant, the comparison is between the location/direction that was most opposed (although this doesn't look accurately represented in Fig 7F). Therefore, the within-subject comparison is 4 trials per participant? Were these values averaged or pooled? It is a little odd that the SD for all the within-subjects trials are identical or nearly identical across conditions especially when looking at the example patient data in 7B and 7F.  

      Response 3.A.1. For static perturbation trials, the within-subject comparison involves 8 trials per participant: 4 trials corresponding to the perturbation direction/position combination with resting bias most opposed to the perturbation, and 4 trials corresponding to the perturbation direction/position combination with resting bias most aligned with the perturbation. These values were averaged for each individual. We have expanded our methods to make this part of our data analysis clear (lines 284-296) for all types of comparisons (unperturbed movement, pulse perturbation, static perturbations – now referred to as “release perturbation”).

      The across-subject SDs for the average resting forces for each one of these two conditions, shown in Figure 7F, are indeed identical. This is due to how these two instances (most aligned vs. most opposed) were selected: because the perturbation directions come in pairs that exactly oppose each other (Figure 7B), if one were to select the position with the most opposing resting bias, that would mean that the combination with the same position and the oppositely-directed perturbation would be the one with the most assistive resting bias. Hence the resting biases selected for the most opposing/assistive instances would be equal in magnitude and opposite to each other for each participant, as illustrated in Figure 7F, whereby the most-opposed bias for each individual is exactly opposite to the corresponding most-aligned bias for the same individual. We have added a brief commentary about this in the caption (lines 551-554), reproduced below:

      Note how the most-opposed resting bias for each patient is equal and opposite to their most-aligned resting bias. This is because the same resting bias, when projected along the direction of two oppositely-directed perturbations (illustrated in C), would oppose one with the same magnitude with which it would align with the other.
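      The symmetry argument can be checked with a short sketch. It assumes four perturbation directions arranged as two exactly opposing pairs; the specific unit vectors, position labels, and bias values below are hypothetical placeholders, not the directions or data used in the experiment.

```python
def project(bias, direction):
    """Signed component of a 2D resting bias force along a unit direction."""
    return bias[0] * direction[0] + bias[1] * direction[1]

# Hypothetical perturbation directions: two pairs that exactly oppose each other.
DIRECTIONS = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

def most_opposed_and_aligned(biases_by_position):
    """biases_by_position maps a position label to its (fx, fy) resting bias.

    Returns the smallest (most opposed) and largest (most aligned) projection
    over all position/direction combinations for one participant.
    """
    projections = [project(bias, d)
                   for bias in biases_by_position.values()
                   for d in DIRECTIONS]
    return min(projections), max(projections)

# Hypothetical resting biases (N) at two hold positions for one participant.
toy_biases = {"pos1": (3.0, 4.0), "pos2": (-1.0, 0.5)}
opposed, aligned = most_opposed_and_aligned(toy_biases)
```

      Because every direction has an exact opposite, each projection appears with both signs, so the minimum is always the negative of the maximum; across participants the two sets of values therefore share the same SD.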

      Importantly, following suggestions by Reviewer 2 (see point 2.A.1), we now provide supplementary analyses that use the entirety of the relevant data, rather than the most extreme instances, which provide evidence supporting our main findings (Figures S5-2, S6-2, and S7-2).

      The printed colors in Figure 3 are very muddled and hard to read/interpret, especially in panel A. 

      Response 3.A.2. Thank you for pointing out this issue, also raised by Reviewer 1. We have adjusted the colors to be more distinct from each other and look clear both in print and on-screen, making use of dashed lines and stripes rather than different shades.

      I think it would improve readability and interpretation if Figure 8 and the results related to FM-UE were contained within the description of results for Experiment 1.

      Response 3.A.3. Thank you for this suggestion. This is actually a debate we had among ourselves earlier, and we can see merits to either ordering. It is arguable that moving Figure 8 and the FM-UE results within the rest of Experiment 1 may improve readability somewhat. However, we believe that presenting these results at the end better serves to illustrate the apparent paradox between the lack of direct connection between resting biases and active movement on one hand, and the relationship between resting biases and abnormal synergies on the other. We believe that this better sets the stage to present our conceptual model, which explains this paradox based on the role arm support plays in modulating the expression of both resting biases and abnormal synergies.

      Additional changes/corrections not outlined above

      Figure 1D displayed a right arm, but showed a target array (red dots) for a left arm paradigm. We now flip the target array shown for consistency.

      We corrected Figure 6C, which accidentally used an earlier definition of settling time that was based on lateral stabilization throughout the entire movement, rather than focusing on the period immediately following the pulse. The intended definition of settling time (as we had described in the Methods, lines 204-206 of original submission) focuses on lateral corrections specific to the pulse (rather than corrections when the participant approaches the endpoint) and better matches the one for settling time for the release (static) perturbation trials. Note that this change did not affect the (lack of) relationship between settling time and resting force bias, both across individuals (correlation plots now in Figure S6-1) and within individuals (now shown in the right part of panel 6D). Also in panel C, an error in the scaling for the maximum lateral deviation in the pulse direction (right side of the panel) is now corrected.

      In addition, we made minor edits throughout the text to improve readability.

      References

      Albert ST, Hadjiosif AM, Jang J, Zimnik AJ, Soteropoulos DS, Baker SN, Churchland MM, Krakauer JW, Shadmehr R (2020) Postural control of arm and fingers through integration of movement commands. Elife 9:e52507.

      Avni I, Arac A, Binyamin-Netser R, Kramer S, Krakauer JW, Shmuelof L (2024) The Kinematics of 3D Arm Movements in Sub-Acute Stroke: Impaired Inter-Joint Coordination is Attributable to Both Weakness and Flexor Synergy Intrusion. Neurorehabil Neural Repair 38:646–658.

      Bourbonnais D, Vanden Noven S, Carey KM, Rymer WZ (1989) Abnormal spatial patterns of elbow muscle activation in hemiparetic human subjects. Brain 112:85–102.

      Brunnstrom S (1966) Motor testing procedures in hemiplegia: based on sequential recovery stages. Phys Ther 46:357–375.

      Cortes JC, Goldsmith J, Harran MD, Xu J, Kim N, Schambra HM, Luft AR, Celnik P, Krakauer JW, Kitago T (2017) A Short and Distinct Time Window for Recovery of Arm Motor Control Early After Stroke Revealed With a Global Measure of Trajectory Kinematics. Neurorehabil Neural Repair 31:552–560.

      Duque J, Thonnard J, Vandermeeren Y, Sébire G, Cosnard G, Olivier E (2003) Correlation between impaired dexterity and corticospinal tract dysgenesis in congenital hemiplegia. Brain 126:732–747.

      Fine MS, Thoroughman KA (2006) Motor Adaptation to Single Force Pulses: Sensitive to Direction but Insensitive to Within-Movement Pulse Placement and Magnitude. J Neurophysiol 96:710–720.

      Ghez C, Scheidt R, Heijink H (2007) Different Learned Coordinate Frames for Planning Trajectories and Final Positions in Reaching. J Neurophysiol 98:3614–3626.

      Hadjiosif AM, Branscheidt M, Anaya MA, Runnalls KD, Keller J, Bastian AJ, Celnik PA, Krakauer JW (2022) Dissociation between abnormal motor synergies and impaired reaching dexterity after stroke. J Neurophysiol 127:856–868.

      Jayasinghe SA, Scheidt RA, Sainburg RL (2022) Neural Control of Stopping and Stabilizing the Arm. Front Integr Neurosci 16.

      Kanade-Mehta P, Bengtson M, Stoeckmann T, McGuire J, Ghez C, Scheidt RA (2023) Spatial mapping of posture-dependent resistance to passive displacement of the hypertonic arm post-stroke. J NeuroEngineering Rehabil 20:163.

      Lawrence DG, Kuypers HG (1968) The functional organization of the motor system in the monkey: II. The effects of lesions of the descending brain-stem pathways. Brain 91:15–36.

      Levin MF (1996) Interjoint coordination during pointing movements is disrupted in spastic hemiparesis. Brain 119:281–293.

      Lowrey CR, Bourke TC, Bagg SD, Dukelow SP, Scott SH (2019) A postural unloading task to assess fast corrective responses in the upper limb following stroke. J NeuroEngineering Rehabil 16:1–17.

      McPherson JG, Chen A, Ellis MD, Yao J, Heckman C, Dewald JP (2018) Progressive recruitment of contralesional cortico-reticulospinal pathways drives motor impairment post stroke. J Physiol 596:1211–1225.

      McPherson LM, Dewald JP (2022) Abnormal synergies and associated reactions post-hemiparetic stroke reflect muscle activation patterns of brainstem motor pathways. Front Neurol 13:934670.

      Porter R, Lemon R (1995) Corticospinal function and voluntary movement. Oxford University Press.

      Smith MA, Brandt J, Shadmehr R (2000) Motor disorder in Huntington’s disease begins as a dysfunction in error feedback control. Nature 403:544.

      Smith MA, Shadmehr R (2005) Intact ability to learn internal models of arm dynamics in Huntington’s disease but not cerebellar degeneration. J Neurophysiol 93:2809–2821.

      Tower SS (1940) Pyramidal lesion in the monkey. Brain 63:36–90.

      Twitchell TE (1951) The restoration of motor function following hemiplegia in man. Brain 74:443–480.

      Wilkins KB, Yao J, Owen M, Karbasforoushan H, Carmona C, Dewald JP (2020) Limited capacity for ipsilateral secondary motor areas to support hand function post-stroke. J Physiol 598:2153–2167.

      Zaaimi B, Edgley SA, Soteropoulos DS, Baker SN (2012) Changes in descending motor pathway connectivity after corticospinal tract lesion in macaque monkey. Brain 135:2277–2289.

    1. Author response:

      eLife Assessment

      This study provides useful findings about the effects of heterozygosity for Trio variants linked to neurodevelopmental and psychiatric disorders in mice. However, the strength of the evidence is limited and incomplete mainly because the experimental flow is difficult to follow, raising concerns about the conclusions' robustness. Clearer connections between variables, such as sex, age, behavior, brain regions, and synaptic measures, and more methodological detail on breeding strategies, test timelines, electrophysiology, and analysis, are needed to support their claims.

      We appreciate the opportunity to address the constructive feedback provided by eLife and the reviewers. Below, we respond to the overall assessment and individual reviewers' comments, clarifying our experimental approach, addressing concerns, and providing additional details where necessary.

      We thank the editors for highlighting the significance of our findings regarding the effects of Trio variant heterozygosity in mice. We acknowledge the feedback concerning the experimental flow and agree that clarity is paramount. To address these concerns:

      (1) Connections between variables: We will revise the manuscript to explicitly outline and extend explanations and the relationships between sex, age, behavior, brain regions, and synaptic measures, ensuring that the rationale for each experiment and its relevance to the overall conclusions are improved.

      (2) Methodological details: The Methods section of our paper was formatted to be short, with additional details provided in the Supplemental Methods section. We will merge these into a single extended section to improve clarity. We will also expand on our breeding strategies, test timelines, electrophysiological protocols, and data analysis methods in the revised Methods section. These additions aim to enhance the transparency and reproducibility of our study and to ensure full support of our conclusions.

      (3) Experimental flow: We will revise and extend our Results, Methods, and Discussion sections to clarify the experimental design and to guide readers through the experimental sequence and its rationale.

      We are confident these revisions address the concerns raised and enhance the robustness and coherence of our findings.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study explores how heterozygosity for specific neurodevelopmental disorder-associated Trio variants affects mouse behavior, brain structure, and synaptic function, revealing distinct impacts on motor, social, and cognitive behaviors linked to clinical phenotypes. Findings demonstrate that Trio variants yield unique changes in synaptic plasticity and glutamate release, highlighting Trio's critical role in presynaptic function and the importance of examining variant heterozygosity in vivo.

      Strengths:

      This study generated multiple mouse lines to model each Trio variant, reflecting point mutations observed in human patients with developmental disorders. The authors employed various approaches to evaluate the resulting behavioral, neuronal morphology, synaptic function, and proteomic phenotypes.

      Weaknesses:

      While the authors present extensive results, the flow of experiments is challenging to follow, raising concerns about the strength of the experimental conclusions. Additionally, the connection between sex, age, behavioral data, brain regions, synaptic transmission, and plasticity lacks clarity, making it difficult to understand the rationale behind each experiment. Clearer explanations of the purpose and connections between experiments are recommended. Furthermore, the methodology requires more detail, particularly regarding mouse breeding strategies, timelines for behavioral tests, electrophysiology conditions, and data analysis procedures.

      We appreciate the reviewer’s recognition of the novelty and comprehensiveness of our approach, particularly the generation of multiple mouse lines and our efforts to model Trio variant effects in vivo.

      Weaknesses

      (1) Experimental flow and rationale and connection between variables: We will expand on the connections between behavioral data, neuronal morphology, synaptic function, and proteomics in the Results and Discussion sections to clarify how each experiment informs the reasoning and the conclusions and to highlight the relationships between sex, age, behavior, and synaptic measures.

      (2) Methodological details: The Methods section of our paper was kept short to meet the word limit of the submitted version, with additional details provided in the Supplemental Methods section. We will merge our Methods and Supplemental Methods sections and expand on our breeding strategies, test timelines, electrophysiological protocols, and data analysis methods in the revised Methods section. These additions aim to enhance the transparency and reproducibility of our study and to ensure full support of our conclusions.

      Reviewer #2 (Public review):

      Summary:

      The authors generated three mouse lines harboring ASD, Schizophrenia, and Bipolar-associated variants in the TRIO gene. Anatomical, behavioral, physiological, and biochemical assays were deployed to compare and contrast the impact of these mutations in these animals. In this undertaking, the authors sought to identify and characterize the cellular and molecular mechanisms responsible for ASD, Schizophrenia, and Bipolar disorder development.

      Strengths:

      The establishment of TRIO dysfunction in the development of ASD, Schizophrenia, and Bipolar disorder is very recent and of great interest. Disorder-specific variants have been identified in the TRIO gene, and this study is the first to compare and contrast the impact of these variants in vivo in preclinical models. The impact of these mutations was carefully examined using an impressive host of methods. The authors achieved their goal of identifying behavioral, physiological, and molecular alterations that are disorder/variant specific. The impact of this work is extremely high given the growing appreciation of TRIO dysfunction in a large number of brain-related disorders. This work is very interesting in that it begins to identify the unique and subtle ways brain function is altered in ASD, Schizophrenia, and Bipolar disorder.

      Weaknesses:

      (1) Most assays were performed in older animals and perhaps only capture alterations that result from homeostatic changes resulting from prodromal pathology that may look very different.

      (2) Identification of upregulated (potentially compensating) genes in response to these disorder-specific Trio variants is extremely interesting. However, a functional demonstration of compensation is not provided.

      (3) There are instances where data is not shown in the manuscript. See "data not shown". All data collected should be provided even if significant differences are not observed.

      I consider weaknesses 1 and 2 minor. While they would be very interesting to explore, these experiments might be more appropriate for a follow-up study. I would recommend that the missing data in 3 be provided in the supplemental material.

      We are grateful for the reviewer’s recognition of our study’s significance and methodological rigor. The acknowledgment of Trio dysfunction as a novel and impactful area of research is deeply appreciated.

      Weaknesses: 

      We agree that focusing on older animals may limit insights into early-stage pathophysiology. However, given that the goal of this study was to examine the functional impacts of Trio heterozygosity at an adolescent stage and to reveal the ultimate impact of these alleles on synaptic function, we believe the choice of animal age aligns with our objectives. We agree that future studies of earlier developmental stages will be beneficial and will complement these findings.

      Functional compensation: In this study, we tested functional compensation through rescue experiments in +/K1431M brain slices using a Rac1-specific inhibitor, NSC, which prevents its activation by Trio or Tiam1. Our findings strongly suggest that increased Rac1 activity, attributed to the proposed compensation, drives the deficiency in neurotransmitter release. Furthermore, this deficiency can be normalized by direct Rac1 inhibition.

      Data not shown: We will incorporate all data previously marked as "data not shown" into the Supplemental Materials, even when results are nonsignificant. We agree that this ensures full transparency and facilitates a more comprehensive evaluation of our findings.

    1. Author response:

      We thank the editors at eLife and the reviewers for the care with which our manuscript has been reviewed and for the constructive feedback that we have received. Both reviewers viewed the manuscript positively and in particular praised the merits of the forward genetic screen that led to the discovery of a new link between the HIF-1 pathway and fatty acid desaturation.

      We agree with all points by Reviewer #1. We will modify our manuscript to clarify that two types of C18:1 fatty acids are present in our lipidomics, and that the majority is likely vaccenic acid that is not a FAT-2 substrate. The title will be modified and Fig. 1A corrected.

      All points raised by Reviewer #2 are also valid and we will try to address most of them experimentally, though not always as suggested. In particular, we plan to use FRAP to verify that membrane-fluidizing treatments are effective in the fat-2 mutant. We also plan to use qPCR to test whether the novel egl-9(lof) and hif-1(gof) alleles lead to the expected downregulation of ftn-2. We note that the pathway connecting EGL-9, HIF-1 and FTN-2 is well supported by published work and that the alleles isolated in our screen are consistent with it, with the addition that FAT-2 is likely a regulated outcome of FTN-2 inhibition/mutation. We also plan to monitor FAT-2 protein levels using Western blots and thus provide more clarity about the mechanism of action of the novel fat-2(wa17) suppressors. The manuscript will be modified to tone down interpretations not directly supported by experiments.

    1. Author response:

      We would like to thank the editors and reviewers for reviewing our work and for finding it valuable and supported by convincing data, which we greatly appreciate, but also for identifying the weaknesses of the manuscript. We plan to address these weaknesses in the revised version, briefly as follows:

      (1) In the Discussion, we will elaborate more on a possible generalization of our results, while being aware of the limited space in this experimental paper and therefore intend to address this in more detail and comprehensively in a subsequent perspective article.

      (2) In the Discussion, we will more clearly address the limitations of our work, in particular the difference between the measurement of extracellular adenosine production ex vivo and the actual production in vivo, where the measurement is indeed very challenging, and also the limitations of manipulating the SAM pathway only at the Ahcy level.

      (3) We will describe in detail and complement the supplementary RNAseq data. The RNAseq data have already been described in detail in our previous paper (doi.org/10.1371/journal.pbio.3002299), but we agree with the reviewers that we should describe the necessary details again here.

      (4) We will fill in the missing data on encapsulation efficiency; we agree that it was unfortunate to omit them.

      (5) We will supplement the data with methyltransferase expressions and better describe the changes in expression of some SAM pathway genes, which, especially with methyltransferase expressions, also support stimulation of this pathway by changes in expression. Although the goal of this work was to test by 13C-labeling whether SAM pathway activity is upregulated, not to analyze how the activity is regulated, we certainly agree that an explanation of possible regulation, especially in the context of the enzyme expressions we show, should be included in our work.

    1. Author response:

      We thank the editors and reviewers for their comments on our manuscript. We found the comments of the reviewers helpful and plan to add new text, analyses, and figures to answer some of the outstanding questions.

      In response to the reviewers’ comments, we will clarify the goal of the paper in the introduction: to test the hypothesis that causal knowledge (i.e., an intuitive theory of biology) is embedded in domain-preferring semantic networks (i.e., semantic animacy network). This work links developmental psychology work on intuitive theories and cognitive neuroscience.

      As we will emphasize in the revised manuscript, the primary goal of the current paper is to test the claim that semantic networks encode causal knowledge, rather than to rule out the contribution of domain-general reasoning mechanisms to causal inference.

      In response to the reviewers’ suggestions, we will add multivariate and univariate whole-cortex analyses that provide further tests for domain-general causality responses. In particular, we will include new figures showing univariate responses to the mechanical inference condition over the non-causal control conditions as well as decoding between these conditions. The reviewers have also asked us to provide individual subject dispersion data. We appreciate this suggestion, and new figures will be added to display this information.

      We will also perform additional analysis in the precuneus (PC) to look for shared responses to illness and mechanical inferences. In accordance with our hypotheses, we have shown that the PC responds preferentially to illness inferences. To address the reviewers’ concerns about the selectivity of the PC to illness inferences, we will compare responses to i) illness inferences compared to the noncausal conditions and ii) mechanical inferences compared to the noncausal conditions in the PC to investigate the extent to which a shared response to causal inference across domains emerges in this region.

      Critically, we find that the cortical areas that distinguish between causal and non-causal conditions in a ‘domain general manner’ (i.e., for both illness and mechanical inferences) are driven by higher responses to the non-causal condition. Moreover, these responses in prefrontal cortex and elsewhere overlap an RT predictor of neural activity, suggesting that they may reflect difficulty effects.

      These results suggest that in the current task, signatures of causal inference are primarily found in domain-preferring semantic networks, rather than in domain-general fronto-parietal reasoning systems. We will provide additional discussion of the argument that the current results do not speak against the role of domain general systems across all types of causal reasoning. Instead, they suggest that the types of implicit causal inferences measured in the current study depend primarily on domain-preferring semantic networks.

      The reviewers have asked us to analyze responses to causal inferences about illness in the fusiform face area (FFA). We will perform this analysis. However, we note that univariate and multivariate whole-cortex analyses that are already included in the paper did not identify lateral ventral occipito-temporal cortex as a key region involved in causal inferences about illness. Further, we do not have FFA localizer data in the current participants; therefore, the results cannot be interpreted to reflect activity in functionally defined FFA.

      Two reviewers asked us to justify our choice of an implicit magic-detection task, which we will now do more clearly in the manuscript. This task was selected to ensure that participants were attending to the meaning of the vignettes. The goal of the current study was to investigate implicit causal inferences that routinely occur in language comprehension, e.g., when someone is reading a book. Past work has shown that explicitly judging the causality of causal and non-causal stimuli results in differences in response times across conditions (e.g., Kuperberg et al., 2006). In the current study, such judgments would also have introduced a confound between the behavioral decision and the condition of interest: the use of an explicit causal judgment task makes it impossible to know whether any observed neural differences between causal and non-causal conditions are simply due to differences in the selection of task responses. The selection of an orthogonal magic-detection task limits these confounds from complicating our interpretation of the neural data.

      One of the reviewers asked us to justify the number of catch trials that we decided to include in our paradigm. Approximately 20% of the vignettes were “magical” vignettes (the same proportion as each of the 4 experimental conditions) to encourage participants to remain attentive throughout the task. Since these catch trials are excluded from analysis, their proportion is unlikely to influence the results of the study. We will clarify this in the manuscript.

      A question was raised about the balance of trial numbers across conditions and across runs. To address this, we will include individual comparisons of each causal condition (n=36) with each non-causal condition (n=36; i.e., equal trial counts) where they are not already shown. With regard to runs, each condition is shown either 6 or 7 times per run (maximum difference of 1 trial between conditions), and the number of trials per condition is equal across the whole experiment: each condition is shown 7 times in two of the runs and 6 times in four of the runs. This minor design imbalance is typical of fMRI experiments and is unlikely to impact the results. We will clarify this in the manuscript.

      We believe that our planned revisions will strengthen the paper and highlight its contributions to our understanding of the neural basis of implicit causal inference.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells. These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChIP-seq (to validate chromatin state).

      We agree with the reviewer’s suggestions. We propose to use RNA-seq using an orthogonal platform as a solution. This will allow us to answer multiple questions viz. validation of expression of human DNA in mouse cells, obtaining a detailed insight into genes and pathways driven by human cfChPs and enable us to identify chimeric human and mouse transcripts.

      Another weakness of this study is that it is performed only in one receiving cell type (NIH3T3 mouse cells). Thus, rather than a general phenomenon occurring on a massive scale in every multicellular organism, it could merely reflect aberrant properties of a cell line that for some reason became permeable to exogenous cfChPs. This begs the question of the relevance of this study for living organisms.

      We agree with the reviewer’s suggestion. We propose to show horizontal transfer of cfChPs using four different cell-lines representing four different species.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer is right in expecting that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome. This is indeed the case, and we find that beyond ~250 passages the cfChPs-treated NIH3T3 cells begin to die out, apparently because their genomes have become too unstable for survival. This point will be highlighted in the revised version. It is likely that cell death resulting from large-scale HGT creates a vicious cycle of more cell death induced by cfChPs, thereby helping to explain the massive daily turnover of cells in the body (10^9 – 10^12 cells per day).

      Reviewer #2 (Public review):

      I must note that my comments pertain to the evolutionary interpretations rather than the study's technical results. The techniques appear to be appropriately applied and interpreted, but I do not feel sufficiently qualified to assess this aspect of the work in detail.

      I was repeatedly puzzled by the use of the term "function." Part of the issue may stem from slightly different interpretations of this word in different fields. In my understanding, "function" should denote not just what a structure does, but what it has been selected for. In this context, where it is unclear if cfChPs have been selected for in any way, the use of this term seems questionable.

      We think this is a matter of semantics. We have used the term “function” since cfChPs that enter the cell are biologically active; they transcribe, translate, synthesize proteins, and proliferate. We therefore feel that the term function is not inappropriate.

      Similarly, the term "predatory genome," used in the title and throughout the paper, appears ambiguous and unjustified. At this stage, I am unconvinced that cfChPs provide any evolutionary advantage to the genome. It is entirely possible that these structures have no function whatsoever and could simply be byproducts of other processes. The findings presented in this study do not rule out this neutral hypothesis. Alternatively, some particular components of the genome could be driving the process and may have been selected to do so. This brings us to the hypothesis that cfChPs could serve as vehicles for transposable elements. While speculative, this idea seems to be compatible with the study's findings and merits further exploration.

      We take the reviewer’s point. We will replace the term “predatory genome” with a more neutral and factual term “supernumerary genome” in the title and throughout the manuscript in the revised version.

      I also found some elements of the discussion unclear and speculative, particularly the final section on the evolution of mammals. If the intention is simply to highlight the evolutionary impact of horizontal transfer of transposable elements (e.g., as a source of new mutations), this should be explicitly stated. In any case, this part of the discussion requires further clarification and justification.

      We propose to revise the “discussion” section taking into account the issues raised by the reviewer and highlight the potential role of cfChPs in evolution by acting as vehicles of transposable elements.  

      In summary, this study presents important new findings on the behavior of cfChPs when introduced into a foreign cellular context. However, it overextends its evolutionary interpretations, often in an unclear and speculative manner. The concept of the "predatory genome" should be better defined and justified or removed altogether. Conversely, the suggestion that cfChPs may function at the level of transposable elements (rather than the entire genome or organism) could be given more emphasis.

      Our responses to this paragraph are given in the two above sections.

    1. Author response:

      We thank the reviewers for their careful readings of our paper and their very positive assessment. Here we address the two major concerns they raised, referring to the revised version of the manuscript that will be submitted:

      (1) Important points were raised regarding the brief elongation events we reported. The time resolution and noise in our system reduce the accuracy of the burst velocity measurements. To address this, we have reached out to a colleague who is set up to repeat these measurements with microfluidics-assisted TIRF. The noise should be greatly reduced and the system is also optimal for directly visualizing labeled FHOD3, as suggested. We hope this experimental approach will provide new insights.

      In the meantime, we analyzed our data more closely. We were asked about the pauses we observe before bursts of elongation and how we know they are functionally relevant. The short answer is that we do not know. We reported them because they were so common: in three independent experiments with wild type FHOD3L-CT we analyzed a total of 20 filaments. We detected 112 dim regions and 97 of these were pause/burst events (~87%). Among the cases lacking a pause we include instances of apparent "double bursts" with no time for capping in between (which may be a time resolution issue) and some cases where the burst was in progress when data collection started. In the latter case, we cannot determine whether or not a pause was missed. We cannot rule out that this pause reflects an interaction with the surface, but we might expect the frequency to be lower if it were. In fact, we did detect pauses in the profilin-actin negative control, but only 4 pauses were detected across 21 filaments analyzed compared to 97 pauses observed in the presence of wild type FHOD3L across 20 filaments analyzed. We will revise the text to make our conclusions about pauses more circumspect.
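
      As a rough check on the frequencies quoted above, the arithmetic can be reproduced in a few lines of Python. This is only an illustrative sketch: the counts are those stated in this response, while the variable and function names are our own labels and are not part of any analysis pipeline used in the study.

```python
# Counts taken from the text of this response; names are illustrative only.
DIM_REGIONS = 112         # dim regions detected with wild type FHOD3L-CT
PAUSE_BURST_EVENTS = 97   # dim regions scored as pause/burst events
FILAMENTS_FORMIN = 20     # filaments analyzed with FHOD3L-CT
CONTROL_PAUSES = 4        # pauses detected in the profilin-actin control
FILAMENTS_CONTROL = 21    # filaments analyzed in the control


def fraction(part, whole):
    """Return part/whole as a plain ratio."""
    return part / whole


def per_filament(events, filaments):
    """Return the mean number of events per analyzed filament."""
    return events / filaments


pause_fraction = fraction(PAUSE_BURST_EVENTS, DIM_REGIONS)
formin_rate = per_filament(PAUSE_BURST_EVENTS, FILAMENTS_FORMIN)
control_rate = per_filament(CONTROL_PAUSES, FILAMENTS_CONTROL)

print(f"pause/burst fraction of dim regions: {pause_fraction:.1%}")
print(f"pauses per filament with FHOD3L-CT:  {formin_rate:.2f}")
print(f"pauses per filament in the control:  {control_rate:.2f}")
```

      Expressing both conditions as pauses per filament makes the comparison independent of the slightly different number of filaments analyzed in each case.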

      For comparison to our current data, we further analyzed the filaments in TIRF assays with no formin present. As the reviewers point out, inhomogeneities in filament intensity are normal. Thus, we examined any dim spots for pauses and/or bursts. We will report (future Figure 2G) that the velocity of growth of these dim spots was the same as the velocity of the rest of the filament. While our numbers may not be perfectly accurate due to the noise in our system, the difference between a 3-4 fold increase and no detectable change in rate is substantial and statistically significant. In addition, we determined the number of dim spots per length of filament. We found a higher frequency of dim spots when FHOD3L-CT or FHOD3S-CT was present vs no formin, as will be shown in Figure 2 – figure supplement 1G and 2D.

      We are convinced that the brief dim events we observed in the presence of FHOD3L-CT do, in fact, reflect formin-mediated elongation and hope that the reviewers concur. This does not preclude our interest in the microfluidics and two-color assays, which we will pursue in the future.

      (2) The reviewers were concerned about the low protein levels in the GS-FH1 rescue experiments as reflected in the HA fluorescence intensity distributions shown in Fig. 5 – figure supplement 2A. While the scenario proposed could explain our observations with the GS-FH1 rescues, it is quite complex and does not preclude the conclusion that the FH1 domain is critical. One limit of this scenario would be that the protein levels in the GS-FH1 cells reflect completely inactive protein, as opposed to FHOD3L that cannot elongate (by design). Given that the C-terminal half of the protein folds and functions and that the changes are made within an intrinsically disordered region, we do not favor this model. The reviewers suggest that the mutant protein detected in the few cells with (probably residual) sarcomeres could be stabilized, in part or entirely, by heterodimerization with residual endogenous wild type protein. We agree that heterodimerization is possible. The question becomes, how active is a heterodimer? If heterodimers have any activity, it seems far from sufficient to rescue sarcomere formation, suggesting that two functional FH1 domains are critical. To confirm this possibility, we would have to be able to determine whether the few sarcomeres present in these cases are residual and/or the new sarcomeres the low level of heterodimers could make. That said, we do not see evidence of correlation between protein levels and rescue at the level present in these cells (addressed below). Unfortunately, the proposed IP to test whether FHOD3L binds actin in vivo would only potentially report on filament side binding (both direct and indirect). It would not address whether the GS-FH1 mutant functions as a nucleator, elongator, bundler and/or capping protein in vivo.

      If we assume that the protein present is active, the critical question that we can address is whether the phenotype is due to low protein levels or if the phenotype is due to loss of elongation activity by FHOD3L. To address this question, we revisited our data.

      First, we plotted the distributions of the intensities of the cells we analyzed further, in addition to the automated readout of all the cells in the dish we originally presented (e.g. Fig. 4 – figure supplement 2A,B). These cells were selected randomly and, as should be the case, the distributions of their intensities agree well with the original distributions for the three different rescue constructs: FHOD3L, K1193L, and GS-FH1 (Fig. 6 – figure supplement 1A,B). We then asked whether there was any correlation between HA intensities and the sarcomere metrics. Consistent with our pilot data, no correlation is evident in any of the three cases across the range of intensities we collected (400 – 2700 a.u.) (Fig. 6 – figure supplement 1C,D,E). We were originally satisfied with the GS-FH1 data, despite the low average intensity levels, because the intensities were well within the range that we established in pilot studies. These data reconfirm that the intensity levels are reasonable in a larger study.

      To more specifically address the question of whether low HA fluorescence intensity is likely to reflect sufficient protein levels to build sarcomeres, we re-examined two data sets from the FHOD3L WT rescue data. We found that, by chance, the first replicate of data from the wild type rescue has a comparable intensity distribution to that of the GS-FH1 rescues (580 +/- 261 / cell vs. 548 +/- 105 / cell). In addition, we collected all of the data from cells with intensity levels <720, selected to mimic the distribution of the GS-FH1 cells (Fig. 6 – figure supplement 3A). We then compared the sarcomere metrics (sarcomere number, sarcomere length, sarcomere width) between the full data set and the two low intensity subsets using statistical tests as reported for the rest of the cell biology data set:

      · Sarcomere number is the only non-normal metric. We therefore used the Mann Whitney U test for each pairwise comparison, which shows no difference between all 3 WT distributions.

      · We compared Z-line lengths by Student’s two-sample, unpaired t-test for each pairwise comparison, again finding no significant difference for all distributions.

      · Sarcomere length shows a weakly significant difference (p=0.017 (compared to 0.033 for 3 treatment groups based on Bonferroni correction)) between the whole WT data set and bio rep 1, but no difference between the whole WT data set and the HA<720 group via Student’s two-sample, unpaired t-test.

      An alternate statistical analysis approach, one-way ANOVA and Tukey post hoc tests, gave similar results. Thus, cells expressing wild type FHOD3L at levels comparable to levels detected in GS-FH1 mutant rescues, are fully rescued. Based on these findings we conclude that the expression levels in the GS-FH1 are high enough to rescue the FHOD3 knock down, supporting our conclusion that the defect is due to loss of elongation activity. We will add this analysis and discussion to the revised manuscript.
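
      For readers less familiar with the non-parametric comparison mentioned above for sarcomere number, the following is a minimal Python sketch of a two-sided Mann Whitney U test. It is an illustration under stated assumptions, not the analysis code used for the manuscript: the function name `mann_whitney_u` is hypothetical, the p-value relies on the tie-corrected normal approximation without continuity correction (statistical packages may use exact or corrected variants and give slightly different values), and the sample values are made-up toy numbers, not data from the study.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Two-sided Mann Whitney U test (normal approximation, tie-corrected).

    Returns (U, p) where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))

    # Assign each distinct value the average of the ranks it occupies.
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_sum_x = sum(ranks[v] for v in x)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)

    # Standard deviation of U under the null, corrected for ties.
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u, 1.0  # all values identical: no evidence of a difference

    z = (u - n1 * n2 / 2) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, p


# Toy sarcomere counts per cell (hypothetical, for illustration only).
full_set = [12, 15, 9, 20, 14, 11, 17]
low_intensity_subset = [13, 10, 16, 12, 18]
u_stat, p_value = mann_whitney_u(full_set, low_intensity_subset)
print(f"U = {u_stat}, p = {p_value:.3f}")
```

      Because the test works on ranks rather than raw values, it does not assume normality, which is why it suits the count-like sarcomere number metric, whereas the t-tests described above are used for the approximately normal metrics.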

      In future studies we will design less severe mutations to the FH1 domain. We hope to identify one with a strong effect on elongation and another with an intermediate effect. Once the best candidates are characterized in vitro, we will test them in our rescue experiments. If the strong mutant mimics the GS-FH1 rescue and the intermediate mutant is less severe, we will have strengthened our conclusion that elongation is a critical FHOD3L activity in sarcomere formation.

      Additional improvements will be made to the manuscript based on recommendations we received from the reviewers.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      In the manuscript "Intergenerational transport of double-stranded RNA limits heritable epigenetic changes," Shugarts and colleagues investigate intergenerational dsRNA transport in the nematode C. elegans. By inducing oxidative damage, they block dsRNA import into cells, which affects heritable gene regulation in the adult germline (Fig. 2). They identify a novel gene, sid-1-dependent gene-1 (sdg-1), upregulated upon SID-1 inhibition (Fig. 3). Both transient and genetic depletion of SID-1 lead to the upregulation of sdg-1 and a second gene, sdg-2 (Fig. 5). Interestingly, while sdg-1 expression suggests a potential role in dsRNA transport, neither its overexpression nor loss-of-function impacts dsRNA-mediated silencing in the germline (Fig. 7).

      Strengths:

      • The authors employ a robust neuronal stress model to systematically explore SID-1 dependent intergenerational dsRNA transport in C. elegans.

      • They discover two novel SID-1-dependent genes, sdg-1 and sdg-2.

      • The manuscript is well-written and addresses the compelling topic of dsRNA signaling in C. elegans.

      Weaknesses:

      • The molecular mechanism downstream of SDG-1 remains unclear. Testing whether sdg-2 functions redundantly with sdg-1 could provide further insights.

      • SDG-1 dependent genes in other nematodes remain unknown.

      We thank the reviewer for highlighting the strengths of the work along with a couple of the interesting future directions inspired by the reported discoveries. The restricted presence of genes encoding SDG-1 and its paralogs within retrotransposons suggests intriguing evolutionary roles for these proteins. Future work could examine whether such fast-evolving or newly evolved proteins with potential roles in RNA regulation are more broadly associated with retrotransposons. Multiple SID-1-dependent proteins (including SDG-1 and SDG-2) could act together to mediate downstream effects. This possibility can be tested using combinatorial knockouts and overexpression strains. Both future directions have the potential to illuminate the evolutionarily selected roles of dsRNA-mediated signaling through SID-1, which remain a mystery.

      Reviewer #2 (Public review):

      Summary:

      RNAs can function across cell borders and animal generations as sources of epigenetic information for development and immunity. The specific mechanistic pathways by which RNA travels between cells and progeny remain an open question. Here, Shugarts, et al. use molecular genetics, imaging, and genomics methods to dissect specific RNA transport and regulatory pathways in the C. elegans model system. Larvae ingesting double-stranded RNA is noted to not cause continuous gene silencing throughout adulthood. Damage of neuronal cells expressing double-stranded target RNA is observed to repress target gene expression in the germline. Exogenous short or long double-stranded RNA required different genes for entry into progeny. It was observed that the SID-1 double-stranded RNA transporter showed different expression over animal development. Removal of the sid-1 gene caused upregulation of two genes, the newly described sid-1-dependent gene sdg-1 and sdg-2. Both genes were observed to be negatively regulated by other small RNA regulatory pathways. Strikingly, loss then gain of sid-1 through breeding still caused variability of sdg-1 expression for many, many generations. SDG-2 protein co-localizes with germ granules, intracellular sites for heritable RNA silencing machinery. Collectively, sdg-1 presents a model to study how extracellular RNAs can buffer gene expression in germ cells and other tissues.

      Strengths:

      (1) Very clever molecular genetic methods and genomic analyses, paired with thorough genetics, were employed to discover insights into RNA transport, sdg-1 and sdg-2 as sid-1-dependent genes, and sdg-1's molecular phenotype.

      (2) The manuscript is well cited, and figures reasonably designed.

      (3) The discovery of the sdg genes being responsive to the extracellular RNA cell import machinery provides a model to study how exogenous somatic RNA is used to regulate gene expression in progeny. The discovery of genes within retrotransposons stimulates tantalizing models of how regulatory loops may actually permit the genetic survival of harmful elements.

      Weaknesses:

      (1) The manuscript is broad, making it challenging to read and consider the data presented. Of note, since the original submission, the authors have improved the clarity of the writing and presentation.

      Comments on revised version:

      This reviewer thanks the authors for their efforts in revising the manuscript. In their rebuttal, the authors acknowledged the broad scope of their manuscript. I concur. While I still think the manuscript is a challenge to read due to its expansive nature, the current draft is substantially improved when compared to the previous one. This work will contribute to our general knowledge of RNA biology, small RNA regulatory pathways, and RNA inheritance.

      We thank the reviewer for highlighting the strengths of the manuscript and for helping us improve the presentation of our results and discussion.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the manuscript "Intergenerational transport of double-stranded RNA limits heritable epigenetic changes," Shugarts and colleagues investigate intergenerational dsRNA transport in the nematode C. elegans. They induce oxidative damage in worms, blocking dsRNA import into cells (and potentially affecting the worms in other ways). Oxidative stress inhibits dsRNA import and the associated heritable regulation of gene expression in the adult germline (Fig. 2). The authors identify a novel gene, sid-1-dependent gene-1 (sdg-1), which is induced upon inhibition of SID-1 (Fig. 3). Both transient inhibition and genetic depletion of SID-1 lead to the upregulation of sdg-1 and a second gene, sdg-2 (Fig. 5). The expression of SDG-1 is variable, potentially indicating buffering regulation. While the expression of SDG-1 could be consistent with a role in intergenerational transport of dsRNA, neither its overexpression nor loss-of-function impacts dsRNA-mediated silencing (Fig. 7) in the germline. It would be interesting to test if sdg-2 functions redundantly.

      In summary, the authors have identified a novel worm-specific protein (sdg-1) that is induced upon loss of dsRNA import via SID-1, but is not required to mediate SID-1 RNA regulatory effects.

      We thank the reviewer for highlighting our findings on SDG-1. We found that oxidative damage in neurons enhanced dsRNA transport into the germline and/or subsequent silencing.

      Remaining Questions:

      • The authors use an experimental system that induces oxidative damage specifically in neurons to release dsRNAs into the circulation. Would the same effect be observed if oxidative damage were induced in other cell types?

      It is possible that oxidative damage of other tissues using miniSOG (as demonstrated in Xu and Chisholm, 2016) could also enhance the release of dsRNA into the circulation from those tissues. However, future experiments would be needed to test this empirically because it is also possible that the release of dsRNA depends on physiological properties (e.g., the molecular machinery promoting specific secretion) that are particularly active in neurons. We chose to use neurons as the source of dsRNA because, when dsRNA was expressed in a variety of tissues, neurons appeared to be the most efficient at exporting dsRNA, as measured using SID-1-dependent silencing in other tissues (Jose et al., PNAS, 2009).

      • Besides dsRNA, which other RNAs and cellular products (macromolecules and small signalling molecules) are released into the circulation that could affect the observed changes in germ cells?

      We do not yet know all the factors that could be released either in naive animals or upon oxidative damage of neurons that influence the uptake of dsRNA into other tissues. The dependence on SID-1 for the observed enhancement of silencing (Fig. 2) shows that dsRNA is necessary for silencing within the germline. Whether this import of dsRNA occurs in conjunction with other factors (e.g., the uptake of short dsRNA along with yolk into oocytes (Marré et al., PNAS, 2016)) before silencing within the germline will require further study. A possible approach could be the isolation of extracellular fluid (Banse and Hunter, J Vis Exp., 2012) followed by characterization of its contents. However, the limited material available using this approach and the difficulty in avoiding contamination from cellular damage by the needle used for isolating the material make it challenging.

      • SID-1 modifies RNA regulation within the germline (Fig. 7) and upregulates sdg-1 and sdg-2 (Fig. 5). However, SID-1's effects do not appear to be mediated via sdg-1. Testing the role of sdg-2 would be intriguing.

      We observe the accumulation of sdg-1 and sdg-2 RNA in two different mutants lacking SID-1, which led us to conservatively focus on the analysis of one of these proteins for this initial paper. We expect that more sensitive analyses of the RNA-seq data will likely reveal additional genes regulated by SID-1. With the ability to perform multiplexed genome-editing, we hope in future work to generate strains that have mutations in many SID-1-dependent genes to recapitulate the defects observed in sid-1(-) animals. Indeed, as surmised by the reviewer, we are focusing on sdg-2 as the first such SID-1-dependent gene to analyze using mutant combinations.

      • Are sdg-1 or sdg-2 conserved in other nematodes or potentially in other species? sdg-1 appears to be encoded or captured by a retro-element in the C. elegans genome and exhibits stochastic expression in different isolates. Is this a recent adaptation in the C. elegans genome, or is it present in other nematodes? Does loss-of-function of sdg-1 or sdg-2 have any observable effect?

      Clear homologs of SDG-1 and SDG-2 are not detectable outside of C. elegans. Consistent with the location of the sdg-1 gene within a Cer9 retrotransposon that appears to have integrated only within the C. elegans genome, sequence conservation between the genomes of related species is only observed outside the region of the retrotransposon (see Author response image 1, screenshot from UCSC browser). There were no obvious defects detected in animals lacking sdg-1 (Fig. 7) or in animals lacking sdg-2 (data not shown). It is possible that further exploration of both mutants and mutant combinations lacking additional SID-1-dependent genes would reveal defects. We also plan to examine these mutants in sensitized genetic backgrounds where one or more members of the RNA silencing pathway have been compromised.

      Author response image 1.

      Clarification for Readability:

      To enhance readability and avoid misunderstandings, it is crucial to specify the model organism and its specific dsRNA pathways that are not conserved in vertebrates:

      We agree with the reviewer and thank the reviewer for the specific suggestions provided below. To take the spirit of the suggestion to heart, we have instead changed the title of our paper to clearly signal that the entire study only uses C. elegans. We have titled the study ‘Intergenerational transport of double-stranded RNA in C. elegans can limit heritable epigenetic changes’.

      • In the first sentence of the paragraph "Here, we dissect the intergenerational transport of extracellular dsRNA ...", the authors should specify "in the nematode C. elegans". Unlike vertebrates, which recognise dsRNA as a foreign threat, worms and other invertebrates pervasively use dsRNA for signalling. Additionally, worms, unlike vertebrates and insects, encode RNA-dependent RNA polymerases that generate dsRNA from ssRNA substrates, enabling amplification of small RNA production. Especially in dsRNA biology, specifying the model organism is essential to avoid confusion about potential effects in humans.

      We agree with most statements made by the reviewer, although whether dsRNA is exclusively recognized as a foreign threat by all vertebrates of all stages remains controversial. Our changed title now eliminates all ambiguity regarding the organism used in the study.

      • Similarly, the authors should specify "in C. elegans" in the sentence "Therefore, we propose that the import of extracellular dsRNA into the germline tunes intracellular pathways that cause heritable RNA silencing." This is important because C. elegans small RNA pathways differ significantly from those in other organisms, particularly in the PIWI-interacting RNA (piRNA) pathways, which depend on dsRNA in C. elegans but use ssRNA in vertebrates. Specification is crucial to prevent misinterpretation by the reader. It is well understood that mechanisms of transgenerational inheritance that operate in nematodes or plants are not conserved in mammals.

      The piRNAs of C. elegans are single-stranded but are encoded by numerous independent genes throughout the genome. The molecules used for transgenerational inheritance of epigenetic changes that have been identified thus far are indeed different in different organisms. However, the regulatory principles required for transgenerational inheritance are general (Jose, eLife, 2024). Nevertheless, we have modified the title to clearly state that the entire study is using C. elegans.  

      • The first sentence of the discussion, "Our analyses suggest a model for ...", would also benefit from specifying "in C. elegans". The same applies to the figure captions. Clarification of the model organism should be added to the first sentence, especially in Figure 1.

      With the clarification of the organism used in the title, we expect that all readers will be able to unambiguously interpret our results and the contexts where they apply. 

      Reviewer #2 (Public review):

      Summary:

      RNAs can function across cell borders and animal generations as sources of epigenetic information for development and immunity. The specific mechanistic pathways by which RNA travels between cells and progeny remain an open question. Here, Shugarts, et al. use molecular genetics, imaging, and genomics methods to dissect specific RNA transport and regulatory pathways in the C. elegans model system. Larvae ingesting double stranded RNA is noted to not cause continuous gene silencing throughout adulthood. Damage of neuronal cells expressing double stranded target RNA is observed to repress target gene expression in the germline. Exogenous supply of short or long double stranded RNA required different genes for entry into progeny. It was observed that the SID-1 double-stranded RNA transporter showed different expression over animal development. Removal of the sid-1 gene caused upregulation of two genes, the newly described sid-1-dependent gene sdg-1 and sdg-2. Both genes were observed to also be negatively regulated by other small RNA regulatory pathways. Strikingly, loss then gain of sid-1 through breeding still caused variability of sdg-1 expression for many, many generations. SDG-2 protein co-localizes with a Z-granule marker, an intracellular site for heritable RNA silencing machinery. Collectively, sdg-1 presents a model to study how extracellular RNAs can buffer gene expression in germ cells and other tissues.

      We thank the reviewer for highlighting our findings and underscoring the striking nature of the discovery that mutating sid-1 using genome-editing resulted in a transgenerational change that could not be reversed by changing the sid-1 sequence back to wild-type.

      Strengths:

      (1) Very clever molecular genetic methods and genomic analyses, paired with thorough genetics, were employed to discover insights into RNA transport, sdg-1 and sdg-2 as sid-1-dependent genes, and sdg-1's molecular phenotype.

      (2) The manuscript is well cited, and figures reasonably designed.

      (3) The discovery of the sdg genes being responsive to the extracellular RNA cell import machinery provides a model to study how exogenous somatic RNA is used to regulate gene expression in progeny. The discovery of genes within retrotransposons stimulates tantalizing models of how regulatory loops may actually permit the genetic survival of harmful elements.

      We thank the reviewer for the positive comments.

      Weaknesses:

      (1) As presented, the manuscript is incredibly broad, making it challenging to read and consider the data presented. This concern is exemplified in the model figure, which requires two diagrams to summarize the claims made by the manuscript.

      RNA interference (RNAi) by dsRNA is an organismal response where the delivery of dsRNA into the cytosol of some cell precedes the processing and ultimate silencing of the target gene within that cell. These two major steps are often not separately considered when explaining observations. Yet, the interpretation of every RNAi experiment is affected by both steps. To make the details that we have revealed in this work for both steps clearer, we presented the two models separated by scale - organismal vs. intracellular. We agree that this integrative manuscript appears very broad when the many different findings are each considered separately. The overall model revealed here forms the necessary foundation for the deep analysis of individual aspects in the future.

      (2) The large scope of the manuscript denies space to further probe some of the ideas proposed. The first part of the manuscript, particularly Figures 1 and 2, presents data that can be caused by multiple mechanisms, some of which the authors describe in the results but do not test further. Thus, portions of the results text come across as claims that are not supported by the data presented.

      We agree that one of the consequences of addressing the joint roles of transport and subsequent silencing during RNAi is that the scope of the manuscript appears large. We had suggested multiple interpretations for specific observations in keeping with the need for further work. To prevent readers from taking our listing of possible interpretations as claims, we have followed the instructions of the reviewer (see below) and moved some of the potential explanations we raised to the discussion section.

      (3) The manuscript focuses on the genetics of SDGs but not the proteins themselves. Few descriptions of the SDGs' functions are provided, nor is it clarified why only SDG-1 was pursued in imaging and genetic experiments. Additionally, the SDG-1 imaging experiments could use additional localization controls.

      We agree that more work on the SDG proteins will likely be informative, but it is beyond the scope of this already expansive paper. We began with the analysis of SDG-1 because it had the most support as a regulator of RNA silencing (Fig. 5f). Indeed, in other work (Lalit and Jose, bioRxiv, 2024), we find that AlphaFold 2 predicts the SDG-1 protein to be a regulator of RNA silencing that directly interacts with the dsRNA-editing enzyme ADR-2 and the endonuclease RDE-8. Furthermore, we expect that more sensitive analyses of the RNA-seq data are likely to reveal additional genes regulated by SID-1. Using multiplexed genome editing, we hope to generate mutant combinations lacking multiple sdg genes to reveal their function(s).

      We agree that given the recent discovery of many components of germ granules, our imaging data does not have sufficient resolution to discriminate between them. We have modified our statements and our model regarding the colocalization of SDG-1 with Z-granules to indicate that the overlapping enrichment of SDG-1 and ZNFX-1 in the perinuclear region is consistent with interactions with other nearby granule components.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Major

      (1) As presented, the manuscript is almost two manuscripts combined into one. This point is highlighted in Figure 7h, which basically presents two separate models. The key questions addressed in the manuscript start at Figure 3. Figures 1 and 2 are interesting observations but require more experiments to define further. For example, as the Results text describes for Figure 1, "These differences in the entry of ingested dsRNA into cells and/or subsequent silencing could be driven by a variety of changes during development. These include changes in the uptake of dsRNA into the intestine, distribution of dsRNA to other tissues from the intestine, import of dsRNA into the germline, and availability of RNA silencing factors within the germline." Presenting these (reasonable) mechanistic ideas detracted from the heritable RNA epigenetic mechanism explored in the later portion of the manuscript. There are many ways to address this issue, one being moving Figures 1 and 2 to the Supplement to focus on SID-1 related pathways.

      Since this manuscript addresses the interaction between intercellular transport of dsRNA and heritable epigenetic changes, it was necessary to establish the possible route(s) that dsRNA could take to the germline before any inference could be made regarding heritable epigenetic changes. As suggested below (pt. 2), we have now moved the alternatives we enumerated as possible explanations for some experimental results (e.g., for the differences quoted here) to the discussion section.

      (2) The manuscript includes detailed potential interpretations in the Results, making them seem like claims. Here is an example:

      "Thus, one possibility suggested by these observations is that reduction of sdg-1 RNA via SID-1 alters the amount of SDG-1 protein, which could interact with components of germ granules to mediate RNA regulation within the germline of wild-type animals."

      This mechanism is a possibility, but placing these ideas in the citable results makes it seem like an overinterpretation of imaging data. This text and others should be in the Discussion, where speculation is encouraged. Results sections like this example and others should be moved to the discussion.

      We have rephrased motivating connections between experiments like the one quoted above and also moved such text to the discussion section wherever possible.

      (3) A paragraph describing the SDG proteins will be helpful. Homologs? Conserved protein domains? mRNA and/or protein expression pattern across worm, not just the germline? Conservation across Caenorhabditis sp? These descriptions may help establish context why SDG-1 localizes to Z-granules.

      We have now added information about the conservation of the sdg-1 gene in the manuscript. AlphaFold predicts domains with low confidence for the SDG-1 protein, consistent with the lack of conservation of this protein (AlphaFold requires multiple sequence alignments to predict confidently). In the adult animal, the SDG-1 protein was only detectable in the germline. Future work focused on SDG-1, SDG-2 and other SDG proteins will further examine possible expression in other tissues and functional domains if any. Unfortunately, in multiple attempts of single-molecule FISH experiments using probes against the sdg-1 open reading frame, we were unable to detect a specific signal above background (data not shown). Additional experiments are needed for the sensitive detection of sdg-1 expression outside the germline, if any.  

      (4) Based on the images shown, SDG-1 could be in other nearby granules, such as P granules or mutator foci. Additional imaging controls to rule out these granules/condensates will greatly strengthen the argument that SDG-1 protein localizes to Z-granules specifically.

      We have modified the final model to indicate that the perinuclear colocalization is with germ granules broadly and we agree that we do not have the resolution to claim that the observed overlap of SDG-1::mCherry with GFP::ZNFX-1 that we detect using Airyscan microscopy is specifically with Z granules. Our initial emphasis of Z-granule was based on the prior report of SDG-1 being co-immunoprecipitated with the Z-granule surface protein PID-2/ZSP-1. However, through other work predicting possible direct interactions using AlphaFold (Lalit and Jose, bioRxiv, 2024), we were unable to detect any direct interactions between PID-2 and SDG-1. Indeed, many additional granules have been recently reported (Chen et al., Nat. Commun., 2024; Huang et al., bioRxiv 2024), making it possible that SDG-1 has specific interactions with a component of one of the other granules (P, Z, M, S, E, or D) or adjacent P bodies.

      Minor

      (1) "This entry into the cytosol is distinct from and can follow the uptake of dsRNA into cells, which can rely on other receptors." Awkward sentence. Please revise.

      We have now revised this sentence to read “This entry into the cytosol is distinct from the uptake of dsRNA into cells, which can rely on other receptors.”

      (2) Presumably, the dsRNA percent of the in vitro transcribed RNA is different than the 50 bp oligos that can be reliably annealed by heating and cooling. Other RNA secondary structure possibilities warrant further discussion.

      We agree that in vitro transcribed RNA could include a variety of undefined secondary structures in addition to dsRNAs of mixed length. Such structures could recruit or titrate away RNA-binding proteins in addition to the dsRNA structures engaging the canonical RNAi pathway, resulting in mixed mechanisms of silencing. Future work identifying such structures and exploring their impact on the efficacy of RNAi could be informative. We have now added these considerations to the discussion and thank the reviewer for highlighting these possibilities.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is a potentially interesting study regarding the role of gasdermin D in experimental psoriasis. The study contains useful data from murine models of skin inflammation; however, the main claims (on neutrophil pyroptosis) are incompletely supported in its current form and require additional experimental support to justify the conclusions made.

      We sincerely appreciate the positive assessment regarding the significance of our study, as well as the valuable suggestions provided by the reviewers. We have included new data, further discussions and clarifications in the revised manuscript to adequately address all the concerns raised by the reviewers and better support our conclusions.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Liu, Jiang, Diao et al. investigated the role of GSDMD in psoriasis-like skin inflammation in mice. The authors have used full-body GSDMD knock-out mice and Gsdmd floxed mice crossed with the S100A8-Cre. In both mice, the deficiency of GSDMD ameliorated the skin phenotype induced by the imiquimod. The authors also analyzed RNA sequencing data from the psoriatic patients to show an elevated expression of GSDMD in the psoriatic skin.

      Overall, this is a potentially interesting study, however, the manuscript in its current format is not completely a novel study.

      Strengths:

      It has the potential to unravel the new role of neutrophils.

      Weaknesses:

      The main claims are only partially supported and have scope to improve.

      We thank the reviewer for the positive evaluation of the interest and potential of our work. In response to reviewers’ suggestions, we have added new content, including additional data and discussions, to further demonstrate the important role of GSDMD-mediated neutrophil pyroptosis in the pathogenesis of psoriasis, thereby enhancing the completeness of our research.

      Reviewer #2 (Public review):

      Summary:

      The authors describe elevated GSDMD expression in psoriatic skin, and knock-out of GSDMD abrogates psoriasis-like inflammation.

      Strengths:

      The study is well conducted with transgenic mouse models: mice with GSDMD knock-out show abrogated inflammation, and GSDMD fl/fl mice without neutrophils have a reduced phenotype.

      I fear that some of the conclusions cannot be drawn by the suggested experiments. My major concern would be the involvement of other inflammasome and GSDMD bearing cell types, esp. Keratinocytes (KC), which could be an explanation why the experiments in Fig 4 still show inflammation.

      Weaknesses:

      The experiments do not entirely support the conclusions towards neutrophils.

      We appreciate the reviewers’ positive evaluation regarding the application of our mouse models. We also thank the reviewers for insightful comments and suggestions that can improve the quality of our work. Addressing these issues has significantly strengthened our conclusions. Our responses to the above questions are as follows.

      Specific questions/comments:

      Fig 1b: mainly in KC and Neutrophils?

      In Figure 1b, we observed that GSDMD expression is higher in the psoriasis patient tissues compared to control samples. As the role of GSDMD in keratinocytes during the pathogenesis of psoriasis has already been explored[1], we focused our study on GSDMD in neutrophils. In response to the comments, we have added co-staining results of the neutrophil marker CD66b and GSDMD in the revised manuscript (see new Figure 3b in the revised manuscript). This addition further substantiates the expression of GSDMD in neutrophils within psoriasis tissue.

      Fig 2a: PASI includes erythema, scaling, thickness and area. I guess area could be tricky, esp. in an artificially induced IMQ model (WT) vs. the knock-out mice.

      In our model, to accurately assess the disease condition in mice, we standardized the drug treatment area on the dorsal side (2 × 3 cm). Therefore, the area was not factored into the scoring process, and we have included a detailed description of this in the revised manuscript.

      Fig 2d: interesting finding. I thought that CASP-1 is cleaving GSDMD. Why would it be downregulated?

      Regarding the downregulation of CASP in GSDMD KO mouse skin tissue, existing studies indicate that GSDMD generates a feed-forward amplification cascade via the mitochondria-STING-Caspase axis [2]. We hypothesize that the absence of GSDMD attenuates STING signaling’s activation of Caspase.

      Line 313: as mentioned before (see Fig 1b), KC also show strong GSDMD staining positivity and are known producers of IL-1b and inflammasome activation. I guess the relevance of KC in the whole model needs to be evaluated here.

      Our research primarily focuses on the role of neutrophil pyroptosis in psoriasis; this does not conflict with existing reports indicating that KC pyroptosis also contributes to disease progression[1]. Both studies underscore the significant role of GSDMD-mediated pyroptotic signaling in psoriasis, and the consistent involvement of KCs and neutrophils further emphasizes the potential therapeutic value of targeting GSDMD signaling in psoriasis treatment. We have expanded upon this discussion in the revised manuscript.

      Fig 4i - guess here the conclusion would be that neutrophils are important for the pathogenesis in the IMQ model, which is true. This experiment does not support that this is done by pyroptosis.

      To address the question, we analyzed the publicly available single-cell transcriptomic data (GSE165021) and found that, compared to the control group, neutrophils infiltrating in IMQ-induced psoriasis-like tissue display a higher expression of pyroptosis-related genes (see new Figure 3e in the revised manuscript). These results strengthen our conclusions about the role of neutrophil pyroptosis in the progression of psoriasis.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific Comments:

      • Figure 1: Micro abscesses would already be dead, which would likely reflect as non-specific staining. Authors should consider double staining (e.g., GSDMD+Ly6G).

      We thank the reviewer for the useful suggestion. We have added co-staining results of the neutrophil marker CD66b and GSDMD in the revised manuscript (see new Figure 3b in the revised manuscript). This addition further substantiates the expression of GSDMD in neutrophils within psoriasis tissue.

      • Figures 1 b, c, and d do not have the n number for representative experiments and images.

      We apologize for our oversight. We have added the relevant information in the revised manuscript and have reviewed and corrected the entire text.

      • What is the difference between psoriasis patients in Figure 1 versus Figure 3 as the staining patterns are different? It is difficult to interpret from Figure 1 that expression is limited to neutrophils. Authors should consider double staining (e.g., GSDMD+Ly6G). How many samples were stained to draw this conclusion?

      We thank the reviewer for the suggestion. In Figure 1b, we observed that GSDMD expression is higher in the psoriasis patient tissues compared to control samples. We have added co-staining results of the neutrophil marker CD66b and GSDMD in the revised manuscript (see new Figure 3b in the revised manuscript). For each staining group, we examined samples from 3-5 patients to draw the conclusion.

      • Figure 2: GSDMD deficiency mitigates psoriasis-like inflammation in mice has been shown before (PMID#37673869). The paper showed that the GSDMD was mainly expressed in keratinocytes. What is the view of the authors on it and how does this data correlate with the data presented in this manuscript by the authors?

      Consistent with previous studies[1], we observed increased expression of pyroptosis-related proteins in psoriatic lesions. However, our research focused specifically on the role of neutrophil pyroptosis in psoriasis; this does not conflict with existing reports indicating that KC pyroptosis also contributes to disease progression. Both studies underscore the significant role of GSDMD-mediated pyroptotic signaling in psoriasis, and the consistent involvement of KCs and neutrophils further emphasizes the potential therapeutic value of targeting GSDMD signaling in psoriasis treatment. We have expanded upon this discussion in the revised manuscript.

      • Figure 3d: It is unclear if the IF shows an epidermal or dermal area. As shown by authors in other figures (human psoriatic skin), do authors observe more GSDMD in the micro abscess, which is localized in the epidermis? The authors should also show the staining of GSDM/Ly6G in the whole skin sample.

      The region we presented for immunofluorescence staining corresponds to the dermis of the mice, as we did not observe typical neutrophil micro abscesses, similar to those in human psoriasis, in the epidermis of the IMQ-induced classical psoriasis vulgaris (PV) model. Therefore, we have only shown the staining in the dermal area.

      • Figure 3e: PI staining also represents necrotic cells and TUNEL staining would not represent just apoptotic cells. It is unclear how the authors conclude an ongoing pyroptosis in neutrophils. A robust dataset is needed to provide evidence supporting neutrophil pyroptosis in the IMQ-challenged mice.

      We thank the reviewer for the valuable suggestion. GSDMD is the effector protein of pyroptosis. To further confirm that cells are undergoing pyroptosis, it is necessary to morphologically stain the GSDMD N-terminal protein. Although there is currently no GSDMD N-terminal fluorescent antibody available, we detected the cleaved N-terminus of GSDMD by WB in mouse psoriasis-like skin tissue, and its increased expression suggested increased cell pyroptosis (see new Figure 1d in the revised manuscript). Moreover, we analyzed the publicly available single-cell transcriptomic data (GSE165021) and found that, compared to the control group, neutrophils infiltrating in IMQ-induced psoriasis-like tissue display a higher expression of pyroptosis-related genes (see new Figure 3e in the revised manuscript). These results strengthen our conclusions about the role of neutrophil pyroptosis in the progression of psoriasis.

      • Figure 4: The authors did not clarify the reason for choosing D4 over the usual D7 for the imiquimod experiment. S100A8-Cre is also reported in monocytes and granulocytes/monocyte progenitors. And, the authors also show the expression in macrophages and neutrophils, but in the text, only neutrophils are mentioned. The authors should state the results in the text as well to avoid misrepresentation of the data.

      We thank the reviewer for the useful suggestion. In our previous studies, we repeated this experiment many times and observed that the IMQ-induced mouse psoriasis model showed obvious signs of self-resolution after Day 4, even with continued topical IMQ application. We therefore chose 4 days over 7 days for the imiquimod experiment, which is consistent with many other studies[3, 4].

      Many studies use S100A8-Cre mice for neutrophil-specific gene knockout[5, 6]. Moreover, we used Ly6G antibody to eliminate neutrophils in GSDMD-cKO mice and control mice. It was found that the difference in lesions between the two groups was abolished after neutrophil depletion, indicating that neutrophil pyroptosis plays an important role in the pathogenesis of imiquimod-induced psoriasis-like lesions in mice. As the database analysis results showed that macrophages have slight expression of S100a8, according to the suggestion of the reviewer, we have added a more precise description in the revised manuscript.

      • Figure S2a: Ly6G antibody reduced the ly6G positive, but also negative cells compared to PBS. If this is correct, what is the explanation, and how this observation has been considered for concluding results?

      Neutrophils play an important role in regulating inflammatory responses, and their depletion can reduce the overall inflammatory level in the body, which also results in a decrease in other non-neutrophil cells. However, this change does not affect our conclusions. Our results show that after the depletion of neutrophils, there is no difference in the pathological manifestations between the cKO group and the control group. This further indicates that GSDMD in neutrophils plays an important role in the pathogenesis of imiquimod-induced psoriasis-like lesions in mice.

      • The conclusion in Figure 4i is incorrect as Ly6G administration had an effect on the wt, so it shows neutrophils play a role, but not neutrophil pyroptosis.

      - Lines 321-323: "It was found that the difference in lesions between the two groups was abolished after neutrophil depletion (Fig4i, S2a), indicating that neutrophil pyroptosis plays an important role in the pathogenesis of imiquimod-induced psoriasis-like lesions in mice"

      Our results show that after the depletion of neutrophils, there is no difference in the pathological manifestations between the cKO group and the control group. This further indicates that the lower disease scores observed in cKO mice, in the absence of neutrophil depletion, depend on the presence of neutrophils. In the revised manuscript, we have changed the statement to “It was found that the difference in lesions between the two groups was abolished after neutrophil depletion (Fig4i, S2a), indicating that GSDMD in neutrophils plays an important role in the pathogenesis of imiquimod-induced psoriasis-like lesions in mice.”

      • The effect of Ly6G Ab: reduced PASI in the wt, but the effect on the ko remains the same. What are the other molecular changes observed? What was the level of neutrophils in the wt and the S100A8-Cre GsdmDfl/fl mice under steady state and how do they change upon imiquimod challenge? A complete profiling of the immune cells is needed for all the experiments.

      As demonstrated by the results, the deletion of neutrophils did not significantly alter the pathological phenotype of cKO mice. We believe that this outcome precisely highlights the crucial role of GSDMD in regulating neutrophil inflammatory responses.

      • Figure S2b: The authors conclude that IL-1b in the imiquimod skin is mainly expressed by neutrophils, but the analysis presented in the figure does not support this conclusion. Both neutrophils and macrophages are majorly positive for IL-1b, with some expression on Langerhans and fibroblasts. No n numbers are provided for the experiment.

      As we discussed in the manuscript, we speculate that neutrophil pyroptosis may release cytokines, which in turn activate other cells to secrete cytokines, forming a complex inflammatory network in psoriasis. This suggests that neutrophil pyroptosis may be involved in the pathogenesis of psoriasis by affecting the secretion of cytokines such as IL-1B and IL-6 by neutrophils, thereby affecting the function of other immune cells such as T cells and macrophages.

      We have added the n number in the revised manuscript.

      • For clarity and transparency, a list of antibodies with the associate clone and catalogue number should be provided or integrated into the method text.

      We thank the reviewer for the useful suggestion. We have added the clone and catalogue numbers of the antibodies used to the Methods text of the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Fig 3b: psoriasis and pustular psoriasis have a different pathophysiology (autoimmune vs. autoinflammatory). Neutrophils are centrally important for GPP for the cleavage of IL-36. Guess as not further referred to pustular psoriasis in the paper, that comparison is rather deviating from the story.

      In Figure 3b, we stained for GSDMD and CD66b in both plaque psoriasis (PV) and generalized pustular psoriasis (GPP), not to compare the expression differences between the two types of psoriasis, but rather to demonstrate that significant GSDMD expression is present in neutrophils in different types of psoriasis. Unfortunately, due to the lack of a well-established animal model for GPP, we were only able to conduct studies using the established PV animal model. We acknowledge this limitation in our research. In our revised manuscript, we have added the following explanation in the discussion section: “Although we observed significantly increased GSDMD in neutrophils in pustular psoriasis, we were constrained to studying the established PV animal model due to the current absence of a mature GPP animal model. This represents a limitation of our study.”

      In summary, we appreciate the Reviewer’s comments and suggestions. We feel that the inclusion of new data addresses the concerns in a comprehensive manner and adds further support to our original conclusions. We hope you will now consider the revised manuscript worthy of publication in eLife.

      References:

      (1) Lian, N., et al., Gasdermin D-mediated keratinocyte pyroptosis as a key step in psoriasis pathogenesis. Cell Death & Disease, 2023. 14(9): p. 595.

      (2) Han, J., et al., GSDMD (gasdermin D) mediates pathological cardiac hypertrophy and generates a feed-forward amplification cascade via mitochondria-STING (stimulator of interferon genes) axis. Hypertension, 2022. 79(11): p. 2505-2518.

      (3) Lin, H., et al., Forsythoside A alleviates imiquimod-induced psoriasis-like dermatitis in mice by regulating Th17 cells and IL-17a expression. Journal of Personalized Medicine, 2022. 12(1): p. 62.

      (4) Emami, Z., et al., Evaluation of Kynu, Defb2, Camp, and Penk Expression Levels as Psoriasis Marker in the Imiquimod‐Induced Psoriasis Model. Mediators of Inflammation, 2024. 2024(1): p. 5821996.

      (5) Stackowicz, J., et al., Neutrophil-specific gain-of-function mutations in Nlrp3 promote development of cryopyrin-associated periodic syndrome. Journal of Experimental Medicine, 2021. 218(10): p. e20201466.

      (6) Abram, C.L., et al., Distinct roles for neutrophils and dendritic cells in inflammation and autoimmunity in motheaten mice. Immunity, 2013. 38(3): p. 489-501.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Fuchs describes a novel method of enzymatic protein-protein conjugation using the enzyme Connectase. The author is able to make this process irreversible by screening different Connectase recognition sites to find an alternative sequence that is also accepted by the enzyme. They are then able to selectively render the byproduct of the reaction inactive, preventing the reverse reaction, and add the desired conjugate with the alternative recognition sequence to achieve near-complete conversion. I agree with the authors that this novel enzymatic protein fusion method has several applications in the field of bioconjugation, ranging from biophysical assay conduction to therapeutic development. Previously the author has published on the discovery of the Connectase enzymes and has shown its utility in tagging proteins and detecting them by in-gel fluorescence. They now extend their work to include the application of Connectase in creating protein-protein fusions, antibody-protein conjugates, and cyclic/polymerized proteins. As mentioned by the author, enzymatic protein conjugation methods can provide several benefits over other non-specific and click chemistry labeling methods. Connectase specifically can provide some benefits over the more widely used Sortase, depending on the nature of the species that is desired to be conjugated. However, due to a similar lengthy sequence between conjugation partners, the method described in this paper does not provide clear benefits over the existing SpyTag-SpyCatcher conjugation system.  Additionally, specific disadvantages of the method described are not thoroughly investigated, such as difficulty in purifying and separating the desired product from the multiple proteins used. Overall, this method provides a novel, reproducible way to enzymatically create protein-protein conjugates.

      The manuscript is well-written and will be of interest to those who are specifically working on chemical protein modifications and bioconjugation.

      Reviewer #2 (Public review):

      Summary:

      Unlike previous traditional protein fusion protocols, the author claims their proposed new method is fast, simple, specific, reversible, and results in a complete 1:1 fusion. A multi-disciplinary approach from cloning and purification, biochemical analyses, and proteomic mass spec confirmation revealed fusion products were achieved.

      Strengths:

      The author provides convincing evidence that an alternative to traditional protein fusion synthesis is more efficient with 100% yields using connectase. The author optimized the protocol's efficiency with assays replacing a single amino acid and identification of a proline aminopeptidase, Bacillus coagulans (BcPAP), as a usable enzyme to use in the fusion reaction. Multiple examples including Ubiquitin, GST, and antibody fusion/conjugations reveal how this method can be applied to a diverse range of biological processes.

      Weaknesses:

      Though the ~100% ligation efficiency is an advancement, the long recognition linker may be the biggest drawback. For large native proteins that are challenging/cannot be synthesized and require multiple connectase ligation reactions to yield a complete continuous product, the multiple interruptions with long linkers will likely interfere with protein folding, resulting in non-native protein structures. This method will be a good alternative to traditional approaches as the author mentioned but limited to generating epitope/peptide/protein tagged proteins, and not for synthetic protein biology aimed at examining native/endogenous protein function in vitro.

      I would like to sincerely thank both reviewers for their insightful and constructive feedback on the manuscript. I have addressed reviewer #1’s comments below:

      (1) The benefits over the SpyTag-SpyCatcher system. Here, the conjugation partners are fused via the 12.3 kDa SpyCatcher protein, which is considerably larger than the Connectase fusion sequence (20 aa). This is briefly mentioned in the introduction (p. 1 ln 24-25). In a related technology, the SpyTag-SpyCatcher system was split into three components, SpyLigase, SpyTag and KTag  (Fierer et al., PNAS 2014). The resulting method introduces a sequence between the fusion partners (SpyTag (13aa) + KTag (10aa)), which is similar in length to the Connectase fusion sequence. I mention this method in the discussion (p. 8, ln 296 - 297), but preferred not to comment on its efficiency. It appears to require more enzyme and longer incubation times, while yielding less fusion product (Fierer et al., Figure 2).

      (2) Purification of the fusion product. The method is actually advantageous in this respect, as described in the discussion (p. 8, ln 257-263). I plan to add a figure showing an example in the revised article.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This study presents useful insights into the in vivo dynamics of insulin-producing cells (IPCs), key cells regulating energy homeostasis across the animal kingdom. The authors provide compelling evidence using adult Drosophila melanogaster that IPCs, unlike neighboring DH44 cells, do not respond to glucose directly, but that glucose can indirectly regulate IPC activity after ingestion supporting an incretin-like mechanism in flies, similar to mammals. The authors link the decreased activity of IPCs to hyperactivity observed in starved flies, a locomotive behavior aimed at increasing food search. 

      Furthermore, there is supporting evidence in the paper that IPCs receive inhibitory inputs from Dh44 neurons, which are linked to increased locomotor activity. However, although the electrophysiological data underlying the dynamics of IPCs in vivo is compelling, the link between IPCs and other potential elements of the circuitry (e.g. octopaminergic neurons) regulating locomotive behaviors is not clear and would benefit from more rigorous approaches. 

      This paper is of interest to cell biologists and electrophysiologists, and in particular to scientists aiming to understand circuit dynamics pertaining to internal state-linked behaviors competing with the feeding state, shown here to be primarily controlled by the IPCs. 

      Strengths: 

      (1) By using whole-cell patch clamp recording, the authors convincingly showed the activity pattern of IPCs and neighboring DH44 neurons under different feeding states. 

      (2) The paper provides compelling evidence that IPCs are not directly and acutely activated by glucose, but rather through a post-ingestive incretin-like mechanism. In addition, the authors show that Dh44 neurons located adjacent to the IPCs respond to bath application of glucose contrary to the IPCs. 

      (3) The paper provides useful data on the firing pattern of 2 key cell populations regulating food-related brain function and behavior, IPCs and Dh44 neurons, results which are useful to understand their in vivo function. 

      Weaknesses: 

      (1) The term nutritional state generally refers to the nutrients which are beneficial to the animal. In Figure 1, the authors showed that IPCs respond to glucose but not proteins. To validate the term nutritional state the authors could test the effect of a non-nutritive sugar (e.g. D-arabinose or L-Glucose) on the post-ingestive physiological responses of the IPCs.

      We thank the referee for this insightful comment. Following their suggestion, we included two new experimental data sets, which we added to Figure 1: We show that IPCs do not respond to the non-nutritive sugar D-arabinose (Figure 1H). In order to further expand this data set and our conclusions, we additionally show that IPCs do respond to fructose – a second nutritive sugar in addition to glucose (Figure 1H). Together, these data sets permit the conclusion that IPCs are sensitive to the ingestion of nutritive sugars, and do not respond to ingestion of non-nutritive sugars or high protein diets. Thus, we validate the term nutritional state.

      (2) It is difficult to grasp the main message from the figures in the result section as some figures have several results subsections referring to different points the authors want to make. The key results of a figure will be easier to understand if they are summarized in one section of the results. Alternatively, a figure can be split into 2 figures if there are several key messages in those figures, e.g. Figures 2 and 3.  

      We appreciate this suggestion and have made several changes to our manuscript to add more clarity. Among other things, we have changed the order of data presentation in Figure 2, as suggested by the referee below, where we now start with the IPC activation data rather than the OAN activation. We also swapped the order of data presentation and split Figure S1 into Figures S1 & S2. Moreover, we re-arranged the panel order in supplementary figure S4. This significantly improved the flow of the results section. Since the figures the referee refers to contain comparative data, for example between diets (Figure 1) or neuron types (Figure 2), we prefer to keep these data sets together. However, we have carefully revised the results section to more clearly relate our statements to individual figure panels.

      (3) The prime investigation of the paper is about the physiological response and locomotive behavioral readout linked to IPCs. The authors do not show a link between OANs and IPCs in terms of functional or behavioral readouts. In Figure 2 the authors first start with stating a link between OAN neurons and locomotion changes resulting from internal feeding states. The flow of the paper would be better if the authors focused on the effect of optogenetic activation of IPCs under different feeding states and their impact on fly locomotion. If the experiments done on optogenetic activation of OANs were to validate the experimental approach the data on OAN neurons is better suited for the supplement without the need of a subsection in the result section on the OANs.  

      We agree with the reviewer’s suggestion and switched the order of the figure panels and text to aid the flow of the manuscript. We now show and discuss the IPC activation data first (Figure 2C-H) and OAN activation afterwards (Figure 2I-K). We did keep the OAN data in the main document, though, since that facilitates comparisons between the small effects of IPC activation and the large, well-established effects of OAN activation.

      (4) Figure 2F shows that optogenetic activation of IPCs in fed flies does not influence their locomotor output. In the text, the conclusion linked to Figure 2F-H states that IPC activation reduces starvation-induced hyperactivity which is a statement more suited to Figure 2I-K. 

      We edited the text accordingly.

      (5) The authors show activation of Dh44 neurons leads to hyperpolarisation of the IPCs. What is the functional link between non-PI Dh44 neurons and the IPCs? Do IPCs express DH44R or is DH44 required for this effect on IPCs? Investigating a potential synaptic or peptidergic link between DH44 neurons and IPCs and its effect on behavior would benefit the paper, as it is so far not well connected. 

      Although we have not performed any experiments dedicated to investigating the functional link between DH44Ns outside the PI and the IPCs in this study, there are two lines of evidence supporting that this connection is relatively direct. First, IPCs do express DH44R1 & R2, as we show in a parallel study in eLife (Held M, et al. ‘Aminergic and peptidergic modulation of Insulin-Producing Cells in Drosophila’. eLife. 2024;13. doi:10.7554/ELIFE.99548.1). Second, we performed functional connectivity experiments using a Leucokinin (LK) driver line in that paper. This driver line labels two pairs of non-PI DH44Ns in the VNC, which are DH44 and LK positive (Zandawala et al 2018). Activating that line leads to inhibition of IPCs, similar to the effect we observed here for DH44N activation. These two lines of evidence suggest that there could be a direct peptidergic connection between DH44+ neurons and IPCs. We have added a paragraph mentioning these experiments to our discussion:

      ‘Notably, the DH44<sup>PI</sup>Ns express the DH44 peptide, as confirmed by anti-DH44 stainings(100). This also applies to a large fraction of neurons labelled in the broad DH44 driver line(100). However, a subset of neurons labelled in the broad line did not exhibit DH44 immunoreactivity(100), and might therefore not actually express the DH44 peptide. Hence, the inhibition of IPCs could be driven by neurons in the DH44 driver line that do not express DH44. A strong candidate for the inhibition are LK and DH44-positive neurons, which are labelled by the broad line(76). In a parallel study, we showed that LK-expressing neurons strongly inhibit IPCs(30), similar to the broad DH44 line used here. Furthermore, evidence from single-nucleus transcriptomic analysis shows that IPCs express DH44-R1 and DH44-R2 receptors(30). Therefore, it is possible that DH44Ns communicate with IPCs through a direct peptidergic connection. Notably, the inhibitory effect of non-PI DH44Ns on IPCs was very strong and fast, suggesting that a connection via classical synapses is more likely. Regardless, our results show that the glucose sensing DH44<sup>PI</sup>Ns and IPCs act independently of each other.’

      Reviewer #2 (Public Review): 

      Summary: 

      In this study, Bisen et al. characterized the state-dependency of insulin-producing cells in the brain of *Drosophila melanogaster*. They successfully established that IPC activity is modulated by the nutritional state and age of the animal. Interestingly, they demonstrate that IPCs respond to the ingestion of glucose, rather than to perfusion with it, an observation reminiscent of the incretin effect in mammals. The study is well conducted and presented and the experimental data convincingly support the claims made. 

      Strengths: 

      The study makes great use of the tools available in *Drosophila* research, demonstrating the effect that starvation and subsequent refeeding have on the physiological activity of IPCs as well as on the behavior of flies to then establish causal links by making use of optogenetic tools. 

      It is particularly nice to see how the authors put their findings in context to published research and use for example TDC2 neuron activation or DH44 activity to establish baselines to relate their data to. 

      Weaknesses: 

      I find the inability of SD to rescue the IPC starvation effect in Figure 1G&H surprising, given that the fully fed flies were raised and kept on that exact diet. Did the authors try to refeed flies with SD for longer than 24 hours? I understand that at some point the age effect would also kick in and counteract potential IPC activity rescue. I think the manuscript would benefit if the authors could indicate the exact age of the SD refed flies and expand a bit on the discussion of that point.  

      We have expanded the first paragraph of our discussion to tackle these questions, in particular the potential effect of aging, as suggested by the referee. We now also indicate the exact age of the flies. Moreover, we have conducted additional experiments in which we added either glucose or arabinose to our standard diet (Figure 1H). As we would have expected based on our hypothesis that the glucose concentration in our standard diet was too low to cause an increase in IPC activity after starvation, we find that feeding standard diet plus glucose increases IPC activity to the same level as glucose only, and that adding arabinose to the standard diet does not lead to increased IPC activity after starvation (Figure 1H).

      The incretin-like effect is exciting and it will be interesting in the future to find out what might be the signal mediating this effect. It is interesting that IPCs in explants seem to be responsive to glucose. I think it would help if the authors could briefly discuss possible sources for the different findings between these in fact very different preparations. Could the absence of the inhibitory DH44 feedback in the *ex-vivo* recordings for example play a role? 

      We thank the referee for this interesting point and expanded our discussion accordingly. We included that, in particular in brain explants without a VNC, the inhibitory connection we describe might be absent, as the referee suggested: ‘Previous ex vivo studies suggested that IPCs, like pancreatic beta cells, sense glucose cell-autonomously(23,24). Consistent with this, we observed an increase in IPC activity after the ingestion of glucose (Figure 2B). However, IPC activity did not increase during the perfusion of glucose directly over the brain. Importantly, the fly preparations were kept alive for several hours allowing the glucose-rich saline to enter circulation and reach all body parts. Several factors may explain the difference between ex vivo and in vivo preparations. First, in ex vivo studies, certain regulatory feedback mechanisms present in vivo could be absent. For example, the strong inhibitory input IPCs receive from DH44Ns we found would likely be absent in brain explants without a VNC. A lack of inhibitory feedback might allow for more direct glucose sensing by IPCs ex vivo, whereas in vivo, the IPC response could be suppressed by more complex systemic feedback. Second, we attempted to use the intracellular saline formulation employed in a previous ex vivo study(44). However, we observed that IPCs depolarized quickly using this saline, leading to unstable recordings that did not meet our quality standards for in vivo experiments. Another possible explanation for the lack of an effect of glucose might have been that the dominant circulating sugar in flies is trehalose(70,71) which is derived from glucose. When we extended our experiments, we found that trehalose perfusion did not affect IPC activity either, strengthening the idea that IPCs do not directly sense changes in hemolymph sugar levels. Therefore, our findings suggest that, similar to mammals, IPC activity and hence, insulin release, is not simply modulated by hemolymph sugar concentration in Drosophila.’ 

      The incretin-like effect the authors observed seems to start only after 5h which seems longer than in mammals where, as far as I know, insulin peaks around 1h. Do the authors have ideas on how this timescale relates to ingestion and glucose dynamics in flies? 

      We have now included the following section in the discussion to explicitly address the question of different activity dynamics in flies and mammals, but also the limitations of our electrophysiological approach in this regard: ‘We observed that IPC activity increased over a timescale of hours, which is longer compared to the fast insulin response in mammals, where insulin typically peaks within an hour of feeding(97). In flies, insulin levels rise within minutes of refeeding, followed by a drop after 30 min(20). Our experimental techniques limit our ability to capture these fast initial dynamics, since the preparation for intracellular recordings requires tens of minutes, so that we typically recorded IPC activity at least 20 min after the last food ingestion. Notably, studies in fasted mammals have shown that insulin peaks within minutes of refeeding, followed by a rapid decline, with levels stabilizing as feeding continues(98,99). We speculate a similar dynamic could be present in flies, but with our approach, we capture the steady-state reached tens of minutes after food ingestion rather than a potential initial peak.’ 

      The authors mention "a decrease in the FV of IPC-activated starved flies even before the first optogenetic stimulation (Figure 2I),". Could this be addressed by running an experiment in darkness, only using the IR illumination of their behavioral assay? 

      We thank the referee for pointing out this unexpected result. We discuss this in more detail in the new version of our manuscript and expand on the reasons for not performing these optogenetic activation experiments in the dark: First, the red LED required to activate CsChrimson triggers strong startle responses in dark-adapted flies, which mask other behavioral effects, in particular subtle ones such as those observed for IPCs. The startle response is much reduced when performing experiments under low background light conditions. Second, flies, at least in our hands, do not exhibit robust foraging behavior or starvation-induced hyperactivity in the dark, which is critical for our behavioral experiments. However, we also explain in our discussion that we believe the effect of background illumination is relatively small, since flies expressing CsChrimson in OANs or DH44Ns show comparable activity levels to controls. Hence, a part of this effect is likely attributable to leak currents induced by CsChrimson expression. We would like to point out though that we are careful in our description of the IPC effect on behavior, and focus on the fact that it is considerably smaller than the effects of other modulatory neurons (DH44Ns and OANs).

      The authors show an inhibitory effect of DH44 neuron activation on IPC activity. They further demonstrate that DH44PI neurons are not the ones driving this and thus conclude that "...IPCs are inhibited by DH44Ns outside the PI.". As the authors mentioned the broad expression of the DH44-Gal4 line, can they be sure that the cells labeled outside the PI are actually DH44+? If so they should state this more clearly, if not they should adapt the discussion accordingly.   

      We have substantially added to our discussion of this point, according to the referee’s great suggestion. In short, the broad line includes neurons that are DH44 positive and neurons that are not: ‘Notably, the DH44<sup>PI</sup>Ns express the DH44 peptide, as confirmed by anti-DH44 stainings(100). This also applies to a large fraction of neurons labelled in the broad DH44 driver line(100). However, a subset of neurons labelled in the broad line did not exhibit DH44 immunoreactivity(100), and might therefore not actually express the DH44 peptide. Hence, the inhibition of IPCs could be driven by neurons in the DH44 driver line that do not express DH44.’

      Reviewer #3 (Public Review): 

      Although insulin release is essential in the control of metabolism, adjusted to nutritional state, and plays major roles in normal brain function as well as in aging and disease, our knowledge about the activity of insulin-producing (and releasing) cells (IPCs) in vivo is limited. 

      In this technically demanding study, IPC activity is studied in the Drosophila model system by fine in vivo patch clamp recordings with parallel behavioral analyses and optogenetic manipulation. 

      The data indicate that IPC activity is increased with a slow time course after feeding a high-glucose diet. By contrast, IPC activity is not directly affected by increasing blood glucose levels. This is reminiscent of the incretin effect known from vertebrates and points to a conserved mechanism in insulin production and release upon sugar feeding. 

      Moreover, the data confirm earlier studies that nutritional state strongly affects locomotion. Surprisingly, IPC activity makes only a negligible contribution to this. Instead, other modulatory neurons that are directly sensitive to blood glucose levels strongly affect modulation. Together, these data indicate a network of multiple parallel and interacting neuronal layers to orchestrate the physiological, metabolic, and behavioral responses to nutritional state. Together with the data from a previous study, this work sets the stage to dissect the architecture and function of this network. 

      Strengths: 

      State-of-the-art current clamp in situ patch clamp recordings in behaving animals are a demanding but powerful method to provide novel insight into the interplay of nutritional state, IPC activity, and locomotion. The patch clamp recordings and the parallel behavioral analyses are of high quality, as are the optogenetic manipulations. The data showing that starvation silences IPC activity in young flies (younger than 1 week) are compelling. The evidence for the claim that locomotor activity is not increased upon IPC activity but upon the activity of other blood glucose-sensitive modulatory neurons (Dh44) is strong. The study provides a great system to experimentally dissect the interplay of insulin production and release with metabolism, physiology, and behavior. 

      Weaknesses: 

      Neither the mechanisms underlying the incretin effect, nor the network to orchestrate physiological, metabolic, and behavioral responses to nutritional state have been fully uncovered. Without additional controls, some of the conclusions would require significant downtoning. Controls are required to exclude the possibility that IPCs sense other blood sugars than glucose. The claim that IPC activity is controlled by the nutritional state would require that starvation-induced IPC silencing in young animals can be recovered by feeding a normal diet. At current firing in starvation, silenced IPCs can only be induced by feeding a high-glucose diet that lacks other important ingredients and reduces vitality. Therefore, feasible controls are needed to exclude that diet-induced increases in IPC firing rate are caused by stress rather than nutritional changes in normal ranges. The finding that refeeding starved flies with a standard diet had no effect on IPC activity but a strong effect on the locomotor activity of starved flies contradicts the statement that locomotor activity is affected by the same dietary manipulations that affect IPC activity. The compelling finding that starvation induces IPC firing would benefit from determining the time course of the effect. The finding that IPCs are not active in fed animals older than 1 week is surprising and should be further validated. 

      We thank the referee for the thoughtful and constructive criticism of our experiments and conclusions. Below, we lay out how we tackled the individual points raised by the referee.

      (1) ‘Controls are required to exclude the possibility that IPCs sense other blood sugars than glucose.’  

      To address this point, we conducted experiments in which we perfused trehalose (Figure 3B), the main circulating hemolymph sugar in Drosophila and other insects. Our results clearly show that trehalose does not affect IPC activity upon perfusion, confirming our statements that IPCs do not sense key blood sugars directly.

      (2) ‘Feasible controls are needed to exclude that diet-induced increases in IPC firing rate are caused by stress rather than nutritional changes in normal ranges’. 

      We agree with the referee that this point was not completely fleshed out in our first submission. We have now performed additional experiments in which we added glucose (and fructose) to our standard diet (Figure 1H). Flies feeding on this diet received all necessary nutrients but still experienced high concentrations of sugars. The effects of high glucose in a standard diet background were indistinguishable from those of high glucose in agarose, confirming that the IPCs respond to sugar rather than stress. Another important observation in this context is that IPCs in flies kept on a high protein diet exhibited much lower spike rates than those in flies kept on the high glucose diet, even though the former had a much shorter lifespan and therefore, presumably, experienced much higher stress levels (Figure 1H, Figure S1). These observations underline that stress is certainly not the primary factor here.

      (3) ‘The finding that refeeding starved flies with a standard diet had no effect on IPC activity but a strong effect on the locomotor activity of starved flies contradicts the statement that locomotor activity is affected by the same dietary manipulations that affect IPC activity.’

      We have revised the respective section of the results and discussion accordingly and are more careful and clearer in our interpretation of this behavioral dataset: ‘These results show that the locomotor activity was affected by the same dietary manipulations that had strong effects on IPC activity. However, IPC activity changes alone cannot explain the modulation of starvation-induced hyperactivity. On the one hand, high-glucose diets which drove the highest activity in IPCs were not sufficient to reduce locomotor activity back to baseline levels. On the other hand, refeeding flies with SD did not revert the effects of starvation on IPC activity (Figure 1H), but it was sufficient to reduce the locomotor activity below baseline levels (Figure 2B). This suggests that the modulation of starvation-induced hyperactivity is achieved by multiple modulatory systems acting in parallel.’

      (4) ‘The compelling finding that starvation induces IPC firing would benefit from determining the time course of the effect.’

      We followed the referee’s excellent suggestion and determined the time course of the starvation effect in three timesteps, similar to the experiments we did for refeeding (Figure 1G). In addition, we now also quantify the number of active IPCs (i.e., IPCs that fired at least one action potential during our five-minute analysis window), which further illustrates the dynamics of the starvation and refeeding effects. We find that the starvation effect is graded, and that IPC activity decreases with increasing starvation duration.
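
      The "active IPC" criterion described above (at least one action potential within a five-minute analysis window) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name, the data layout (one list of spike times in seconds per recorded cell), and the example spike times are all hypothetical.

```python
def fraction_active(spike_times_per_cell, window_start_s=0.0, window_s=300.0):
    """Fraction of cells that fired >= 1 spike inside the analysis window.

    spike_times_per_cell: list of lists, one list of spike times (s) per cell.
    The 300 s default mirrors the five-minute window named in the response;
    everything else here is an illustrative assumption.
    """
    if not spike_times_per_cell:
        raise ValueError("need at least one recorded cell")
    window_end_s = window_start_s + window_s
    # A cell counts as active if any spike falls in [start, end).
    n_active = sum(
        any(window_start_s <= t < window_end_s for t in spikes)
        for spikes in spike_times_per_cell
    )
    return n_active / len(spike_times_per_cell)


# Hypothetical example: four cells, two of which spike inside the window.
example_cells = [[12.5, 140.0], [], [299.9], [305.0]]
print(fraction_active(example_cells))
```

      Reporting this fraction alongside the mean spike rate separates "fewer cells firing" from "the same cells firing less", which is why the two measures complement each other.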

      (5) ‘The finding that IPCs are not active in fed animals older than 1 week is surprising and should be further validated.’

      To address the referee’s comment, we have added 14 new IPC recordings from flies in the 6–26-day range, such that we now have recordings from 9-14 IPCs for each age range (Figure S2B). They confirmed our previous analysis and strengthened the finding that IPC activity dramatically decreases after 8 days (on our standard diet). The total number of IPCs in this supplementary dataset was thus increased from 34 to 48.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) Do IPCs respond to glucose specifically after ingestion or generally to any other nutritive sugars? To tackle this question the IPC responses in starved flies can be recorded after refeeding flies with other nutritive sugars (fructose, sucrose). 

      To address this important question, we have performed additional experiments in which we refed starved flies with fructose, as a nutritive sugar, and arabinose, as a non-nutritive sugar. As expected, IPCs responded to fructose but not to arabinose, indicating that they respond to nutritive sugars in general rather than to glucose specifically. We describe and discuss these key results in the new version of our manuscript.

      (2) In Figure 2, the x and y axes are not annotated on all subfigures, which might help improve clarity. 

      We have annotated the subfigures as requested.

      (3) In the discussion on page 9 ("...we observed an increase in IPC activity after the ingestion of glucose (Figure 2B)."), the authors refer to Figure 2B instead of 3C.

      We have fixed this oversight.

      Reviewer #2 (Recommendations For The Authors): 

      Introduction 

      I think it could be helpful for the reader if you would briefly state the number of IPCs and whether you are targeting all of them with Dilp2-Gal4. 

      We included the numbers according to the suggestion. 14 IPCs are labeled in the driver line, and this is the number of IPCs commonly assumed to be present in the PI.

      Figures 

      In some Figures (for example 1D & E) the authors state the number of IPCs recorded (N) but not the number of animals used (n). This should be stated as the data from within an animal are dependent and might give insights about IPC heterogeneity. 

      We have compiled tables for the supplementary material (Tables S5 & S6) in which we state the number of IPCs and DH44<sup>PI</sup>Ns recorded and the number of different flies for each figure panel. We have recorded an average of 1.4 IPCs per fly (217 IPCs from 160 flies). We therefore expect the bias introduced by individual flies to be rather small. However, in our parallel study, we specifically investigate the heterogeneity of IPCs by maximizing the number of IPCs recorded per fly (Held M, et al. ‘Aminergic and peptidergic modulation of Insulin-Producing Cells in Drosophila’. eLife. 2024;13. doi:10.7554/ELIFE.99548.1). In the case of DH44<sup>PI</sup>Ns, we recorded 24 neurons in 21 flies, i.e., 1.1 neurons per fly.
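
      The per-fly averages quoted above follow directly from the stated counts. A short check, with a hypothetical helper name, might look like this:

```python
def cells_per_fly(n_cells, n_flies):
    """Average number of recorded cells per fly, rounded to one decimal."""
    if n_flies <= 0:
        raise ValueError("n_flies must be positive")
    return round(n_cells / n_flies, 1)


# Counts taken from the response: 217 IPCs in 160 flies, 24 neurons in 21 flies.
print(cells_per_fly(217, 160))  # 1.4
print(cells_per_fly(24, 21))    # 1.1
```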

      - Figure 3D: There is some white visible among the cell bodies in the overlay. I assume this comes from projecting across layers rather than indicating DH44 - IPC overlap? It would help to explicitly state that. 

      We have added a statement to the results section, in which we explain that most of the white is due to overlap in the z-projection rather than overlap in the driver lines. However, there are a few cases (typically one to two cells per brain) in which neurons labeled by the DH44 line also stain positive for Dilp2, indicating they express both neuropeptides. We have added this information to the manuscript:

      Results: ‘DH44<sup>PI</sup>Ns are anatomically similar to IPCs, and their cell bodies are located directly adjacent to those of IPCs in the PI, making them an ideal positive control for our experiments (Figure 3D). A small subset of DH44<sup>PI</sup>Ns also expresses Dilp2(75), and our immunostainings confirmed colocalization of Dilp2 and DH44 in a single neuron (Figure 3D, white arrow).’

      In figure caption: ‘UAS-myr-GFP was expressed under a DH44-GAL4 driver to label DH44 neurons. GFP was enhanced with anti-GFP (green), brain neuropils were stained with anti-nc82 (cyan), and IPCs were labelled using a Dilp2 antibody (magenta). White arrow indicates Dilp2 and DH44-GAL4 positive neuron. The other white regions in the image result from an overlap in z-projections between the two channels, rather than from antibody colocalization.’

      - Figure 4I: One might get the impression that the fast onset peak of activity precedes the stimulation onset, using a thinner line width might help avoid that. 

      This effect is due to a combination of using relatively heavy lines for clear visibility of the data and a gentle smoothing step (a 2s median filter, which corresponds to less than 1% of the 300s stimulation window) in our analysis of the behavioral data. However, inspection of the raw data clearly shows increases in velocity after the onset of the optogenetic activation. We clarified this in the figure caption: ‘Average FV across all DH44N activation trials based on two independent replications of the experiment in I. Note that the peak in average FV lies within the first frame of the stimulation window.’
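
      To make the smoothing step concrete, the sketch below implements a simple centered running median and checks the quoted window fraction (2 s out of 300 s). It is an illustration only: the function name, the edge handling, and the toy trace are assumptions, and the authors' actual frame rate and filter implementation are not stated in the response.

```python
from statistics import median


def running_median(samples, window_len):
    """Centered running median; near the edges the window simply shrinks.

    window_len is given in samples and must be odd so the window is centered.
    """
    if window_len < 1 or window_len % 2 == 0:
        raise ValueError("window_len must be a positive odd integer")
    half = window_len // 2
    return [
        median(samples[max(0, i - half): i + half + 1])
        for i in range(len(samples))
    ]


# Window fraction quoted in the response: 2 s filter vs. 300 s stimulation.
window_fraction_percent = 2 / 300 * 100

# Toy forward-velocity trace (arbitrary units) with a single-sample outlier.
toy_trace = [1, 1, 9, 1, 1]
smoothed = running_median(toy_trace, 3)
print(window_fraction_percent, smoothed)
```

      A median filter suppresses isolated outliers while leaving sustained changes in place, which is consistent with describing it as a gentle smoothing step.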

      - S3 panel letters do not match references in the text.

      We fixed this oversight.

      Formatting 

      - Page 10: The paragraphs on the bottom of the page got switched around.

      This has been fixed.

      - Page 14: The first paragraph after the header "Free-walking assay" seems to be coming from elsewhere. 

      We apologize for this slightly embarrassing mistake. We used our related bioRxiv preprint (Held et al.) as a template for formatting this paper, and accidentally left this part of the methods section in the manuscript. We have fixed this error in our resubmission.

      Reviewer #3 (Recommendations For The Authors): 

      Major suggestions: 

      (1) The data show convincingly that IPC activity is decreased by starvation during the first week of adult life (Figures 1C and D). However, the conclusion that IPC activity is controlled by the nutritional state requires additional care. First, refeeding starved adult animals with a normal diet does not bring back normal IPC firing rates (Figure 1H). Therefore, IPC activity does not strictly follow changes in nutritional state, but IPCs are silenced by starvation. Second, from the second week of adult life on, IPCs are silent anyway, and thus unlikely responsive to changes in the nutritional state anymore (which might be different on a different standard diet?) The only effect of feeding on IPC activity is observed upon feeding starved, young animals with high glucose for 12-24 hrs (Figure 1G). However, it is not clear whether increased IPC firing is caused by the effects of high glucose on the nutritional state in a normal range, or because of diet-induced stress (the diet also severely shortens lifespan, Figure 1S). Does high glucose also increase IPC firing rate in young, fed animals? These would have strongly increased glucose concentrations but not suffer the stress of not getting any other nutrients. Such experiments would be required to make the statement that glucose feeding increases IPC firing rate. 

      We have performed several experiments to address this criticism. First, we performed a time course analysis of the starvation effect. We show that the IPC activity reduction is graded, and that IPC activity declines already after two hours of starvation, a timepoint at which stress levels should still be relatively small (Figure 1G). Second, we refed flies with high glucose concentrations added to the standard diet (Figure 1H). This minimized any potential stress responses due to a lack in nutrients. Third, we now show that IPCs specifically respond to nutritive (glucose and fructose), but not to non-nutritive sugars (arabinose, Figure 1H). We believe that these data sets, in addition to the graded refeeding effect, make a strong case for the nutritional state dependent modulation of IPCs. 

      (2) The testing of locomotor activity is well done, nicely recapitulates starvation-induced increases in locomotion, and adds interesting novel findings on refeeding with high glucose versus high protein diet. However, the statement that locomotor activity was affected by the same dietary manipulations that had strong effects on IPC activity does not reflect the data presented. Refeeding starved flies with a standard diet had no effect on IPC activity (Figure 1H) but a strong effect on locomotor activity of starved flies (a strong reduction, even stronger than high glucose diet, Figure 2B). 

      We have revised the respective section of the results and discussion accordingly and are more careful and clearer in our interpretation of this behavioral dataset: ‘These results show that the locomotor activity was affected by the same dietary manipulations that had strong effects on IPC activity. However, IPC activity changes alone cannot explain the modulation of starvation-induced hyperactivity. On the one hand, high-glucose diets which drove the highest activity in IPCs were not sufficient to reduce locomotor activity back to baseline levels. On the other hand, refeeding flies with SD did not revert the effects of starvation on IPC activity (Figure 1H), but it was sufficient to reduce the locomotor activity below baseline levels (Figure 2B). This suggests that the modulation of starvation-induced hyperactivity is achieved by multiple modulatory systems acting in parallel.’

      Related to points 1 and 2, a key statement that the results establish that IPC activity is controlled by the nutritional state requires care. What the data convincingly show is that IPC activity is near zero upon starvation. 

      As described above, we have added several extensive data sets (fructose feeding, arabinose feeding, trehalose perfusion, starvation time course) to show that we indeed observe a nutritional state dependent modulation of IPCs and describe these new results in the results and discussion.

      (3) The time course of nutritional state-dependent changes of IPC activity is claimed to be slow, several hours to days. Unless I have missed a figure, the underlying data are not presented (only for high glucose diet). It would be great if this could also be shown for a standard diet with higher glucose concentrations than the one used so that it rescues starvation-induced IPC silencing without shortening lifespan (if this is feasible?). The data showing starvation-induced IPC silencing are convincing, but, unless I have missed it, the time course has not been determined. It would be very nice to actually show this. Have different starvation times been tested in relation to IPC firing rate, and if yes, with what time resolution? Does IPC activity change already after 0.5 or 1 or a few hours of starvation? If starvation can silence IPCs faster than assumed, the near-zero IPC activity in animals older than a week could very well be caused by longer time intervals between meals.

      We have performed experiments to address both important points raised by the referee here. 1) We have added high glucose concentrations to our standard diet, and show that it has the same effect – a significant increase in IPC activity – as the high glucose diet (Figure 1H). 2) We have analyzed the time course of IPC activity reduction in response to starvation (Figure 1G). Indeed, we find that a few hours of starvation start reducing IPC activity. We discuss the possibility that reduced IPC activity in older flies could be due to reduced food intake: ‘One of our experiments demonstrated that IPC activity was heavily diminished in flies older than 10 days (Figure S2B). A possible explanation could be that flies feed less as they age. However, this only holds true for flies older than 14 days(86). Therefore, reduced IPC activity in 10-11 day old flies is unlikely to result from reduced food intake and likely involves inhibition of insulin signaling.’

      (4) The data on the proposed incretin effect are of high importance in potentially highlighting a highly conserved link between glucose ingestion and insulin release. An important control would be to test different sugars, such as trehalose, an important blood sugar of flies. If glucose is converted into trehalose and this is what IPCs sense, then perfusion of glucose has no effect. The fact that the fantastic experiments show that the DH44 neurons are sensitive to glucose perfusion does rule out that IPCs sense a different sugar. This would be very different from the incretin effect that requires additional hormones. In addition, as mentioned above, controls are required to show that high glucose affects IPCs as a nutrient and not as a stressor (see point 1), for example refeeding with a standard diet that contains a higher glucose concentration but does not reduce lifespan. Another great control to solidify the exciting claim on the incretin effect would be to knock out candidate Drosophila incretin hormones and test whether a high glucose diet stops increasing the IPC firing rate (although simpler controls might also do the job).

      We have performed the two key experiments suggested by the referee. 1) We perfused trehalose as the primary blood sugar of flies and showed that IPCs do not respond to trehalose perfusion (Figure 3B & C). This further strengthens the finding that IPC activity in flies shows an incretin-like effect. 2) We have added high concentrations of glucose to our standard diet to provide flies with a full diet that contains high glucose concentrations. IPC activity in these flies was indistinguishable from the activity in flies which consumed pure glucose diets. In contrast, IPC activity in flies kept on a high protein diet, which dramatically reduced lifespan, was very low. These results clearly show that higher IPC activity is not due to increased stress levels, but a function of nutritive sugar ingestion. We further validated this hypothesis by refeeding flies with fructose as a nutritive sugar, which increased IPC activity, and arabinose as a non-nutritive sugar, which did not affect IPC activity (Figure 1H).

      Another point that might be relevant to this discussion is that IPC activity is almost entirely shut down during flight in Drosophila (which we showed in Liessem et al. 2023, Current Biology 33 (3), 449-463. e5). Several ‘stress hormones’ are released during flight, including octopamine. The fact that IPC activity is low in flying flies, starved flies, and flies kept on a pure protein diet (which all experience high stress levels), to us, very clearly suggests that stress is not the predominant factor here. We would also like to point out that, while the lifespan was reduced in flies kept on pure glucose diets, survival rates were at 100% until day 14, and we carried out our experiments on day 2 after starvation. Hence, these flies might not (yet) experience particularly high stress levels.

      (5) The discussion relates the absence of IPC firing in animals older than 1 week to aging. However, given that the flies fed on a normal diet show the typical lifespan for Drosophila, a 10-day-old fly is still in its youth. Maybe flies at 10 days simply eat less and thus IPC spiking goes down as in starved flies, especially because the standard diet used contains low glucose. Do IPCs also become silent after a week if the animals are fed with a standard diet that contains a higher glucose concentration? Without additional controls, this part of the discussion is pretty speculative and should be revised.

      We agree with the reviewer that it is not clear whether reduced IPC activity is a direct result of physiological changes that occur with aging, or an indirect effect of reduced food intake, which occurs during aging. In both cases, in our view, it would be an age-related effect. Since this is a minor point of our manuscript, we decided not to perform additional experiments, other than significantly increasing the sample size for the aging data set already presented to shore up our findings (Figure S2B). We have, however, revisited the discussion of this point according to the referee’s suggestion: ‘One of our experiments demonstrated that IPC activity was heavily diminished in flies older than 10 days (Figure S2B). A possible explanation could be that flies feed less as they age. However, this only holds true for flies older than 14 days(85). Therefore, reduced IPC activity in 10-11 day old flies is unlikely to result from reduced food intake and likely involves inhibition of insulin signaling.’

      Other suggestions: 

      (6) For the mixed effects of octopamine and tyramine on larval locomotion that are referred to, it might be interesting to also look at Schützler et al 2019, PNAS because it shows that starvation activates TBH so that the octopamine to tyramine ratio is increased. 

      We refer to Schützler et al. in the following paragraph of our discussion: ‘This intermittent locomotor arrest has been previously described in adult flies and is thought to be mediated by ventral unpaired median OANs, which have been suggested to suppress long-distance foraging behavior(69). Since these are not the only neurons we activate in the TDC2 line, we speculate that the stopping phenotype could also result from concerted effects of octopamine and tyramine modulating muscle contractions(65-67) and motor neuron excitability(68), as previously described in Drosophila larvae, or from OANs interfering with pattern generating networks in the ventral nerve cord (VNC) during longer activation(69).’  

      (7) The reference list requires care. For example, reference 43 is identical to 67, and reference 66 gives no information on incretin-like hormones in Drosophila as stated in the text.

      We carefully double-checked our reference list and corrected the mistakes mentioned.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      I have reviewed, with interest, the manuscript "Psychological stress disturbs bone metabolism via miR-335-3p/Fos signaling in osteoclast". The described findings are relevant and useful for daily practice in periodontology. The paper is concise, professionally written, and easy to read. In this study, Jiayao et al. revealed the role of miR-335-3p in psychological stress-induced osteoporosis. CUMS mice were constructed to observe the femur phenotype, osteoclasts were identified as the primary research object, and miRNA-seq was used to find the key miRNAs linking the brain and peripheral tissues. This study showed that the expression of miR-335-3p was simultaneously reduced in mice's NAC, serum, and bone under psychological stress. The miR-335-3p/Fos/NFATC1 signaling pathway was validated in osteoclasts to reveal the potential mechanism of enhanced osteoclast activity under psychological stress. From a new perspective of miRNAs, this study indicates a possible cause of disturbed bone metabolism due to psychological stress and may suggest a new approach to treating osteoporosis.

      We thank this reviewer for the instructive suggestions and encouragement.

      Reviewer #2 (Public Review):

      Zhang et al. established a chronic unpredictable mild stress (CUMS) mouse model, which displayed an osteoporosis phenotype, suggesting a potential correlation between psychological stress and bone metabolism. They found that the miRNA candidate miR-335-3p is downregulated in the long bone of CUMS mice through microRNA sequencing and qRT-PCR experiments. They further demonstrated that miR-335-3p attenuates osteoclast activity via inhibiting Fos signaling, which can induce NFATC1 expression and regulate osteoclast activity.

      Strengths:

      The authors established a CUMS mouse model and confirmed the osteoporosis phenotype through careful characterization of bone and analysis of osteoclast activity. They performed microRNA sequencing to identify the miRNA candidate regulating the bone loss in the CUMS mouse model. They also validated the expression of miR-335-3p and interfered with the function of miR-335-3p through an in vitro assay. Overall, the findings from this study provide important hints for the correlation between psychological stress and bone metabolism.

      We thank this reviewer for the comprehensive summary and positive comment on our work.

      Weakness:

      The data provided by the authors are preliminary, especially the mechanistic insight, which needs to be enhanced. The authors have shown that miR-335-3p expression was altered in the CUMS mouse model and the change of its expression regulated osteoclast activity. The validation should be conducted in vivo, and the mechanism behind this should be investigated further.

      We thank the reviewer for the important insight on the need for further in vivo validation of the role of miR-335-3p. We therefore designed and produced Antagomir-335-3p (antagonist) and Agomir-335-3p (agonist). We then injected them through the tail vein for about 2 months and observed the bone phenotype in each group of mice. The results suggested that the decrease of miR-335-3p in vivo could lead to bone loss, which was consistent with our in vitro validation results (Figure 5H-I).

      Reviewing Editor:

      Method

      (1) Bone histomorphometric analysis of bone formation and bone resorption following ASBMR's guidelines: The authors should follow ASBMR's guidelines for bone histomorphometry (PMCID: PMC3672237 and PMID: 3455637) to perform standard analyses of histomorphometry, rather than analyzing selected areas. They should also clearly describe the software used and define the areas analyzed.

      We carefully re-analyzed bone histomorphometry according to ASBMR guidelines and combined this with our own understanding. At the same time, we improved the description of micro-CT and histological analysis in the methods. If there is still any lack of standardization, we would be grateful for any constructive suggestions to improve this.

      (2) Osteoclast cultures require nuclear staining to demonstrate multinucleated Trap positive cells.

      We used RAW264.7, a mouse macrophage-like cell line, for in vitro culture and induced its differentiation into osteoclasts. Successfully induced osteoclasts showed enlarged cytoplasm and multinucleated fusion. Tartrate-resistant acid phosphatase (Trap) is the signature enzyme of osteoclasts. It can bind to the chromogen to exhibit a mauve color, based on the principle of azo-coupled immunohistochemistry. At the same time, the small, rounded, fused nuclei show a lighter color (author response image 1, yellow arrows). We attempted to counterstain the nuclei with hematoxylin. However, the contours of the nuclei could not be clearly distinguished because the hematoxylin color was similar to the Trap-positive signal. In addition, many other groups have assessed osteoclast activity in vitro based solely on the results of Trap staining (area and number) (Cheng et al., 2022; Li et al., 2019; Ma et al., 2021; Zhong et al., 2023). Nevertheless, in the immunofluorescence staining of osteoclasts, the nuclei were labeled with Hoechst stain to show the multinucleated fusion of osteoclasts (Figure 5G).

      (3) Osteoclast pit assays should be carried out to necessarily demonstrate the change of osteoclast resorption ability caused by miR-335-3p.

      We added osteoclast pit assays to validate the role of miR-335-3p on osteoclast resorptive capacity (Figure 5D-E).

      (4) Serum ELISA assay should be done to examine the global change of bone remodeling in the CUMS mice to assess bone formation and bone resorption that will support their claim.

      We performed additional tests of the serum concentrations of bone γ-carboxyglutamic acid protein (BGP), TRAP, cathepsin K (CTSK), parathyroid hormone (PTH), calcium (Ca), and phosphate (P) in control and CUMS mice, which better reflect the global change of bone remodeling in the CUMS mice (Figure 3 – figure supplement 1).

      (5) miR-RNA-seq: A labeled volcano plot should be used to replace the present one to show significant changes in differential gene expression.

      We appreciate this great suggestion. We replaced the original volcano plot with a labeled one that shows the significant changes in differential gene expression (Figure 4B). We also uploaded the raw data to the GEO database (GSE253504), making the results clearer and more accessible.

      Discussion

      The authors should discuss previous works on the influences of hormones from the brain on chronic stress-induced bone loss and an association of these influences with their findings.

      A discussion of how hormones and miR-335-3p each regulate bone metabolism under psychological stress was added to the second and fifth paragraphs of the Discussion. To conclude, on the one hand, brain-derived and blood-transported miR-335-3p regulate bone metabolism synergistically. On the other hand, miR-335-3p exerts a more direct influence on bone under psychological stress.

      Language

      The language of the MS should be improved.

      The manuscript has been carefully edited by a professional proofreader.

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 1F: The exact meaning of the Waveform Graph shown at left needs to be clarified for the not-so-experienced reader.

      We added a more detailed explanation of the Waveform Graph to the figure legend (Figure legend 1F).

      (2) Is the concomitant increase in osteogenic and osteoblastic activity in this study consistent with that seen in similar disease studies? This could be added to the discussion.

      In the fifth paragraph of the discussion section, we present the alterations of osteogenic and osteoblastic activity observed in other studies that are similar to ours. We also had a detailed discussion based on these observations.

      (3) Figure 6A: Please highlight the key information to visualize the potential linkage among miR-335-3p, Fos, and osteoclast.

      We highlighted the crucial linkage among miR-335-3p, Fos, and osteoclast with red arrows (Figure 6A).

      (4) Figure 6E: The specific area of the selected comparison needs to be clarified. Please add white dotted lines and lettering T (trabecular bone) and GP (growth plate) for the not-so-experienced reader. This will provide some orientation.

      We used white dotted lines as well as letters to label the tissue in immunofluorescence staining images (Figure 6E).

      (5) Line 350: "NAC derived and blood-trans, Ported miR-335-3p". There is a grammatical error. Please conduct general proofreading of the text and writing style.

      Thank you for pointing this out. We have corrected this grammatical error, and we also checked the full text to correct similar errors.

      Reviewer #2 (Recommendations For The Authors):

      (1) miR-335-3p was downregulated in the femur in the CUMS mice. The possible mechanism for this outcome should be further discussed. In Figure 4B, the Volcano plot showed that only a few miRNA were differentially expressed between the control and CUMS mice. How do the authors explain this?

      The chronic unpredictable mild stress (CUMS) model was constructed using normal mice. As the name of the model suggests, the stimulus is mild and does not cause developmental damage or teratogenic effects in mice. Conversely, CUMS has the potential to result in chronic pathological conditions. In addition, in miRNA sequencing results from other tissues in models similar to ours, the number of differentially expressed miRNAs is also only a few dozen (Ma et al., 2019).

      (2) The authors have demonstrated that miR-335-3p inhibits osteoclast differentiation based on an in vitro assay in Figure 5; however, an in vivo experiment is required to provide more solid evidence.

      We strongly agree that in vivo experimental validation would bring more convincing results to this study. Therefore, we designed and produced Antagomir-335-3p (antagonist) and Agomir-335-3p (agonist), which were injected into mice via the tail vein every five days. Samples were collected at one and two months following the first injection. We found that sustained two-month injections of antagomir led to significant bone loss in mice (Figure 5H-I), which is consistent with our in vitro validation results.

      However, the Agomir-miR-335-3p group did not exhibit a notable enhancement of bone mass. This may be attributed to the fact that the 11-week-old normal mice selected for this study were in their prime and did not have strong osteoclastic activity in vivo. Therefore, the osteoclastic inhibition of Agomir-335-3p could not be demonstrated.

      In addition, no significant difference was seen one month after the injection. The main reason may be that this period was too short. On the one hand, the agents we injected were RNA preparations, which lack stability and therefore have poor delivery efficiency, so they take some time to act. On the other hand, bone remodeling is itself a time-consuming process.

      (3) FOS and NFATC1 should be expressed in the nuclei of the cells, therefore, the quality of the images needs to be improved.

      NFATC1 (nuclear factor of activated T cells 1) is activated in the nucleus to regulate the transcription of a variety of osteoclast-related genes, including ACP5 and MMP9. FOS can bind and interact with NFATC1, resulting in nuclear translocation and transcriptional activation, which promotes the differentiation and maturation of osteoclasts. Both proteins are synthesized and processed in the cytoplasm and eventually enter the nucleus to perform their functions. Therefore, they are expressed in both the nucleus and the cytoplasm (Deng et al., 2022; Hounoki et al., 2008; Li et al., 2022).

      In Figure 5G, we labeled cell nuclei with Hoechst stain (blue fluorescence). More co-localized signals of nuclei (blue), FOS (red), and NFATC1 (green) were seen in the Inhibitor-miR-335-3p group, whereas the opposite result was observed in the Mimic-miR-335-3p group. These results indicated that inhibiting miR-335-3p could promote osteoclast differentiation in vitro.

      (4) The expression of FOS was elevated in CUMS group in Figure 6E; however, its mRNA level was unchanged, as shown in Figure 6 supplement; what is the explanation for this? How do the authors claim FOS is the downstream target if its mRNA expression is not impacted by CUMS?

      The results demonstrated that binding of miR-335-3p to Fos mRNA did not result in mRNA degradation. Instead, this binding interferes with the protein translation process, which ultimately leads to the reduction of FOS protein.

      (5) What would be the bone phenotype if a FOS inhibitor was injected into the control and CUMS mice? It is important to examine FOS function through an in vivo context.

      The regulatory role of FOS for osteoclasts has been validated in numerous articles, both in vivo and in vitro (Aikawa et al., 2008; Cao et al., 2023; Cheng et al., 2022). For example, Aikawa et al. designed a small-molecule inhibitor of c-Fos/activator protein-1 (AP-1) using three-dimensional (3D) pharmacophore modeling, which helped verify the effect of FOS on osteoclasts in vivo (Aikawa et al., 2008).

      We also strongly agree that in vivo injection of FOS inhibitors, especially in CUMS mice, could further substantiate the role of miR-335-3p in osteoclasts under psychological stress. However, the study was constrained by the unavailability of commercially viable, efficacious small-molecule inhibitors of FOS. In the future, we plan to design more precise therapeutic targets for psychological stress-induced osteoporosis based on existing research ideas.

      References

      Aikawa, Y., Morimoto, K., Yamamoto, T., Chaki, H., Hashiramoto, A., Narita, H., Hirono, S., & Shiozawa, S. (2008). Treatment of arthritis with a selective inhibitor of c-Fos/activator protein-1. Nature Biotechnology, 26(7), 817-823. https://doi.org/10.1038/nbt1412

      Cao, Z., Niu, X. B., Wang, M. H., Yu, S. W., Wang, M. K., Mu, S. L., Liu, C., & Wang, Y. X. (2023). Anemoside B4 attenuates RANKL-induced osteoclastogenesis by upregulating Nrf2 and dampens ovariectomy-induced bone loss. Biomedicine & Pharmacotherapy, 167, 115454. https://doi.org/10.1016/j.biopha.2023.115454

      Cheng, X., Yin, C., Deng, Y., & Li, Z. (2022). Exogenous adenosine activates A2A adenosine receptor to inhibit RANKL-induced osteoclastogenesis via AP-1 pathway to facilitate bone repair. Molecular Biology Reports, 49(3), 2003-2014. https://doi.org/10.1007/s11033-021-07017-1

      Deng, W., Ding, Z., Wang, Y., Zou, B., Zheng, J., Tan, Y., Yang, Q., Ke, M., Chen, Y., Wang, S., & Li, X. (2022). Dendrobine attenuates osteoclast differentiation through modulating ROS/NFATc1/MMP9 pathway and prevents inflammatory bone destruction. Phytomedicine: International Journal of Phytotherapy and Phytopharmacology, 96, 153838. https://doi.org/10.1016/j.phymed.2021.153838

      Hounoki, H., Sugiyama, E., Mohamed, S. G.-K., Shinoda, K., Taki, H., Abdel-Aziz, H. O., Maruyama, M., Kobayashi, M., & Miyahara, T. (2008). Activation of peroxisome proliferator-activated receptor gamma inhibits TNF-alpha-mediated osteoclast differentiation in human peripheral monocytes in part via suppression of monocyte chemoattractant protein-1 expression. Bone, 42(4), 765-774. https://doi.org/10.1016/j.bone.2007.11.016

      Li, Y., Yang, C., Jia, K., Wang, J., Wang, J., Ming, R., Xu, T., Su, X., Jing, Y., Miao, Y., Liu, C., & Lin, N. (2022). Fengshi Qutong capsule ameliorates bone destruction of experimental rheumatoid arthritis by inhibiting osteoclastogenesis. Journal of Ethnopharmacology, 282, 114602. https://doi.org/10.1016/j.jep.2021.114602

      Li, Z., Huang, J., Wang, F., Li, W., Wu, X., Zhao, C., Zhao, J., Wei, H., Wu, Z., Qian, M., Sun, P., He, L., Jin, Y., Tang, J., Qiu, W., Siwko, S., Liu, M., Luo, J., & Xiao, J. (2019). Dual Targeting of Bile Acid Receptor-1 (TGR5) and Farnesoid X Receptor (FXR) Prevents Estrogen-Dependent Bone Loss in Mice. Journal of Bone and Mineral Research : the Official Journal of the American Society For Bone and Mineral Research, 34(4), 765-776. https://doi.org/10.1002/jbmr.3652

      Ma, K., Zhang, H., Wei, G., Dong, Z., Zhao, H., Han, X., Song, X., Zhang, H., Zong, X., Baloch, Z., & Wang, S. (2019). Identification of key genes, pathways, and miRNA/mRNA regulatory networks of CUMS-induced depression in nucleus accumbens by integrated bioinformatics analysis. Neuropsychiatric Disease and Treatment, 15, 685-700. https://doi.org/10.2147/NDT.S200264

      Ma, Q., Liang, M., Wu, Y., Luo, F., Ma, Z., Dong, S., Xu, J., & Dou, C. (2021). Osteoclast-derived apoptotic bodies couple bone resorption and formation in bone remodeling. Bone Research, 9(1), 5. https://doi.org/10.1038/s41413-020-00121-1

      Zhong, L., Lu, J., Fang, J., Yao, L., Yu, W., Gui, T., Duffy, M., Holdreith, N., Bautista, C. A., Huang, X., Bandyopadhyay, S., Tan, K., Chen, C., Choi, Y., Jiang, J. X., Yang, S., Tong, W., Dyment, N., & Qin, L. (2023). Csf1 from marrow adipogenic precursors is required for osteoclast formation and hematopoiesis in bone. eLife, 12. https://doi.org/10.7554/eLife.82112

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Batra, Cabrera, Spence et al. present a model which integrates histone posttranslational modification (PTM) data across cell models to predict gene expression with the goal of using this model to better understand epigenetic editing. This gene expression prediction model approach is useful if a) it predicts gene expression in specific cell lines b) it predicts expression values rather than a rank or bin, c) it helps us to better understand the biology of gene expression, or d) it helps us to understand epigenome editing activity. Problematically for points a) and b) it is easier to directly measure gene expression than to measure multiple PTMs and so the real usefulness of this approach mostly relates to c) and d).

      We thank the reviewer for their comment, and we agree that directly measuring gene expression (e.g., by performing RNA-seq) is easier than profiling multiple PTMs in a new cell line. We designed our approach with the primary use case in mind: understanding how epigenome editing would affect gene expression.

      Other approaches have been published that use histone PTM to predict expression (e.g. 27587684, 36588793). Is this model better in some way? No comparisons are made. The paper does not seem to have substantial novel insights into understanding the biology of gene expression. The approach of using this model to predict epigenetic editor activity on transcription is interesting and to my knowledge novel but I doubt given the variability of the predictions (Figures 6 and S7&8) that many people will be interested in using this in a practical sense. As the authors point out, the interpretation of the epigenetic editing data is convoluted by things like sgRNA activity scoring and to fully understand the results likely would require histone PTM profiling and maybe dCas9 ChIP-seq for each sgRNA which would be a substantial amount of work.

      We thank the reviewer for this insightful comment. We have included citations for a series of papers (PMIDs: 27587684, 30147283, 36588793) that performed gene expression prediction using histone PTM data. However, each of these methods performs classification of gene expression as opposed to predicting the actual gene expression value via regression. Additionally, the referenced studies all work with Roadmap Epigenomics read depth data as opposed to p-values obtained from the ENCODE pipelines, making it difficult to make direct comparisons.

      We outline in the Discussion section that creating a comprehensive dataset of epigenome editing outcomes, including quantification of histone PTMs before and after in situ perturbations, will improve our understanding of the effects of dCas9-p300 on gene expression and assist in the design of gRNAs for achieving fine-tuned control over gene expression levels.

      Furthermore from the model evaluation of H3K9me3 it seems the model is not performing well for epigenetic or transcriptional editing- e.g. we know for the best studied transcriptional editor which is CRISPRi (dCas9-KRAB) that recruitment to a locus is associated with robust gene repression across the genome and is associated with H3K9me3 deposition by recruitment of KAP1/HP1/SETDB1 (PMID: 35688146, 31980609, 27980086, 26501517). However, it seems from Figures 2&4 that the model wouldn't be able to evaluate or predict this.

      We thank the reviewer for their comment. We have included a supplementary figure, Figure 4 – figure supplement 1, that quantifies how sensitive the trained gene expression model is to perturbations in H3K9me3. Indeed, our data suggest that the model predictions are sensitive to perturbations in H3K9me3. For instance, there is a clear decrease and a gradual increase as the position where the perturbation is performed moves from upstream to downstream of the TSS. Additionally, the magnitude of the predicted fold-change is a function of how much the H3K9me3 is perturbed, and hence the magnitude of change would be even higher if the perturbation magnitude is increased. However, this precise magnitude is hard to estimate in the absence of experimental perturbation data for H3K9me3.

      The model seems to predict gene expression for endogenous genes quite well although the authors sometimes use expression and sometimes use rank (e.g. Figure 6) - being clearer with how the model predicts expression rather than using rank or fold change would be very useful.

      We thank the reviewer for this important suggestion. We have added text in the revised manuscript to clarify that the model predicts gene expression values, which can be interpreted as rank or fold change, depending on the use case.

      One concern overall with this approach is that dCas9-p300 has been observed to induce sgRNA-independent off-target H3K27Ac (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8349887/ see Figure S5D) which could convolute interpretation of this type of experiment for the model.

      This is an excellent point and indeed, we and others have observed that dCas9-p300 can result in off-target H3K27ac levels (both increased and suppressed) across the genome. However, p300 is one of the few known proteins that can catalyze H3K27ac in the human genome, and H3K27ac remains a proxy for active genomic regulatory elements. Nevertheless, dCas9-p300 off-target activity could certainly convolute our approach. We have included language to address this caveat in our discussion. Interestingly, even though dCas9-p300 (and other epigenome editing enzymes) can lead to off-target chromatin modifications, these effects often occur without coincident disruptions to the transcriptome. This suggests that many chromatin modifications, while “supportive” or “instructive” of/for transcription, may be insufficient (either alone or in the context of dCas9-based fusions) for transcriptional effects.

      Figure 2

      It seems this figure presents known rather than novel findings from the authors' description. Please comment on whether there are any new findings in this figure. Please comment on differences in patterns of repressive and activating histone PTMs between cell lines (e.g. H1-Esc H3K27me3 green 25-50% is more enriched than red 0-25%).

      Thank you for pointing out this issue. We have revised the text in both the Results and Discussion sections to better articulate that the goal of this figure is to validate the hypothesis that there are consistent patterns of histone PTMs with respect to gene expression across different human cell types.

      In Figure 2, which illustrates the raw histone marks data, the non-monotonic behavior of H3K27me3 in H1-hESC cells is indicative of a real biological phenomenon. This interpretation is supported by the relatively low Pearson correlation for the H3K27me3 mark observed in these cells, as documented in Figure 1b of another study: https://www.biorxiv.org/content/10.1101/2024.03.29.587323v1.

      Figure 3&4

      There are a number of approaches including DeepChrome and TransferChrome that predict endogenous gene expression from histone PTMs. I appreciate that the authors have not used the histone PTM data to predict gene expression levels of an "average cell" but rather that they are predicting expression within specific cell types or for unseen cell types. But from what is presented it isn't clear that the author's model is better or enabling beyond other approaches. The authors should show their model is better than other approaches or make clear why this is a significant advance that will be enabling for the field. For example is it that in this approach they are actually predicting expression levels whereas previous approaches have only predicted expressed or not expressed or a rank order or bin-based ranking?

      We thank the reviewer for this comment. We have added text to clarify the difference between our approach and existing approaches. There are two key differences between our model and other approaches. First, the gene expression model that we have trained here predicts gene expression values instead of gene expression levels as either high or low. Second, we have trained our models on ENCODE p-value data instead of read depths obtained from the Roadmap Epigenomics Consortium.
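
      The distinction between predicting an expression value and predicting a high/low label can be illustrated with a toy sketch. The data, the single-feature least-squares fit, and all function names below are hypothetical; this is not the model described in the manuscript and does not use ENCODE data. It only shows what each kind of output retains.

```python
from statistics import mean, median

# Hypothetical toy data (not from the manuscript): one histone-mark signal
# per gene and a measured expression value for the same gene.
signal = [0.5, 1.0, 2.0, 3.5, 5.0, 6.5]
expression = [0.2, 0.9, 2.1, 3.2, 5.1, 6.4]


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = fit_line(signal, expression)
threshold = median(expression)  # assumed cut-off between "low" and "high"


def predict_value(s):
    """Regression-style output: a continuous expression estimate."""
    return slope * s + intercept


def predict_class(s):
    """Classification-style output: only 'high' or 'low'."""
    return "high" if predict_value(s) > threshold else "low"


# Two genes with different signals share a label but differ in predicted value.
for s in (5.0, 6.5):
    print(s, predict_class(s), round(predict_value(s), 2))
```

      In this sketch, a binary label cannot distinguish the two "high" genes, whereas the regression output can, which is the property needed to express a predicted fold change after a perturbation.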

      Figure 5

      From the methods, it seems gene activation is measured by qpcr in hek293 transfected with individual sgRNAs and dCas9-p300. The cells aren't selected or sorted before qPCR so how are we sure that some of the variability isn't due to transfection efficiency associated with variable DNA quality or with variable transfection efficiency?

      This is a good question. All DNA preps were generated using high-quality reagents and consistent protocols. In addition, the only variable that changed with respect to transfection efficiency was the gRNA-encoding vector used in qPCR assays. We have added new data demonstrating that transfection efficiency is comparable across experiments (Figure 5 – figure supplement 1). We have also added experimental data and a computational analysis of a new dCas9-p300-based Perturb-seq dataset to the manuscript (Figure 6 – figure supplement 1). These experiments use lentiviral transduction with RNA-seq as the readout and are thus buffered against the variances mentioned by the Reviewer.

      Figure 6

      The use of rank in 6D and 6E is confusing. In 6D a higher rank is associated with higher expression while in 6E a higher rank seems to mean a lower fold change e.g. CYP17A1 has a low predicted fold-change rank and qPCR fold-change rank but in Figure 5 a very high qPCR fold change. Labeling this more clearly or explaining it in the text further would be useful.

      We thank the reviewer for their suggestion. We have made relevant changes to the caption of Figure 6 to clarify this.

      Reviewer #2 (Public Review):

      Summary:

      The authors build a gene expression model based on histone post-translational modifications and find that H3K27ac is correlated with gene expression. They proceed to perturb H3K27ac at 8 gene promoters, and measure gene expression changes to test their model.

      Strengths:

      The combination of multiple methods to model expression, along with utilizing 6 histone datasets in 13 cell types allowed the authors to build a model that correlates between 0.7-0.79 with gene expression. This group also utilized a tool they are experts in, dCas9-p300 fusions to perturb H3K27ac and monitor gene expression to test their model. Ranked correlations showed some support for the predictions after the perturbation of H3K27ac.

      Weaknesses:

      The perturbation of only 8 genes, and the only readout being qPCR-based gene expression, as opposed to including H3K27ac, weakened their validation of the computational model. Likewise, the use of six genes that were not expressed being most activated by dCas9-p300 might weaken the correlations vs. looking at a broad range of different gene expressions as the original model was trained on.

      We thank the reviewer for their comments. We have added experimental data and a computational analysis of a new dCas9-p300-based Perturb-seq dataset to the manuscript. We observe that the models we have developed are able to predict the fold-change rank across genes reasonably well (Figure 6 – figure supplement 1), similar to what we observe in Figure 6E.
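
      Agreement between predicted and measured fold-change ranks is commonly summarized with a Spearman rank correlation. The sketch below is a minimal, self-contained version under stated assumptions: the fold-change values are hypothetical, and the `descending` flag (rank 1 assigned to the largest fold change) is an illustrative convention, not necessarily the one used in Figure 6.

```python
def ranks(values, descending=True):
    """Rank values; with descending=True, rank 1 goes to the largest value.
    Tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical fold changes for five genes (illustration only).
predicted_fold_change = [1.2, 3.5, 2.0, 8.0, 1.0]
measured_fold_change = [1.5, 10.0, 2.5, 40.0, 4.0]
print(spearman(predicted_fold_change, measured_fold_change))
```

      Because only ranks enter the calculation, the statistic is unchanged by any monotonic rescaling of the measured values, and stating the ranking direction explicitly avoids ambiguity about whether a high rank means a large or a small fold change.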

      Reviewer #1 (Recommendations For The Authors):

      The authors should comment on how their model is different from or better than other models that use histone PTM data to predict gene expression.

      We thank the reviewer for this insightful suggestion. We have included citations for a series of papers (PMIDs: 27587684, 30147283, 36588793) that performed gene expression prediction using histone PTM data. However, each of these methods performs classification of gene expression as opposed to predicting the actual gene expression value via regression. Additionally, the referenced studies all work with Roadmap Epigenomics read depth data as opposed to p-values obtained from the ENCODE pipelines, making it difficult to make direct comparisons.

      The authors need to make clear whether their model will apply to other common epigenetic or transcriptional editors such as CRISPRi/H3K9me3 which is widely used.

      In this study, we focus on the histone changes induced by p300. However, future studies may use the framework described in our manuscript and apply it to other transcriptional editors as well.

      The authors need to be clearer about where they are predicting expression and where they are using rank. Ideally, show both.

      We thank the reviewer for this important suggestion. We have added text in the revised manuscript to clarify that the model predicts gene expression values, which can be interpreted as rank or fold change, depending on the use case.

      The authors should ideally show a case where they use the model to make a prediction of genes that can and can not be activated by dCas9-p300 or other epigenetic editors and then prove this with experiments.

      Thank you for the excellent suggestion. While it is indeed relevant, exploring this would extend beyond the scope of our current study. We consider it a valuable topic for future research.

      Reviewer #2 (Recommendations For The Authors):

      The y-axis in 5C needs to be labeled. The authors state it is "relative mRNA" but these numbers correlated with fold changes shown in Table S2.

      We have clarified the definition of the Y-axis in the caption for Figure 5C.

    1. Author response:

      Reviewer #1 (Public review):

      I did not follow the logic behind including spindle amplitude in the meta-analysis. This is not a measure of SO-spindle coupling (which is the focus of the review), unless the authors were restricting their analysis to the amplitude of coupled spindles only. It doesn't sound like this is the case though. The effect of spindle amplitude on memory consolidation has been reviewed in another recent meta-analysis (Kumral et al, 2023, Neuropsychologia). As this isn't a measure of coupling, it wasn't clear why this measure was included in the present meta-analysis. You could easily make the argument that other spindle measures (e.g., density, oscillatory frequency) could also have been included, but that seems to take away from the overall goal of the paper which was to assess coupling.

      Indeed, spindle amplitude refers to all spindle events rather than only coupled spindles. This choice was made because we recognized the challenge of obtaining relevant data from each study—only 4 out of the 23 included studies performed their analyses after separating coupled and uncoupled spindles. This inconsistency strengthens the urgency and importance of this meta-analysis to standardize the methods and measures used for future analysis on SO-SP coupling and beyond. We agree that focusing on the amplitude of coupled spindles would better reveal their relations with coupling, and we will discuss this limitation in the manuscript.

      Nevertheless, we believe including spindle amplitude in our study remains valuable, as it serves several purposes. First, SO-SP coupling involves the modulation between spindle amplitude and slow oscillation phase. Different studies have reported conflicting conclusions regarding how spindle amplitude relates to coupling: some found significant correlations (e.g., Baena et al., 2023), while others did not (e.g., Roebber et al., 2022). This discrepancy highlights an indirect but potentially crucial insight into the role of spindle amplitude in coupling dynamics. Second, in studies related to SO-SP coupling, spindle amplitude is one of the most frequently reported measures, along with other coupling measures, that significantly correlated with over-sleep memory improvements (e.g., Kurz et al., 2023; Ladenbauer et al., 2021; Niknazar et al., 2015), so we believe that including this measure allows a more comprehensive review of the existing literature on SO-SP coupling. Third, incorporating spindle amplitude allows for a direct comparison between coupling measures and individual events alone in their contribution to memory consolidation, a question that has been extensively explored in recent research (e.g., Hahn et al., 2020; Helfrich et al., 2019; Niethard et al., 2018; Weiner et al., 2023). Finally, spindle amplitude was identified as a key moderator for memory consolidation in Kumral et al.'s (2023) meta-analysis. By including it in our analysis, we sought to replicate their findings within a broader framework and introduce conceptual overlaps with existing reviews. Therefore, although we were not able to selectively include coupled spindles, there is still a unique relation between spindle amplitude and SO-SP coupling that other spindle measures do not have.

      Originally, we also intended to include coupling density or counts in the analysis, which seems more relevant to the coupling metrics. However, the lack of uniformity in methods used to measure coupling density posed a significant limitation. We hope that our study will encourage consistent reporting of all relevant parameters in future research, enabling future meta-analyses to incorporate these measures comprehensively. We will add this discussion to the manuscript in the revised version to further clarify these points.
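
      For readers less familiar with coupling metrics, the sketch below shows one common way to summarize the relation between spindle events and slow oscillation phase: the mean resultant vector of the SO phases at which spindle peaks occur. Its angle gives a preferred coupling phase and its length a coupling strength between 0 and 1. This is a generic circular-statistics illustration under our own assumptions; the function name and input values are hypothetical, and individual studies in the meta-analysis may define these measures differently.

```python
import cmath
import math


def coupling_summary(phases):
    """Summarize SO phases (in radians) sampled at spindle peaks.

    Returns (preferred_phase, strength):
      preferred_phase: angle of the mean resultant vector, in (-pi, pi]
      strength: length of the mean resultant vector, in [0, 1]
    """
    if not phases:
        raise ValueError("at least one phase value is required")
    # Each phase becomes a unit vector on the complex plane; average them.
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return cmath.phase(resultant), abs(resultant)


# Hypothetical example: spindle peaks clustered near the SO up-state (phase 0).
clustered = [-0.1, 0.0, 0.1]
# Hypothetical example: spindle peaks spread evenly over the SO cycle.
spread = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]

print(coupling_summary(clustered))
print(coupling_summary(spread))
```

      Tightly clustered phases yield a strength close to 1, whereas evenly spread phases yield a strength close to 0, in which case the preferred phase is not meaningful.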

      References:

      Roebber, J. K., Lewis, P. A., Crunelli, V., Navarrete, M. & Hamandi, K. Effects of anti-seizure medication on sleep spindles and slow waves in drug-resistant epilepsy. Brain Sci. 12, 1288 (2022). https://doi.org/10.3390/brainsci12101288

      All other citations were referenced in the manuscript.

      At the end of the first paragraph of section 3.1 (page 13), the authors suggest their results "... further emphasise the role of coupling compared to isolated oscillation events in memory consolidation". This had me wondering how many studies actually test this. For example, in a hierarchical regression model, would coupled spindles explain significantly more variance than uncoupled spindles? We already know that spindle activity, independent of whether they are coupled or not, predicts memory consolidation (e.g., Kumral meta-analysis). Is the variance in overnight memory consolidation fully explained by just the coupled events? If both overall spindle density and coupling measures show an equal association with consolidation, then we couldn't conclude that coupling compared to isolated events is more important.

      While primary coupling measurements, including coupling phase and strength, showed strong evidence for their associations with memory consolidation, measures of spindles, including spindle amplitude, exhibited only limited evidence (or a “non-significant” effect) for their association with consolidation. These results are consistent with multiple empirical studies using different techniques (e.g., Hahn et al., 2020; Helfrich et al., 2019; Niethard et al., 2018; Weiner et al., 2023), which reported that coupling metrics are more robust predictors of consolidation and synaptic plasticity than spindle or slow oscillation metrics alone. However, we agree with the reviewer that we did not directly separate the effects of coupled and uncoupled spindles, and a more precise comparison would involve contrasting the “coupling of oscillation events” with “individual oscillation events” rather than coupling versus isolated events.

      We recognize that Kumral and colleagues’ meta-analysis reported a moderate association between spindle measures and memory consolidation (e.g., for the spindle amplitude-memory association they reported an effect size of approximately r = 0.30). In contrast, we found only a weak effect size of r = 0.07, with minimal evidence for a spindle amplitude-memory relation. One advantage of our study is that we actively cooperated with the original authors to obtain a large number of unreported and non-significant results relevant to our analysis, as well as separated data that were originally reported under mixed conditions. This approach decreases the risk of false positives and selective reporting of results, making the effect size more likely to approach the true value. Nevertheless, we agree with the reviewer that a more conservative term would be a better choice in this context, since we did not measure all relevant spindle metrics, including density.

      To improve clarity in our manuscript, we will revise the statement to: “Together with other studies included in the review, our results suggest a crucial role of coupling but did not support the role of spindle events alone in memory consolidation,” and provide relevant references. We believe this can more accurately reflect our findings and the existing literature to address the reviewer’s concern.

      It was very interesting to see that the relationship between the fast spindle coupling phase and overnight consolidation was strongest in the frontal electrodes. Given this, I wonder why memory promoting fast spindles shows a centro-parietal topography? Surely it would be more adaptive for fast spindles to be maximally expressed in frontal sites. Would a participant who shows a more frontal topography of fast spindles have better overnight consolidation than someone with a more canonical centro-parietal topography? Similarly, slow spindles would then be perfectly suited for memory consolidation given their frontal distribution, yet they seem less important for memory.

      Regarding the topography of fast spindles and their relationship to memory consolidation, we agree this is an intriguing issue, and we have already made significant progress on this topic in our ongoing work. We share a few relevant observations. First, there are significant discrepancies in the definition of “slow spindle” in the field. Some studies defined slow spindles as 9-12 Hz (e.g., Mölle et al., 2011; Kurz et al., 2021), while others performed event detection within a range of 11-13/14 Hz (e.g., Barakat et al., 2011; D'Atri et al., 2018). Compounding this issue, individual differences in spindle frequency are often overlooked, leading to challenges in reliably distinguishing between slow and fast spindles. Some studies have reported difficulty in clearly separating the two types of spindles at all (e.g., Hahn et al., 2020). Moreover, a critical factor often ignored in past research is the traveling nature of both slow oscillations and spindles across the cortex, where spindles are coupled with significantly different phases of slow oscillations (see Figure 5). We believe that studying coupling in the context of the movement of these waves will help us better understand the observed frontal relationship with consolidation. We will address this in our revised manuscript.

      The authors rightly note the issues with multiple comparisons in sleep physiology and memory studies. Multiple comparison issues arise in two ways in this literature. First are comparisons across multiple electrodes (many studies now use high-density systems with 64+ channels). Second are multiple comparisons across different outcome variables (at least 3 ways to quantify coupling (phase, consistency, occurrence) x 2 spindle types (fast, slow)). Can the authors make some recommendations here in terms of how to move the field forward, as this issue has been raised numerous times before (e.g., Mantua 2018, Sleep; Cox & Fell 2020, Sleep Medicine Reviews for just a couple of examples). Should researchers just be focusing on the coupling phase? Or should researchers always report all three metrics of coupling, and correct for multiple comparisons? I think the use of pre-registration would be beneficial here, and perhaps could be noted by the authors in the final paragraph of section 3.5, where they discuss open research practices.

      There are indeed multiple methods that we can discuss, including cluster-based and non-parametric methods, etc., to correct for multiple comparisons in EEG data with spatiotemporal structures. In addition, encouraging the reporting of all tested but insignificant results, at least in supplementary materials, is an important practice that helps readers understand the findings with reduced bias. We agree with the reviewer’s suggestions and will add more information in section 3.5 to advocate for a standardized “template” used to analyze and report effect size in future research.

      We advocate for the standardized reporting of all three coupling metrics: phase, consistency, and occurrence. Each coupling metric captures distinct properties of the coupling process, and the metrics may interact with one another (Weiner et al., 2023). Therefore, we believe it is essential to report all three metrics to comprehensively explore their different roles in the “how, what, and where” of long-distance communication and consolidation of memory. As we advance toward a deeper understanding of the relationship between memory and sleep, we hope this work promotes the standardization, transparency, and replication of relevant studies.

      Reviewer #2 (Public review):

      Regarding the Moderator of Age: Although the authors discuss the limited studies on the analysis of children and elders regarding age as a moderator, the figure shows a significant gap between the ages of 40 and 60. Furthermore, there are only a few studies involving participants over the age of 60. Given the wide distribution of effect sizes from studies with participants younger than 40, did the authors test whether removing studies involving participants over 60 would still reveal a moderator effect?

      We agree that there is an age gap between younger and older adults, as current studies often focus on contrasting newly matured and fully aged populations to amplify the effect, while neglecting the gradual changes in memory consolidation mechanisms across the aging spectrum. We suggest that a non-linear analysis of age effects would be highly valuable, particularly when additional child and older adult data become available.

      In response to the reviewer’s suggestion, we re-tested the moderation effect of age after excluding effect sizes from older adults. The results revealed a decrease in the strength of evidence for the phase-memory association due to increased variability, but were consistent for all other coupling parameters. The mean estimates also remained consistent (coupling phase-memory relation: -0.005 [-0.013, 0.004], BF10 = 5.51, the strength of evidence reduced from strong to moderate; coupling strength-memory relation: -0.005 [-0.015, 0.008], BF10 = 4.05, the strength of evidence remained moderate). These findings align with prior research, which typically observed a weak coupling-memory relationship in older adults during aging (Ladenbauer et al., 2021; Weiner et al., 2023) but not during development (Hahn et al., 2020; Kurz et al., 2021; Kurz et al., 2023). Therefore, this result is not surprising to us, and there are still observable moderate patterns in the data. We will report these additional results in the revised manuscript and interpret them as follows: “the moderator effect of age becomes less pronounced during development after excluding the older adult data”. We believe the original findings including the older adult group remain meaningful when interpreted cautiously, given that the older adult data were derived from multiple studies and different groups.

      Reviewer #3 (Public review):

      First, the authors conclude that "SO-SP coupling should be considered as a general physiological mechanism for memory consolidation". However, the reported effect sizes are smaller than what is typically considered a "small effect" (0.10)

      While we acknowledge the concern about the small effect sizes reported in our study, it is important to contextualize these findings within the field of neuroscience, particularly memory research. Even in individual studies, small effect sizes are not uncommon due to the inherent complexity of the mechanisms involved and the multitude of confounding variables. This is an important factor to be considered in meta-analyses where we synthesize data from diverse populations and experimental conditions. For example, the relationship between SO-slow SP coupling and memory consolidation in older adults is expected to be insignificant.

      As Funder and Ozer (2019) concluded in their highly cited paper, an effect size of r = 0.3 in psychological and related fields should be considered large, with r = 0.4 or greater likely representing an overestimation that is rarely found in a large sample or in a replication. Therefore, we believe r = 0.1 should not be considered the lower bound of a small effect. Bakker et al. (2019) also advocate for a contextual interpretation of effect sizes. This is particularly important in meta-analyses, where the results are less prone to overestimation than in individual studies, and we cooperated with all authors to include a large number of unreported and non-significant results. In this context, small correlations may still carry substantial, meaningful information. Although we agree that the effect sizes reported in our study are indeed small at the overall level, they reflect a rigorous analysis that incorporates robust evidence across different levels of moderators. Our moderator analyses underscore the dynamic nature of coupling-memory relationships, with certain subgroups demonstrating much stronger and more meaningful effects, especially after excluding slow spindles and older adults. For example, both the coupling phase and strength of frontal fast spindles with slow oscillations exhibited "moderate-to-large" correlations with the consolidation of different types of memory, especially in young adults, with r values ranging from 0.18 to 0.32 (see Table S9.1-9.4). We will add more discussion about the influence of moderators on the dynamics of coupling-memory associations. In addition, we will update the conclusion to be “SO-fast SP coupling should be considered as a general physiological mechanism for memory consolidation”.

      References:

      Funder, D. C. & Ozer, D. J. Evaluating effect size in psychological research: sense and nonsense. Adv. Methods Pract. Psychol. Sci. 2, 156–168 (2019). https://doi.org/10.1177/2515245919847202

      Bakker, A. et al. Beyond small, medium, or large: Points of consideration when interpreting effect sizes. Educ. Stud. Math. 102, 1–8 (2019). https://doi.org/10.1007/s10649-019-09908-4

      Second, the study implements state-of-the-art Bayesian statistics. While some might see this as a strength, I would argue that it is the greatest weakness of the manuscript. A classical meta-analysis is relatively easy to understand, even for readers with only a limited background in statistics. A Bayesian analysis, on the other hand, introduces a number of subjective choices that render it much less transparent.

      This kind of analysis seems not to be made to be intelligible to the average reader. It follows a recent trend of using more and more opaque methods. Where we had to trust published results a decade ago because the data were not openly available, today we must trust the results because the methods can no longer be understood with reasonable effort.

      This becomes obvious in the forest plots. It is not immediately apparent to the reader how the distributions for each study represent the reported effect sizes (gray dots). Presumably, they depend on the Bayesian priors used for the analysis. The use of these priors makes the analyses unnecessarily opaque, eventually leading the reader to question how much of the findings depend on subjective analysis choices (which might be answered by an additional analysis in the supplementary information).

      We appreciate the reviewer for sharing this viewpoint and we value the opportunity to clarify some key points. To address the concern about clarity, we will include a sub-section in the methods section explaining how to interpret Bayesian statistics including priors, posteriors, and Bayes factors, making our results more accessible to those less familiar with this approach.

      On the use of Bayesian models, we believe there may have been a misunderstanding. Bayesian methods, far from being "opaque" or overly complex, are increasingly valued for their ability to provide nuanced, accurate, and transparent inferences (Sutton & Abrams, 2001; Hackenberger, 2020; van de Schoot et al., 2021; Smith et al., 1995; Kruschke & Liddell, 2018). They had been applied in more than 1,200 meta-analyses as of 2020 (Hackenberger, 2020). In our study, we used priors that assume no effect (mean set to 0, which aligns with the null) while allowing for a wide range of variation to account for large uncertainties. This approach reduces the risk of overestimation or false positives and demonstrates much-improved performance over traditional methods in handling variability (Williams et al., 2018; Kruschke & Liddell, 2018). Sensitivity analyses reported in the supplemental material (Table S9.1-9.4) confirmed the robustness of our choice of priors: our results did not vary when different priors were set.

      As Kruschke and Liddell (2018) described, “shrinkage (pulling extreme estimates closer to group averages) helps prevent false alarms caused by random conspiracies of rogue outlying data,” a well-known advantage of Bayesian over traditional approaches. This explains the observed differences between the distributions and grey dots in the forest plots. Unlike p-values, which can be overestimated with a large sample size and underestimated with a small sample size, Bayesian methods make assumptions explicit, enabling others to challenge or refine them, an approach aligned with open science principles (van de Schoot et al., 2021). For example, a credible interval in a Bayesian model can be interpreted as “there is a 95% probability that the parameter lies within the interval”, while a confidence interval in a frequentist model means “in repeated experiments, 95% of the confidence intervals will contain the true value”. We believe the former is much more straightforward and convincing for readers to interpret. We will ensure our justification for using Bayesian models is more clearly presented in the manuscript.
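      The shrinkage described above can be illustrated with a minimal sketch. It assumes a normal-normal random-effects model in which the pooled mean `mu` and the between-study standard deviation `tau` are treated as known; in a full Bayesian meta-analysis both would be estimated from the data under their priors. All numbers and names below are hypothetical and are not taken from the manuscript.

```python
def shrink(estimates, std_errors, mu, tau):
    """Posterior means of study effects under a normal-normal model.

    Assumes mu (pooled mean) and tau (between-study SD) are known,
    which is a simplification made only for illustration.
    """
    shrunk = []
    for y, se in zip(estimates, std_errors):
        # Weight on the study's own estimate: large when the study is precise.
        w = tau ** 2 / (tau ** 2 + se ** 2)
        shrunk.append(w * y + (1 - w) * mu)
    return shrunk


# Hypothetical observed effects and their standard errors.
observed = [0.60, 0.10, -0.30]
std_errors = [0.30, 0.05, 0.20]
pooled = shrink(observed, std_errors, mu=0.10, tau=0.10)
```

      In this sketch the imprecise, extreme first study is pulled most of the way toward the pooled mean, which is why a study's posterior distribution in a forest plot need not be centred on its reported effect size.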

      We acknowledge that even with these justifications, different researchers may still differ in their preferences for Bayesian and frequentist models. To make our reporting more transparent, we have also reported the traditional frequentist meta-analysis results in Supplemental Material 10 to demonstrate the robustness of our analysis; these suggested non-significant differences between Bayesian and frequentist models. We will include clearer references in the next version of the manuscript to direct readers to the figures that report the statistics provided by traditional models.

      References:

      Hackenberger, B.K. Bayesian meta-analysis now—let's do it. Croat. Med. J. 61, 564–568 (2020). https://doi.org/10.3325/cmj.2020.61.564

      Sutton, A.J. & Abrams, K.R. Bayesian methods in meta-analysis and evidence synthesis. Stat. Methods Med. Res. 10, 277–303 (2001). https://doi.org/10.1177/096228020101000404

      Williams, D.R., Rast, P. & Bürkner, P.C. Bayesian meta-analysis with weakly informative prior distributions. PsyArXiv (2018). https://doi.org/10.31234/osf.io/9n4zp

      van de Schoot, R., Depaoli, S., King, R. et al. Bayesian statistics and modelling. Nat Rev Methods Primers 1, 1 (2021). https://doi.org/10.1038/s43586-020-00001-2

      Smith, T.C., Spiegelhalter, D.J. & Thomas, A. Bayesian approaches to random-effects meta-analysis: a comparative study. Stat. Med. 14, 2685–2699 (1995). https://doi.org/10.1002/sim.4780142408

      Kruschke, J.K. & Liddell, T.M. The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychon. Bull. Rev. 25, 178–206 (2018). https://doi.org/10.3758/s13423-016-1221-4

      However, most of the methods are not described in sufficient detail for the reader to understand the proceedings. It might be evident for an expert in Bayesian statistics what a "prior sensitivity test" and a "posterior predictive check" are, but I suppose most readers would wish for a more detailed description. However, using a "Markov chain Monte Carlo (MCMC) method with the no-U-turn Hamiltonian Monte Carlo (HMC) sampler" and checking its convergence "through graphical posterior predictive checks, trace plots, and the Gelman and Rubin Diagnostic", which should then result in something resembling "a uniformly undulating wave with high overlap between chains" is surely something only rocket scientists understand. Whether this was done correctly in the present study cannot be ascertained because it is only mentioned in the methods and no corresponding results are provided. 

      We appreciate the reviewer’s concerns about accessibility and potential complexity in our descriptions of Bayesian methods. Our decision to provide a detailed account serves to enhance transparency and guide readers interested in replicating our study. We acknowledge that some terms may initially seem overwhelming. These steps, such as MCMC chain convergence checks and robustness checks, are standard practices in Bayesian research and are analogous to the “linearity”, “normality” and “equal variance” checks in frequentist analysis. We have provided exemplary plots in the supplemental material and will add more details to explain the interpretation of these convergence checks. We hope this will help address any concerns about methodological rigor.
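      As an illustration of what a convergence check computes, the sketch below implements the classic Gelman and Rubin potential scale reduction factor in plain Python. It is not the code used in the study: the chains are simulated, the function name is hypothetical, and modern samplers typically report a rank-normalised split-chain variant of this statistic. Values close to 1 indicate that independent chains agree.

```python
import random
import statistics


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance; B: between-chain variance scaled by n.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(0)
# Four simulated chains sampling the same distribution (converged).
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# Four simulated chains stuck around different values (not converged).
stuck = [[rng.gauss(k, 1) for _ in range(1000)] for k in range(4)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

      The trace-plot description quoted by the reviewer ("high overlap between chains") is the visual counterpart of the first case; the second case is what a failed check would look like.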

      In one point the method might not be sufficiently justified. The method used to transform circular-linear r (actually, all references cited by the authors for circular statistics use r² because there can be no negative values) into "Z_r", seems partially plausible and might be correct under the H0. However, Figure 12.3 seems to show that under the alternative Hypothesis H1, the assumptions are not accurate (peak Z_r=~0.70 for r=0.65). I am therefore, based on the presented evidence, unsure whether this transformation is valid. Also, saying that Z_r=-1 represents the null hypothesis and Z_r=1 the alternative hypothesis can be misinterpreted, since Z_r=0 also represents the null hypothesis and is not half way between H0 and H1.

      First, we realized that in the titles of Figures 12.2 and 12.3, “true r = 0.35” and “true r = 0.65” should be corrected to “true Z_r”. The method we used here is to first generate an underlying population that has null (0), moderate (0.35), or large (0.65) Z_r correlations, then test whether the sampling distribution drawn from these populations follows a normal distribution across varying sample sizes. Nevertheless, the reviewer correctly noticed discrepancies between the reported true Z_r and its sampling distribution peak. This discrepancy arises because, when generating large population data, achieving exact values close to a strong correlation like Z_r = 0.65 is unlikely. We loop through simulations to generate population data and ensure their Z_r values fall within a threshold. For moderate effect sizes (e.g., Z_r = 0.35), this is straightforward using a narrow range (0.345 < Z_r < 0.355). However, for larger effect sizes like Z_r = 0.65, a wider range (0.6 < Z_r < 0.7) is required. Therefore, the population from which we drew the sample sometimes has a Z_r that deviates slightly from 0.65. This remains reasonable, since the main point of this analysis is to ensure that a large Z_r still has a normal sampling distribution rather than to achieve exactly Z_r = 0.65.

      We acknowledge that this variability of the range used was not clearly explained and it is not accurate to report “true Z_r = 0.65”. In the revised version, we will address this issue by adding vertical lines to each subplot to indicate the Z_r of the population we used to draw samples, making it easier to check if it aligns with the sampling peak. In addition, we will revise the title to “Sampling distributions of Z_r drawn from strong correlations (Z_r = 0.6-0.7)”. We confirmed that population Z_r and the peak of their sampling distribution remain consistent under both H0 and H1 in all sample sizes with n > 25, and we hope this explanation can fully resolve your concern.

      We agree with the reviewer that claiming Z_r = -1 represents the null hypothesis is not accurate. The circlin Z_r = 0 is more closely analogous to Pearson’s r = 0, since both represent the mean drawn from a population under the null hypothesis. In contrast, the mean effect size under the null will be positive in the raw circlin r, which is one of the important reasons for the transformation. To provide a more accurate interpretation, we will update Table 6 to describe the following strength levels of evidence: no effect (r < 0), null (r = 0), small (r = 0.1), moderate (r = 0.3), and large (r = 0.5).

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the Editors and reviewers for their candid evaluation of our work. It was suggested that we should demonstrate the validity of our approach with perhaps 10 different datasets, but we felt that this would place an undue burden on our resources. Generally, it takes about 4 to 6 months for us to build a dataset, and this does not include the time taken to train and test our AI models. Providing 10 different datasets would therefore mean another 3 to 5 years to complete this research project. Publishing research on a single dataset is certainly not unheard of: for example, Subramanian et al. (2016) published their widely cited benchmark dataset for BACE1 inhibitors alone. We hoped that the additional work, in which we showed that we were able to improve the benchmark dataset for BACE1 inhibitors and achieve the same high level of predictive performance for this dataset, would convince readers (and reviewers) of the reproducibility of our approach. Furthermore, we also showed that our approach is robust and does not rely on a large volume of data to achieve this near-perfect accuracy. As can be seen in the Supplemental section, even our AI models trained on ONLY 250 BRAF actives and 250 inactives could achieve 96.3% accuracy! Logically, if the model is robust, then we would expect the model to be reproducible. As such, we do not feel it is necessary for us to test our approach on 10 different datasets.

      It was also suggested that we expand this study to other types of molecular representations to give a better idea of generalizability. We would like to point out that we tested, in total, 55 single fingerprints and paired combinations. Our goal was to create an approach that could give superior performance for virtual screening, and we believe that we have achieved this. Based on the results of our study, we are of the opinion that molecular representations do not, in general, have an outsized effect on AI virtual screening. Although certain molecular representations may give SLIGHTLY better performance, we can see that, with the exception of the 79-bit E-State fingerprint (which could still achieve an impressive 85% accuracy for the SVM model), nearly all molecular fingerprints and paired combinations that we used were able to achieve an accuracy above 97%. Therefore, we do not share the reviewers' concern that our approach may not be useful when applied with other types of molecular representations.

      It is true that our work involved manual curation of the datasets, but the goal of this paper is to lay down some ground rules for the future development of a data-centric AI approach. Although manual curation is a routine practice in AI/ML, it should be recognised that there is good manual curation and bad manual curation, and rules need to be established to ensure we have good manual curation. Without these rules, we would also not be able to establish and train a data-centric AI. All manual curation involves a level of subjectivity, but that subjectivity comes from one's experience and domain knowledge of the field in which the AI is being applied. For example, in the case of this study, we relied on our knowledge and understanding of pharmacology to determine whether a compound is pharmacologically inactive or active. This may seem somewhat arbitrary to the uninitiated, but it is anything but arbitrary. It is through careful thought and assessment of the chemical compounds that we choose these compounds for training the AI. Unfortunately, this sort of subjective assessment cannot be easily or completely explained, but we do show where current practices have failed when building a dataset for training an AI for virtual screening.

    1. Author response:

      (1) Controls for the genetic background are incomplete, leaving open the possibility that the observed oviposition timing defects may be due not to targeted knockdown of the period (per) gene but to the GAL4, Gal80, and UAS transgenes themselves. To resolve this issue the authors should determine the egg-laying rhythms of the relevant controls (GAL4/+, UAS-RNAi/+, etc); this only needs to be done for those genotypes that produced an arrhythmic egg-laying rhythm.

      We agree with this objection, and in the corrected version we plan to provide the assessment of the egg laying rhythms for the missing GAL4 controls as recommended only for Figure 3.

      (2) Reliance on a single genetic tool to generate targeted disruption of clock function leaves the study vulnerable to associated false positive and false negative effects: a) The per RNAi transgene used may only cause partial knockdown of gene function, as suggested by the persistent rhythmicity observed when per RNAi was targeted to all clock neurons. This could indicate that the results in Fig 2C-H underestimate the phenotypes of targeted disruption of clock function. b) Use of a single per RNAi transgene makes it difficult to rule out that off-target effects contributed significantly to the observed phenotypes. We suggest that the authors repeat the critical experiments using a separate UAS-RNAi line (for period or for a different clock gene), or, better yet, use the dominant negative UAS-cycle transgene produced by the Hardin lab (https://doi.org/10.1038/22566).

      We have recently acquired mutant flies with a dominant negative-cycle transgene (UAS-cycDN, Tanoue et al. 2004), and we plan to repeat our experiments with these mutants, in order to confirm our results.

      (3) The egg-laying profiles obtained show clear damping/decaying trends which necessitates careful trend removal from the data to make any sense of the rhythm. Further, the detrending approach used by the authors is not tested for artefacts introduced by the 24h moving average used.

      In the revised version we will show that the detrending approach used does not introduce any artefacts. The analysis of numerical simulations with an aperiodic stochastic signal superposed on a decaying signal shows that the detrending method used does not result in a spurious periodic signal. Furthermore, we can show that when the underlying signal is rhythmic, the correct period is obtained even when the moving average window is a few hours longer or shorter than 24 h.
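      A minimal sketch of moving-average detrending is given below. It assumes evenly spaced samples with an even number of samples per 24 h and uses a centred window with half-weighted end points; the exact implementation used for the egg-laying records is not described in this response, so the function name, the test signal, and the linearly declining baseline are all hypothetical.

```python
import math


def detrend_24h(counts, step_h=4):
    """Subtract a centred 24 h moving average from an evenly sampled series.

    Assumes 24 / step_h is an even integer (e.g. 2 h or 4 h sampling).
    The first and last 12 h of the record are dropped.
    """
    per_day = 24 // step_h          # samples per 24 h (6 at 4 h sampling)
    half = per_day // 2
    # An even-length window is centred by giving its two end samples half weight.
    weights = [0.5] + [1.0] * (per_day - 1) + [0.5]
    detrended = []
    for i in range(half, len(counts) - half):
        window = counts[i - half:i + half + 1]
        trend = sum(w * x for w, x in zip(weights, window)) / per_day
        detrended.append(counts[i] - trend)
    return detrended


# Hypothetical record: declining baseline plus a 24 h rhythm, sampled every 4 h.
hours = list(range(0, 120, 4))
rhythm = [3 * math.sin(2 * math.pi * t / 24) for t in hours]
record = [20 - 0.1 * t + r for t, r in zip(hours, rhythm)]
residual = detrend_24h(record)
```

      For this idealised input the window averages a 24 h rhythm to zero and passes a linear trend unchanged, so the residual is the rhythm itself. Replacing `rhythm` with aperiodic noise is one way to check that the procedure does not create a spurious periodic component.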

      (4) According to the authors the oviposition device cannot sample at a resolution finer than 4 hours, which will compel any experimenter to record egg laying for longer durations to have a suitably long time series which could be useful for circadian analyses.

      We apologize for not being clear enough. The device can in principle sample at any desired resolution. Notice, however, that the variable we are analyzing (number of eggs laid by a single female) has only a few possible values, which is one of the features that make the assessment of rhythmicity particularly difficult. If egg laying is sampled more often (say, at 2 h intervals), more time points will be available, but the counts at each time point will be much smaller. We will show an example in which we compare both intervals (2 h and 4 h). Even though the 2 h sampling reveals the rhythmicity of the time series, the significance of the peaks obtained is lower than when sampling at 4 h intervals. We have found that 4 h sampling seems to provide the best compromise between the frequency of sampling and the discreteness of the variable.
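      The trade-off between the number of time points and the size of the counts can be seen by pooling a record collected at 2 h intervals into 4 h bins. The sketch below uses made-up egg counts for a single female and a hypothetical helper function.

```python
def rebin(counts, factor=2):
    """Sum consecutive groups of `factor` bins (e.g. 2 h bins into 4 h bins)."""
    usable = len(counts) - len(counts) % factor   # ignore an incomplete last group
    return [sum(counts[i:i + factor]) for i in range(0, usable, factor)]


# Hypothetical eggs laid per 2 h interval over one day.
eggs_2h = [0, 1, 0, 0, 2, 1, 0, 0, 1, 0, 3, 1]
eggs_4h = rebin(eggs_2h)
```

      Pooling halves the number of time points but widens the range of values each point can take, which is the compromise described above.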

      On the other hand, it is important to stress that a higher sampling frequency does not compensate for a shorter record (see e.g. Cohen et al. Journal of Theoretical Biology 314, pp 182 [2012]). It has been shown that the best way to make accurate predictions of the period of a rhythmic signal is to have a series spanning many cycles, irrespective of the sampling frequency. In other words, it is not true that 2 h sampling would make it possible to analyze shorter series than 4 h sampling. Unfortunately, egg laying records are usually less than 5 cycles long, which is one of the reasons for the difficulties in the assessment of their rhythmicity.

      (5) Despite reducing the interference caused by manually measuring egg-laying, the rhythm does not improve the signal quality such that enough individual rhythmic flies could be included in the analysis methods used. The authors devise a workaround by combining both strongly and weakly rhythmic (LSpower > 0.2 but less than LSpower at p < 0.05) data series into an averaged time series, which is then tested for the presence of a 16-32h "circadian" rhythm. This approach loses valuable information about the phase and period present in the individual mated females, and instead assumes that all flies have a similar period and phase in their "signal" component while the distribution of the "noise" component varies amongst them. This assumption has not yet been tested rigorously and the evidence suggests a lot more variability in the inter-fly period for the egg-laying rhythm.

      The assumption is difficult to test rigorously, since for individual flies the records seem to be so noisy that no information can be extracted. As shown in the paper, it is even very difficult to assess the presence of rhythmicity at the individual level. We consider that the appearance of a rhythm after averaging several records shows the presence of this rhythm at the individual level. But it could be argued that the presence of rhythmicity in the average record could be due to only a few (or even a single) rhythmic individuals. In order to show that this is probably not the case, in the revised version we will show that, when the individuals that are rhythmic are left out, the average of the remaining flies still shows a rhythm (albeit a weaker one, as was to be expected).

      Regarding our assumption that all flies have the “same” period, the results in Fig. 1F cannot really rule out this possibility, because with so few cycles, the determination of the period is not very accurate (see e.g. Cohen et al. Journal of Theoretical Biology 314, pp 182 [2012]). In our case, the error for the period is related to the width of the corresponding peak in the periodogram, which is typically 4 h. In any case, in the revised version we will try to show, by using numerical simulations, that when the individual periods are not the same, but are distributed approximately as in Fig 1F, the average series is still rhythmic with the correct period.
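
      The kind of numerical simulation described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used in the paper: each fly is modelled as a noisy cosine, the spread of periods (`period_sd`), the noise level and the number of flies are hypothetical values, and a simple Fourier-type projection stands in for the Lomb-Scargle periodogram.

```python
import math
import random


def simulate_fly(period_h, phase_h, n_points, dt_h, noise_sd, rng):
    """Noisy cosine used as a stand-in for one fly's egg-laying record (hypothetical model)."""
    return [
        math.cos(2 * math.pi * (i * dt_h - phase_h) / period_h) + rng.gauss(0.0, noise_sd)
        for i in range(n_points)
    ]


def spectral_power(series, dt_h, period_h):
    """Power of the mean-subtracted series projected onto a single trial period."""
    mean = sum(series) / len(series)
    c = s = 0.0
    for i, x in enumerate(series):
        angle = 2 * math.pi * i * dt_h / period_h
        c += (x - mean) * math.cos(angle)
        s += (x - mean) * math.sin(angle)
    return (c * c + s * s) / len(series)


def peak_period(series, dt_h, periods):
    """Trial period with the largest spectral power."""
    return max(periods, key=lambda p: spectral_power(series, dt_h, p))


def run_simulation(n_flies=40, mean_period=24.0, period_sd=1.0, noise_sd=1.0,
                   dt_h=4.0, n_cycles=5, seed=0):
    """Return the peak period of the averaged series and of each individual series."""
    rng = random.Random(seed)
    n_points = round(n_cycles * mean_period / dt_h)
    flies = [
        simulate_fly(rng.gauss(mean_period, period_sd),  # individual period
                     rng.gauss(0.0, 1.0),                # individual phase jitter (h)
                     n_points, dt_h, noise_sd, rng)
        for _ in range(n_flies)
    ]
    average = [sum(column) / n_flies for column in zip(*flies)]
    periods = [16.0 + 0.5 * k for k in range(33)]  # scan the 16-32 h window
    return peak_period(average, dt_h, periods), [peak_period(f, dt_h, periods) for f in flies]


if __name__ == "__main__":
    avg_peak, fly_peaks = run_simulation()
    print("peak period of the averaged series (h):", avg_peak)
    print("range of individual peak periods (h):", min(fly_peaks), max(fly_peaks))
```

      Comparing the peak of the averaged series with the scatter of the individual peaks gives a feel for how much averaging stabilizes the period estimate when individual records are short and noisy.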

      (6) This variability could also depend on the genotype being tested, as the authors themselves observe between their Canton-S and YW wild-type controls for which their egg-laying profiles show clearly different dynamics. Interestingly, the averaged records for these genotypes are not distinguishable but are reflected in the different proportions of rhythmic flies observed. Unfortunately, the authors also do not provide further data on these averaged profiles, as they did for the wild-type controls in Figure 1, when they discuss their clock circuit manipulations using perRNAi. These profiles could have been included in Supplementary figures, where they would have helped the reader decide for themselves what might have been the reason for the loss of power in the LS periodogram for some of these experimental lines.

      Even though we think that the individual records are in general too noisy to be really informative, we will provide all the individual egg profiles in the Supplementary Material of the revised version, in order to let the reader check this for herself/himself.

      (7) By selecting 'the best egg layers' for inclusion in the oviposition analyses an inadvertent bias may be introduced and the results of the assays may not be representative of the whole population.

      We agree that this may introduce some bias in the results. But in our opinion this bias is very difficult to avoid, since for females that lay very few eggs, rhythmicity can even be difficult to define (some females can spend a whole day without laying a single egg). On the other hand, even when the results may not be representative of the whole population, they would be representative of the flies that lay most of the eggs in a population, which seems to be very relevant in ecological terms.

      (8) An approach that measures rhythmicity for groups of individual records rather than separate individual records is vulnerable to outliers in the data, such as the inclusion of a single anomalous individual record. Additionally, the number of individual records that are included in a group may become a somewhat arbitrary determinant for the observed level of rhythmicity. Therefore, the experimental data used to map the clock neurons responsible for oviposition rhythms would be more convincing if presented alongside individual fly statistics, in the same format as used for Figure 1.

      The question of possible rhythmic outliers has been addressed above, in question 5, where we discuss why we think that such outliers are not “determinant for the observed level of rhythmicity”. As also mentioned above, even though we think that they are too noisy to be informative, we plan to include all individual profiles in the Supplementary Material.

      (9) The features in the experimental periodogram data in Figures 3B and D are consistent with weakened complex rhythmicity rather than arrhythmicity. The inclusion of more individual records in the groups might have provided the added statistical power to demonstrate this. Graphs similar to those in 1G and 1I, might have better illustrated qualitative and quantitative aspects of the oviposition rhythms upon per knockdown via MB122B and Mai179; Pdf-Gal80.

      We assume that the features mentioned refer to the appearance in the periodograms of two small peaks under the significance lines. We are aware that in studies of the rhythmicity of locomotor activity such features are usually interpreted as “complex rhythms”, i.e. as evidence of the existence of two different mechanisms producing two different rhythms in the same individual. In our case, however, at least two other possibilities should be taken into account. Since the periodograms we show assess the rhythmicity of the average time series of several individuals, the two small peaks could correspond to the periods of two different subpopulations. Another possibility could be that such peaks are simply an artifact of the method in the analysis of time series that consist of very few cycles (as explained above) and also few points per cycle. A cursory examination of the individual profiles, which will be provided in the new version, does not seem to support any of the first two possibilities mentioned. On the other hand, we will show evidence that the analysis of series that are perfectly random sometimes results in periodograms with some small peaks.

    1. Author response:

      We would like to thank the editors and reviewers for taking the time to help improve our manuscript. We appreciate the feedback and will definitely increase the level of methodological detail in a revised submission.

      Here is a brief summary of our plan to address the points raised by the reviewers. We will respond to the comments in a point-by-point manner when we resubmit a revised manuscript.

      Reviewer 1

      This reviewer raised a question about the 60 Hz frame rate for recording. We agree that increasing the number of cameras and frame rate would improve the tracking quality, but this would come at the cost of scalability. In the current study (and other concurrent studies in the lab), we recorded from 10-20 families simultaneously to try to sample the distribution of behavioral responses to stimuli observed in animals in our colony. This was only possible logistically because of the lightweight equipment design allowing us to record data from animals without large disruptions to their home-cage environment.

      One strategy for acquiring higher-resolution data is to build a small number of enclosures that are fully surrounded by cameras, and to cycle animals through these enclosures (1). However, this strategy limits throughput by reducing the number of animals per day that can be studied. If the size and cost of cameras and computers decreases in the future, then this recording strategy will be scalable to the whole-colony level. For our current study and analysis, we are limited by the resolution of our dataset. We do believe that our data (although not a perfect 3d reconstruction or an extremely high frame rate) is sufficient to label behavioral states with high accuracy. We will add a figure to more clearly show that behavioral state data can be accurately inferred from this imperfect data, which has also been recently highlighted by other groups (2).

      Additionally, with recent progress in the application of deep learning to animal pose tracking, new models can infer 3d pose dynamics from 2d data (3) and leverage spatiotemporal structure to clean up noisy data (4). We believe that other groups will be able to use these types of approaches to extract much more value from this dataset. So, in summary, we do understand the concern related to reconstruction quality and will 1) more clearly define the usefulness of our current models, 2) release our data and code so that others can build upon it or repurpose it, and 3) plan future experiments with higher camera count and frame rate as permitted by logistical constraints. 

      Reviewer 2

      This reviewer asked for an increased level of methodological detail. We will try to address this in a few ways:

      (1) Code and data sharing. We believe that many of the questions related to the methodology will be best answered by sharing the data and code directly. Because there is a large amount of code associated with this manuscript, it is impractical to list every step and every parameter in the paper. Along with our revised manuscript, we will make our data and code publicly available. That said, we will improve our description of key parameters in the paper as the reviewer suggested.

      (2) More detailed Methods section. The reviewer asked us to provide more methodological detail. We understand that this is currently a weakness of our manuscript, and we will focus on addressing it. For instance, the reviewer rightly points out that we did not describe the motion watches used to generate the data in Figure S7. We will address this.

      (3) Simplify the manuscript. The paper currently has 22 figures, and further analysis could be done based on the results shown in any of them. For instance, this reviewer asked us to add a comparison across females and males (similar to our comparison of juveniles and adults). While we plan to add that analysis, we recognize that there are several figures/panels that are not closely related to our intended goal of describing the patterns we found in our large dataset. We will simplify the manuscript by removing some excess figures/panels and focus on describing the parts of the analysis that are crucial to our conclusions in greater detail.

      (4) More careful language. This reviewer pointed out that there were some inaccuracies with our descriptive language. For instance, we used the term "natural" behavior to describe the behavior of animals in captivity, which may more accurately be described as their home-cage behavior. We will be more careful to align our language to the standard for the field. For instance, several studies refer to unrestrained behavior in a laboratory setting as "spontaneous" behavior rather than "natural" behavior (5). In our case, the data consists of both spontaneously occurring behavior and responses to a set of stimuli. We will make sure that the descriptions are more precise in the revised manuscript.

      (1) Bala, P. C. et al. Automated markerless pose estimation in freely moving macaques with OpenMonkeyStudio. Nat Commun 11, (2020).

      (2) Weinreb, C. et al. Keypoint-MoSeq: parsing behavior by linking point tracking to pose dynamics. bioRxiv (2023) doi:10.1101/2023.03.16.532307.

      (3) Gosztolai, A. et al. LiftPose3D, a deep learning-based approach for transforming two-dimensional to three-dimensional poses in laboratory animals. Nat Methods 18, 975–981 (2021).

      (4) Wu, A. et al. Deep Graph Pose: a semi-supervised deep graphical model for improved animal pose tracking. Adv Neural Inf Process Syst 33, 6040–6052 (2020).

      (5) Levy, D. R. et al. Mouse spontaneous behavior reflects individual variation rather than estrous state. Curr Biol 33, 1358-1364.e4 (2023).

    1. Author response:

      Reviewer #1 (Public review):

      This study is focused on a population of neurons in the mouse parasubthalamic nucleus (pSTN) that express Tachykinin1 (Tac1). This gene has been used before to target pSTN for functional circuit studies because it is fairly selective for pSTN in this region, though it targets only a subset of pSTN neurons. Prior work has shown that activity in these neurons can impact motivated behaviors, including feeding and drinking behaviors, and that their activity is associated with aversion or avoidance behaviors. While not breaking much new ground, this study adds to that work by making use of a 2-way active avoidance assay, where a CS predicts a US (footshock), that the mice can escape. Using fiber photometry, the authors show convincing evidence that Tac1 neurons in pSTN increase their activity in response to a US footshock, and that after some pairings the neurons will start responding to the CS too, though to a lesser extent than the US. Their most important data shows that either ablation or optogenetic inhibition of these cells can hugely block the active avoidance (escape) behavior, suggesting these neurons are key for the performance of this task, which they interpret as key for learning the task (but see more below). They show that optogenetic stimulation is aversive in a real-time place assay, and when paired with footshock can enhance active avoidance behavior. Finally, they show that Tac1 pSTN axons in PVT recapitulate these effects while showing that axons in CEA or PBN may only recapitulate some of these effects (more below). Overall I think the data is solid and shows that the activity of Tac1 pSTN neurons in the 2 way active avoidance task is causally related to avoidance behavior in the direction that would be predicted by recent literature. However, I think the authors overstate the conclusions in the title, abstract, and text.

      I do not think the data make a strong case for a role for these cells in learning, at least in any classical sense, as used in the title and abstract and elsewhere. Also, the statement in the abstract that the pSTN mediates its effects 'differentially' through its downstream targets is not convincingly supported by data.

      We are very pleased that Reviewer 1 thought our data is solid.

      Major concerns:

      (1) The authors infer that the activity in the Tac1 pSTN neurons is necessary for aversive or avoidance 'learning'. But this is not well defined, what exactly does that mean and what types of evidence would support or falsify such a hypothesis? Moreover, the authors show convincingly, and in line with prior reports, that these cells are activated by aversive stimuli (here footshock), and that activation of these cells is sufficient to induce avoidance behavior. Because manipulation of these cells can serve as a primary negative reinforcer, it becomes even more challenging and important to explain how experiments that manipulate these cells while measuring behavior/performance can discriminate between changes in: (1) primary aversion, (2) motivation to avoid, (3) associative learning, or (4) memory/retrieval. The authors seem to favor #3, but they don't make a clear case for this point of view or else what they mean by 'avoidance learning'. In my opinion, the data do not well discriminate between possibilities 1 through 3. The authors should clarify their logic and temper their conclusions throughout.

      We thank Reviewer 1 for these insightful suggestions. Based on our fiber photometry data, in which CS-evoked calcium signals in PSTN Tac1+ neurons are significantly larger in late trials than in early trials (Figure 1H-K), and on our optogenetic inhibition experiments during the CS (Figure 2N-Q), we conclude that the activity of PSTN Tac1+ neurons is modulated by learning and is required for active avoidance learning. Moreover, PSTN Tac1+ neurons are activated by footshock, and activation of these cells is sufficient to induce avoidance behavior. These findings demonstrate that PSTN Tac1+ neurons encode aversive information. Together, our current data support the view that PSTN Tac1+ neurons encode both the aversive event and its predictive cue. We will clarify our conclusions in the revised manuscript.

      (2) Abstract line 37 is not well supported. The authors focus mostly on pSTN projections to PVT and show that the measurements or manipulation of these axons recapitulates the effects seen with pSTN cell bodies. The authors do fewer studies of axons in CeA and PBN, but do find that they can recapitulate the effects with opsin inhibition, but detect no effects with opsin stimulation. However, the lack of effect with opsin stimulation in Figure S7a-e proves very little on its own. It could be technical, due to inadequate expression or functional efficacy. It is not supported by histological and functional evidence that the manipulation was effective. Overall, I can only conclude that the projections to these regions might be very similar (based on the inhibition data), or might be a little different. The data are thus inadequate to support the authors' claim that the pSTN mediates learning differentially through its downstream targets.

      In the revised version of the manuscript, we will provide more histological and functional evidence for the PSTN-to-CeA and PSTN-to-PBN circuits to support our conclusion on the functional roles of these downstream targets. Consistent with our anterograde experiment showing that the PSTN densely projects to the CeA and PBN (Figure S6), the optogenetic activation and inhibition experiments showed dense axonal terminals from the PSTN in the CeA and PBN; these data will be included in the revised manuscript. In addition, we will further examine these circuits by investigating the functional roles of CeA-projecting or PBN-projecting PSTN neurons during the 2-way active avoidance task.

      Other concerns:

      (3) Line 93 is not adequately supported by data in Figure 1b. Additional data is needed that shows expression across cases, including any spread that may be visible when zooming out from pSTN. Additional methods are needed to indicate what exclusion criteria were applied and how many mice were excluded. These data could help support the statement on line 93 that expression was largely restricted within pSTN.

      In the revised version of the manuscript, we will provide larger example images containing the pSTN and its adjacent areas to demonstrate that the viral expression is well restricted to this brain area. Moreover, we will provide detailed information on the exclusion criteria and the number of mice excluded in the Methods section.

      (4) From the results and methods it is not clear where the GFP signal would come from in the mice expressing Casp3 for the ablation studies. It is therefore not clear if the absence of GFP should be taken as evidence of cell loss. For example, it is not clear if multiple vectors were used, if volumes and titers were carefully matched between control groups, or if competition/occlusion between AAVs could be ruled out. It is also not clear how this was quantified, that is how many sections/subjects and how counting was done. It is not clear how long was waited between the AAV infusion, behavior, and euthanasia, perhaps especially important for the ablation done after avoidance learning occurred.

      We fully agree with Reviewer 1’s concerns. We will perform immunohistochemistry or in situ hybridization for Tachykinin-1 itself and then measure the colocalization of GFP with Tachykinin-1 inside and outside of the PSTN, and the degree of absence of Tachykinin-1 in Casp mice. In addition, we will provide more detailed experimental information in the revised manuscript.

      (5) The authors should consider showing individual measurements and not just mean/sem wherever feasible, for example, to support the statement on line 141 that 'all ablated mice showed...'.

      Thank you Reviewer 1 for this suggestion. We will re-plot the data as individual measurements in the revised manuscript.

      (6) S3 is an important control for interpreting data in Figure 2d-i. Something similar is needed to support the inferences made in 2j-u. The very strong effect showing a lack of active avoidance in response to CS or the US when pSTN Tac1 neurons are inhibited during CS or during US suggests that something gross may be going on, such as a gross motor or sensory response that supersedes the effect of footshock. The authors do not comment on whether there are any gross behavioral responses to the inhibition, but an experiment as in S3 is needed, for example, to show that behavior is intact during pSTN inhibition if delivered after the mice already learned to associate CS with US.

      Thank you Reviewer 1 for this insightful suggestion. During the review process, we have performed this line of experiment as in Figure S3. We measured the behavioral responses during pSTN optogenetic inhibition after the mice already learned to associate CS with US and found most GtACR-expressing mice showed unaffected avoidance learning. This data will be included in the revised manuscript.

      (7) The authors use 100 shocks of 0.8 mA for 7 days. I think this is quite strong and in the pSTN inhibition experiments it seems to be functionally 'inescapable' and could thus produce behaviors similar to 'learned helplessness'. Can the authors consider whether this might contribute to the striking findings they observed in their opsin inhibition assays?

      We agree with Reviewer 1’s comment on the striking findings in the optogenetic inhibition results. Indeed, based on the results on days 1 and 2, optogenetic inhibition of PSTN Tac1+ neurons significantly blocked the behavioral performance of GtACR-expressing animals during the 2-way active avoidance task. To examine whether the effect of optogenetic inhibition of these neurons might decline with prolonged training, we conducted an additional 5 days of training. We will discuss this point in the revised manuscript.

      (8) The description of the experiment in S5 is inadequate. What are the adjacent areas? Where do the authors see spread? The use of the word 'case' in figure S5 implies an individual case, but the legend says 5 mice were used for 'case 1' and 3 mice were used for 'case 2'. The use of the word 'off-target in the figure implies that the expression was of the intended target. But the text of results and methods implies it was intentional targeting of unnamed and unshown adjacent regions. This should be clarified.

      We will add histological images and clarify these points in the revised manuscript. The purpose of this experiment is to illustrate that even a slight spread of ChR2 virus into Tac1+ neurons in areas adjacent to the PSTN did not result in behavioral changes, which indirectly supports the conclusion that the main behavioral effects are caused by PSTN Tac1+ neurons rather than by neurons in neighboring areas. Because Tac1+ neurons are sparsely present outside the PSTN, it is quite difficult to restrict the viral expression completely to the PSTN along its anterior-posterior extent. Thus, we will provide detailed information on the exclusion criteria and the number of mice excluded in the Methods section.

      (9) The authors suggest the CPA study is divergent from Serra et al 2023. Though I think this could be due to how the conditioning was done, it would be helpful for the authors to include less processed data. This would aid in possible interpretations for any divergences across studies. Can the authors include raw data (in seconds of time spent) in each compartment for each group across baseline and test days?

      We will follow Reviewer 1’s suggestion to include raw data (in seconds of time spent) in each compartment for each group across baseline and test days in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Hu et. al presents a clearly-designed examination of the role of tachykinin1-expressing neurons in the parasubthalamic nucleus of the lateral posterior hypothalamus (PTSN) in active avoidance learning. These glutamatergic neurons have previously been implicated in responding to negative stimuli. This manuscript expands the current understanding of PTSNTac1 neurons in learned responses to threats by showing their role in encoding and mediating the active avoidance response. The authors first use bulk fiber photometry imaging to show the encoding of the active avoidance procedure, followed by cell-type specific manipulations of PTSNTac1 neurons during active avoidance. Finally, they show that encoding and mediation of active avoidance in a downstream target of PTSNTac1 neurons, the PVT/intermediodorsal nuclei of the dorsal thalamus (IMD), has the same effect as what was discovered in the cell body. This contrasts other output regions of the PTSN, such as the PBN and CeA, which were not found to promote active avoidance learning. The experiments presented were well-designed to support the conclusions of the authors, however, the manuscript is missing several key control experiments and supplemental information to support their main findings.

      Strengths:

      The manuscript provides information on a brain region and downstream target that mediates active avoidance learning. The manuscript provides valuable information via necessity and sufficiency experiments to show the role of the population of interest (PTSNTac1 neurons) in active avoidance learning. The authors also performed most behavior experiments in male and female mice, with adequate power to address potential sex differences in the control of active avoidance by PTSNTac1 neurons. Finally, the manuscript provides valuable information about the specificity of the PTSNTac1 downstream target in regulating active avoidance learning, identifying the PVT/intermediodorsal nuclei of the dorsal thalamus as the key target and ruling out the PBN and CeA.

      We highly appreciate that Reviewer 2 thought that our experiments presented were well-designed to support the conclusions and provided valuable information in several aspects.

      Weaknesses:

      However, several main conclusions of the paper must be interpreted carefully due to missing or inadequate control experiments and histological verification.

      (1) Inadequate presentation of viral localization. The authors state that expression was "largely restricted within PSTN" however there is no quantification of the amount of viral expression beyond the target region. Given that Tac1 is expressed in neighboring regions, it is critical to show the viral expression and fiber implant location data for all animals included in the figures. Furthermore, criteria for inclusion and exclusion based on mistargeting should be delineated. This should also be clearly outlined for the experiments in Figure S5, where "behavioral effects of activation of sparsely Tac1-expressing neurons in two adjacent areas of PSTN" was tested but the location of viral expression in those cases is unclear.

      This point is similar to questions 3 and 8 of Reviewer 1. We will provide the viral expression and fiber implant location data for all animals included in the figures, as well as histological images for Figure S5, in the revised manuscript. Moreover, we will provide detailed information on the exclusion criteria and the number of mice excluded in the Methods section.

      (2) Lack of motion artifact correction with isosbestic signal for GCaMP recordings. It is appreciated that the authors included a separate EGFP-expressing group to compare to the GCaMP-expressing group, however, additional explanation is required for the methods used to analyze the raw fluorescent signal. Namely, were fluorescent signals isosbestic-corrected prior to calculating ΔF/F? If no isosbestic signal was used to correct motion artifacts within a recording session, additional explanation is needed to explain how this was addressed. The lack of motion artifacts in the EGFP signal in a separate cohort is inadequate to answer this caveat as motion artifacts are within-animal.

      We will follow Reviewer 2’s suggestion and perform isosbestic-correction for fluorescent signals prior to calculating ΔF/F. We will re-plot related figures and add this information in the revised manuscript.
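
      One common way to carry out such a correction is sketched below. It is a minimal illustration and not necessarily the pipeline that will be used in the revised manuscript: the isosbestic trace is fitted to the calcium-dependent trace by least squares, and ΔF/F is computed relative to that fitted baseline. The function names are hypothetical, and both traces are assumed to be recorded simultaneously and sampled at the same time points.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def isosbestic_dff(signal, isosbestic):
    """Return dF/F of `signal` relative to the isosbestic trace scaled to it.

    Fluctuations shared by both channels (e.g. motion artifacts) are absorbed
    into the fitted baseline, so only signal-specific changes remain.
    """
    slope, intercept = fit_line(isosbestic, signal)
    fitted = [slope * v + intercept for v in isosbestic]
    return [(s - f) / f for s, f in zip(signal, fitted)]
```

      Because the baseline is estimated within each recording, this correction addresses within-animal motion artifacts, which a separate EGFP cohort cannot do.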

      (3) Missing control experiment demonstrating intact locomotor performance in caspase ablation experiments. The authors use caspase ablation of PTSNTac1 neurons prior to active avoidance learning to appraise the necessity of this cell population. However, a control experiment showing intact locomotor ability in ablated mice was not performed.

      We will follow Reviewer 2’s suggestion to perform a control experiment showing intact locomotor ability in caspase 3-ablated mice and will include this data in the revised manuscript.

      (4) Missing control experiment demonstrating [lack of] valence with PTSN silencing manipulations. The authors performed a real-time and conditioned place preference experiments for ChR2-expressing mice (Fig 3M) and found stimulation to be negatively-valenced and generate an aversive memory, respectively. Absent this control experiment with silencing, an alternative conclusion remains possible that optogenetic silencing via GtACR2 created nonspecific location preferences in the active avoidance apparatus, confounding the interpretation of those results.

      Thank you Reviewer 2 for this useful suggestion. We will examine the valence of PSTN silencing manipulations by using a real-time place preference (RTPP) test and add these data to the revised manuscript.

      (5) Incomplete analysis of sex differences. Data in female mice is conspicuously missing from inhibition experiments. The rationale for exclusion from this dataset would be useful for the interpretation of the other noted sex differences.

      Thank you Reviewer 2 for this useful suggestion. During the review process, we have performed ablation and inhibition experiments in females, demonstrating similar behavioral effects as those in males. We will add these data in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This study by Hu et al. examined the role of tachykinin1 (Tac1)-expressing neurons in the para subthalamic nucleus (PSTH) in active avoidance of electric shocks. Bulk recording of PSTH Tac1 neurons or axons of these neurons in PVT showed activation of a shock-predicting tone and shock itself. Ablation of these neurons or optogenetic manipulation of these neurons or their projection to PVT suggests the causality of this pathway with the learning of active avoidance.

      Strengths:

      This work found an understudied pathway potentially important for active avoidance of electric shocks. Experiments were thoroughly done and the presentation is clear. The amount of discussion and references are appropriate.

      We are very pleased to have Reviewer 3’s positive comments on the manuscript.

      Weaknesses:

      Critical control experiments are missing for most experiments, and statistical tests are not clear or not appropriate in most parts. Details are shown below.

      (1) There are some control experiments missing. Notably, optogenetic manipulation is not verified in any experiments. It is important to verify whether neural activation with optogenetic activation is at the physiological level or supra-physiological level, and whether optogenetic inhibition does not cause unwanted activity patterns such as rebound activation at the critical time window.

      Thank you Reviewer 3 for this useful suggestion. We will perform in vitro slice recording experiments to verify optogenetic manipulations and add this line of evidence in the revised manuscript.

      (2) Neural ablation with caspase was confirmed by GFP expression. However, from the present description, a different virus to express EITHER caspase or GFP was injected, and then the numbers of GFP-expressing neurons were compared. It is not clear how this can detect ablation.

      This point is similar to question 4 of Reviewer 1. We will perform immunohistochemistry or in situ hybridization for Tachykinin-1 itself and then measure the colocalization of GFP with Tachykinin-1 inside and outside of the PSTN, and the degree of absence of Tachykinin-1 in Casp-ablated mice. In addition, we will provide more detailed experimental information in the revised manuscript.

      (3) In many places, statistical approaches are not clear from the present figures, figure legends, and Methods. It seems that most statistics were performed by pooling trials, but it is not described, or multiple "n" are described. For example, it is explicitly mentioned in Figure 4H, "n = 3 mice, n = 213 avoidance trials and n = 87 failure trials". The authors should not pool trials, but should perform across-animal tests in this and other figures, and "n" for should be clearly described in each plot.

      We have provided all statistical information in the Supplementary Table 1. In the revised manuscript, we will perform across-animal tests, re-plot new figures and provide clear statistical information.

      (4) It is also unclear how the test types were selected. For example, in Figure 1K and O with similar datasets, one is examined by a paired test and the other is by an unpaired test. Since each animal has both early vs late trials, and avoidance vs failure trials, paired tests across animals should be performed for both.

      Following Reviewer 3’s suggestion, we will perform across-animal tests. In the first version of our manuscript, for fiber photometry experiments, we pooled the trial data of each animal and performed statistical tests across trials. Because avoidance and failure trials were different trials, we selected an unpaired test for this kind of dataset.

      (5) It is also strange to show violin plots for only 6 animals. They should instead show each dot for each animal, connected with a line to show consistent increases of activity in late vs early trials and avoidance vs failure trials.

      Similar to question 4 of Reviewer 3: we pooled the trial data of each animal and performed statistical tests across trials. We will perform across-animal tests and re-plot the figures, connecting each animal's data points with a line to show consistent increases of activity in late vs early trials and avoidance vs failure trials.
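      The across-animal approach requested in points (3) to (5) can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names and the toy data are hypothetical, and only the paired t statistic is computed (a p-value would need a t distribution from a statistics library). Trial values are first averaged within each animal and condition, so the animal, not the trial, is the unit of analysis.

```python
from math import sqrt
from statistics import mean, stdev


def per_animal_means(trials):
    """Collapse trial-level values to one mean per animal and condition.

    `trials` is a list of (animal_id, condition, value) tuples.
    Returns {animal_id: {condition: mean_value}}.
    """
    grouped = {}
    for animal, condition, value in trials:
        grouped.setdefault(animal, {}).setdefault(condition, []).append(value)
    return {
        animal: {cond: mean(vals) for cond, vals in conds.items()}
        for animal, conds in grouped.items()
    }


def paired_t(means, cond_a, cond_b):
    """Paired t statistic across animals for cond_a minus cond_b.

    Returns (t, degrees_of_freedom). Animals missing either condition
    are skipped.
    """
    diffs = [m[cond_a] - m[cond_b] for m in means.values()
             if cond_a in m and cond_b in m]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two animals with both conditions")
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Hypothetical photometry responses (arbitrary units), three animals.
trials = [
    ("m1", "avoid", 1.2), ("m1", "avoid", 1.4), ("m1", "fail", 0.6),
    ("m2", "avoid", 1.0), ("m2", "fail", 0.5), ("m2", "fail", 0.7),
    ("m3", "avoid", 1.5), ("m3", "fail", 0.9),
]
means = per_animal_means(trials)
t_stat, dof = paired_t(means, "avoid", "fail")
```

      The per-animal means returned by `per_animal_means` are also the values one would plot as connected dots, one line per animal.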

      (6) To tell specificity in avoidance learning, it is better to show escape in the current trials with optogenetic manipulation.

      Thank you Reviewer 3 for this useful suggestion. We will follow this suggestion and add this analysis in the revised manuscript.

      (7) For place aversion, % time decrease across days was tested. It is better to show the original number before normalization, as well.

      Similar to question 9 of Reviewer 1, we will show the original number before normalization in the revised manuscript.

      (8) For anatomical results in Figure S6, it is important to show images with lower magnification, too.

      We will follow this suggestion and provide histological images with lower magnification in the revised manuscript.

      (9) Inactivation of either pathway from PSTH to PBN or to CeA also inhibits active avoidance, but the authors conclude that these effects are "partial" compared to the inactivation of PSTH to PVT. It is not clear how the effects were compared since the effects of PSTH-CeA inactivation are quite strong, comparable to PSTH-PVT inactivation by eye. They should quantify the effects to conclude the difference.

      We will quantify the effects of different downstream targets of the PSTN to make a precise conclusion.

      (10) Supplementary table 1: as mentioned above, n for statistical tests should be clearer.

      As mentioned above, we will perform across-animal tests and provide clear statistical information in the figure legends and supplementary table 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The investigators in this study analyzed the dataset assembly from 540 Salmonella isolates, and those from 45 recent isolates from Zhejiang University of China. The analysis and comparison of the resistome and mobilome of these isolates identified a significantly higher rate of cross-region dissemination compared to localized propagation. This study highlights the key role of the resistome in driving the transition and evolutionary 

      Thank you for summarizing our work. We have carefully considered your comments, responded to them, and revised the text accordingly. Additionally, to fully contextualize the background knowledge and clarify the major points of this study, we have added some references.

      Upon further review of our initial manuscript, we realized that the original submission did not strictly follow the lineage order proposed by Zhou et al. (Natl Sci Rev. 2023 Sep 2;10(10):nwad228). To avoid confusion and keep the typing system consistent, we have adjusted the lineage nomenclature throughout the revised manuscript to reflect the corrected order as follows:

      Author response table 1.

      To ensure consistency with previous studies, we have revised the nomenclature for the different lineages of bvSP.

      Strengths: 

      The isolates included in this study were from 16 countries in the past century (1920 to 2023). While the study uses S. Gallinarum as the prototype, the conclusion from this work will likely apply to other Salmonella serotypes and other pathogens. 

      Thanks for the constructive comments and the positive reception of the manuscript.

      Weaknesses: 

      While the isolates came from 16 countries, most strains in this study were originally from China. 

      We appreciate the reviewer's observation regarding the sampling distribution of isolates in this study. We acknowledge that, although the isolates were collected from 15 different countries, a significant proportion originated from China (Author response image 1). This focus is due to several reasons:

      Author response image 1.

      Geographic distribution of 580 S. Gallinarum. Different colors indicate the countries of origin for the 580 S. Gallinarum strains in the dataset. Darker shades represent higher numbers of strains.

      (1) S. Gallinarum, once a globally prevalent pathogen throughout the 20th century, was listed by the World Organization for Animal Health (WOAH) due to its economic importance. After 30 years of implementation of the National Poultry Improvement Plan in the US, it was almost eradicated in high-income countries; interestingly, it became an endemic pathogen with sporadic outbreaks in most low- or middle-income countries such as China and Brazil. Given the vast expanse of China's land area and the country's economic factors, implementing the same measures remains challenging.

      (2) S. Gallinarum is an avian-specific pathogen, particularly affecting chickens, and its distribution is closely linked to chicken meat production in different countries. There are more frequent reports of fowl typhoid in some high chicken-producing developing countries. Data from the United States Department of Agriculture (USDA) on annual chicken meat production for 2023/2024 show that the global distribution of S. Gallinarum aligns closely with the overall chicken meat production of these countries (https://fas.usda.gov/data/production/commodity/0115000).

      Author response image 2.

      The United States Department of Agriculture (USDA) data on annual chicken meat production for 2023/2024 across different countries globally.

      (3) Our primary objective was to investigate the localized resistome adaptation of S. Gallinarum in regions. Being a region with significant disease burden, China has reported numerous outbreaks (Sci Data. 2022 Aug 13;9(1):495; Sci Data. 2024 Feb 27;11(1):244) and a high AMR prevalence of this serovar (Natl Sci Rev. 2023 Sep 2;10(10):nwad228; mSystems. 2023 Dec 21;8(6):e0088323), making it an excellent example for understanding localized resistance mechanisms.

      (4) As China is the primary country of origin for the strains in this study, it is necessary to ensure that the strains from China are consistent with the local geographic characteristics of the country. Therefore, we conducted a correlation analysis between the number of strains from different provinces in China and the total GDP/population size of those provinces (Author response image 3). The results show that most points fall within the 95% confidence interval of the regression line. Although some points exhibit relative unbalance in the number of S. Gallinarum strains, most data points for these regions have a small sample size (n < 15). Overall, we found that the prevalence of S. Gallinarum in different regions of China is consistent with the overall nationwide trend.

      Author response image 3.

      Correlation analysis between the number of S. Gallinarum collected from different provinces in China and the total GDP/population size. The figure depicts a series of points representing individual provinces. The x-axis indicates the number of S. Gallinarum included in the dataset, while the y-axis displays the values for total GDP and total population size, respectively.

      Nevertheless, a search of nearly a decade of literature on PubMed and a summary of the S. Gallinarum genomes available in public databases indicate that the dataset used is the most complete one available. Furthermore, focusing on a specific region within China allowed us to conduct a detailed and thorough analysis. However, we fully agree that expanding the study to include more isolates from other countries would enhance the generalizability of our findings, and we are actively collecting additional S. Gallinarum genome data. In the revised manuscript, we have further emphasized the limitations as follows:

      Lines 427-429: “However, the current study has some limitations. Firstly, despite assembling the most comprehensive WGS database for S. Gallinarum from public and laboratory sources, there are still biases in the examined collection. The majority (438/580) of S. Gallinarum samples were collected from China, possibly since the WGS is a technology that only became widely available in the 21st century. This makes it impractical to sequence it on a large scale in the 20th century, when S. Gallinarum caused a global pandemic. So, we suspect that human intervention in the development of this epidemic is the main driving force behind the fact that most of the strains in the data set originated in China. In our future work, we aim to actively gather more data to minimize potential biases within our dataset, thereby improving the robustness and generalizability of our findings.”

      Reviewer #2 (Public review): 

      Summary: 

      The authors sequence 45 new samples of S. Gallinarum, a commensal Salmonella found in chickens, which can sometimes cause disease. They combine these sequences with around 500 from public databases, determine the population structure of the pathogen, and coarse relationships of lineages with geography. The authors further investigate known anti-microbial genes found in these genomes, how they associate with each other, whether they have been horizontally transferred, and date the emergence of clades. 

      Thank you for your constructive suggestions, which are valuable and highly beneficial for improving our paper. We have carefully considered your comments, responded to them, and revised the text accordingly. Furthermore, to fully contextualize the background knowledge and clarify the major points of this study, we have added some references to support our findings and policy implications.

      Upon further review of our initial manuscript, we realized that the original submission did not strictly follow the lineage order proposed by Zhou et al. (Natl Sci Rev. 2023 Sep 2;10(10):nwad228). To avoid confusion in the typing system, we have adjusted the lineage nomenclature in the revised manuscript to reflect the corrected order (see Author response table 1).

      Strengths: 

      (1) It doesn't seem that much is known about this serovar, so publicly available new sequences from a high-burden region are a valuable addition to the literature. 

      (2) Combining these sequences with publicly available sequences is a good way to better contextualise any findings. 

      Thank you so much for your thorough review and constructive comments on the manuscript.

      Weaknesses: 

      There are many issues with the genomic analysis that undermine the conclusions, the major ones I identified being: 

      (1) Recombination removal using gubbins was not presented fully anywhere. In this diversity of species, it is usually impossible to remove recombination in this way. A phylogeny with genetic scale and the gubbins results is needed. Critically, results on timing the emergence (fig2) depend on this, and cannot be trusted given the data presented. 

      We sincerely thank you for pointing out this issue. In the original manuscript, we aimed to present different lineages of S. Gallinarum within a single phylogenetic tree constructed using BEAST. However, in the revised manuscript, we have addressed this issue by applying the approach recommended by Gubbins to remove recombination events for each lineage defined by FastBAPs. Additionally, to better illustrate the removal of recombination regions in the genome, we have included a figure generated by Gubbins (New Supplementary Figure 12). 

      Our results indicate that recombination events are relatively infrequent in Lineage 1, followed by Lineage 3, but occur more frequently in Lineage 2. In the revised manuscript, we have included additional descriptions in the Methods section to clarify this analysis. We hope these modifications adequately address the reviewer’s concerns and enhance the trustworthiness of our findings.

      (2) The use of BEAST was also only briefly presented, but is the basis of a major conclusion of the paper. Plot S3 (root-to-tip regression) is unconvincing as a basis of this data fitting a molecular clock model. We would need more information on this analysis, including convergence and credible intervals. 

      Thank you very much for raising this issue. We re-ran BEAST separately for each lineage to present the evolutionary timescale accurately, building on the improvements described above. The per-lineage BEAST analysis followed these steps:

      (1) Using R51 as the reference, a reference-mapped multiple core-genome SNP sequence alignment was created, and recombination regions were detected and removed as described above.

      (2) TreeTime was used to assess the temporal structure by performing a regression analysis of the root-to-tip branch distances within the maximum likelihood tree, considering the sampling date as a variable (New Supplementary Figures 6). However, the root-to-tip regression analysis presented in New Supplementary Figures 6 was not intended as a basis for selecting the best molecular clock model; its purpose was to clean the dataset with appropriate measurements.

      (3) To determine the optimal model for running BEAST, we tested a total of six combinations in the initial phase of our study. These combinations comprised two clock models (strict clock and relaxed lognormal clock) and three population models (Bayesian SkyGrid, Bayesian Skyline, and Constant Size). Before conducting the complete BEAST analysis, we evaluated each combination using a Markov Chain Monte Carlo (MCMC) analysis with a total chain length of 100 million and sampling every 10,000 iterations. We then summarized the results using NSLogAnalyser and determined the optimal model based on the marginal likelihood value for each combination. The results indicated that the model incorporating the Bayesian Skyline and the relaxed lognormal clock yielded the highest marginal likelihood value for our dataset. We then performed a time-calibrated Bayesian phylogenetic inference analysis for each lineage. The following settings were configured: the "GTR" substitution model, “4 gamma categories”, the "Relaxed Clock Log Normal" model, the "Coalescent Bayesian Skyline" tree prior, and an MCMC chain length of 100 million, with sampling every 10,000 iterations.

      (4) Convergence was assessed using Tracer, with all parameter effective sampling sizes (ESS) exceeding 200. Maximum clade credibility trees were generated using TreeAnnotator. Finally, key divergence time points (with 95% credible intervals) were estimated, and the tree was visualized using FigTree. 
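      The temporal-signal check in step (2) rests on a root-to-tip regression. A minimal sketch of that regression follows; it is an illustration only and is not TreeTime's implementation. The function name and the toy values are hypothetical. The slope of root-to-tip distance on sampling date approximates the clock rate, and the x-intercept gives a rough date for the root.

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    dates: sampling dates in decimal years.
    distances: root-to-tip distances (substitutions per site).
    Returns (slope, intercept, r_squared, root_date); root_date is None
    when the slope is zero.
    """
    n = len(dates)
    if n != len(distances) or n < 3:
        raise ValueError("need matched inputs with at least three tips")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    root_date = -intercept / slope if slope != 0 else None
    return slope, intercept, r_squared, root_date


# Toy tips lying exactly on a clock of 1e-6 subs/site/year rooted at 1900.
dates = [1950.0, 1980.0, 2000.0, 2020.0]
distances = [50e-6, 80e-6, 100e-6, 120e-6]
slope, intercept, r_squared, root_date = root_to_tip_regression(dates, distances)
```

      A low `r_squared` or a negative slope would suggest weak temporal signal, which is the kind of evidence Reviewer #2 asks to see alongside the BEAST convergence diagnostics.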

      For the key lineages, L2b and L3b (carrying the resistome, posing antimicrobial resistance (AMR) risks, and exhibiting intercontinental transmission events), we have redrawn Figure 2 based on the updated BEAST analysis results (New Figure 2). For L1, L2a, and L3c, we have added supplementary figures to provide a more detailed visualization of their respective BEAST analysis outcomes (New Supplementary Figures 3-5). The revised BEAST analysis indicates that the origin of L3b in China can be traced back to as early as 1683 (95% CI: 1608 to 1839). In contrast, the earliest possible origin of L2b in China dates back to 1880 (95% CI: 1838 to 1902). This indicates that the previous manuscript's assumption that L2b is an older lineage compared to L3b may be inaccurate. 

      Furthermore, in the revised manuscript, we specifically estimated the time points of the first intercontinental transmission events for the two major lineages, L2b and L3b. Our results indicate that L2b likely underwent two major intercontinental transmission events. The first occurred around 1893 (95% CI: 1870 to 1918), with transmission from China to South America. The second occurred around 1923 (95% CI: 1907 to 1940), involving spread from South America to Europe. In contrast, the transmission pattern of L3b appears more straightforward. Our findings show that L3b, an S. Gallinarum lineage originating in China, underwent only one intercontinental transmission event, from China to Europe, likely around 1790 (95% CI: 1661 to 1890) (New Supplementary Figure 7). Based on the per-lineage BEAST analysis, we have revised the corresponding conclusions in the manuscript. We believe that the updated BEAST analysis, performed using a more accurate recombination removal approach, significantly enhances the rigor and credibility of our findings.

      (3) Using a distance of 100 SNPs for a transmission is completely arbitrary. This would at least need to be justified in terms of the evolutionary rate and serial interval. 

      Using single nucleotide polymorphism (SNP) distance to trace pathogen transmission is a common approach (J Infect Dis. 2015 Apr 1;211(7):1154-63) that we have also used in our previous studies (hLife 2024; 2(5):246-256; mLife 2024; 3(1):156-160). When the SNP distance within a cluster falls below a set threshold, the strains in that cluster are considered to have a potential direct transmission link. It is generally accepted that the lower the threshold, the more stringent the screening process becomes. However, there is little agreement in the literature regarding what such a threshold should be, and the appropriate SNP cut-off for inferring transmission likely depends critically on the context (Mol Biol Evol. 2019 Mar 1;36(3):587-603).

      In this study, we compared various thresholds (SNPs = 5, 10, 20, 25, 30, 35, 40, 50, 100) to ensure clustering in an appropriate manner. First, we summarized the tracing results under each threshold (Author response image 4), which demonstrated that, regardless of the threshold used, all strains associated with transmission events originated from the same location (New Figure 3a).

      Author response image 4.

      Clustering results of 45 newly isolated S. Gallinarum strains using different SNP thresholds of 5, 10, 15, 20, 25, 28, 30, 50, and 100 SNPs. The nine subplots represent the clustering results under each threshold. Each point corresponds to an individual strain, and lines connect strains with potential transmission relationships.
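      The threshold-based clustering described above can be sketched as follows. This is an illustration, not the authors' pipeline: the function names and toy sequences are hypothetical, pairs are linked when their distance is less than or equal to the threshold (the response does not say whether the cut-off is inclusive), and positions with gaps or ambiguous bases are ignored by assumption.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different bases.

    Positions where either sequence has a gap or ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1 for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def transmission_clusters(seqs, threshold):
    """Link strains with pairwise SNP distance <= threshold.

    seqs: {strain_name: aligned_sequence}.
    Returns connected components as a sorted list of sorted name lists.
    """
    parent = {name: name for name in seqs}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy core-SNP alignment: two strains from each of two hypothetical sites.
seqs = {
    "T1": "ACGTACGTAC",
    "T2": "ACGTACGTAT",
    "Y1": "TGCAACGTAC",
    "Y2": "TGCAACGTAA",
}
strict = transmission_clusters(seqs, 2)
loose = transmission_clusters(seqs, 4)
```

      In this toy alignment the strict threshold keeps the two sites apart, while the looser one merges them, which is the sensitivity to the cut-off that the reviewer raises.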

      In response to your comments regarding the evolutionary rate, we estimated the overall evolutionary rate of the S. Gallinarum using BEAST. We applied the methodology described by Arthur W. Pightling et al. (Front Microbiol. 2022 Jun 16; 13:797997). The numbers of SNPs per year were determined by multiplying the evolutionary rates estimated with BEAST by the number of core SNP sites identified in the alignments. We hypothesize that a slower evolutionary rate in bacteria typically requires a lower SNP threshold when tracing transmission events using SNP distance analysis. Pightling et al.'s previous research found an average evolutionary rate of 1.97 SNPs per year (95% HPD, 0.48 to 4.61) across 22 different Salmonella serotypes. Our updated BEAST estimation for the evolutionary rate of S. Gallinarum suggests it is approximately 0.74 SNPs per year (95% HPD, 0.42 to 1.06). Based on these findings, and our previous experience with similar studies (mBio. 2023 Oct 31;14(5):e0133323.), we set a threshold of 5 SNPs in the revised manuscript.
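      The conversion described above, from a per-site rate to SNPs per year, is a single multiplication, sketched below. The input values passed to `snps_per_year` are hypothetical placeholders, not the authors' estimates. The second helper is an additional back-of-envelope assumption that is not stated in the response: if two strains diverge independently from a common ancestor, their pairwise distance grows at roughly twice the per-lineage rate.

```python
def snps_per_year(rate_per_site_per_year, n_core_snp_sites):
    """SNPs per genome per year = per-site rate x number of core SNP sites."""
    if rate_per_site_per_year < 0 or n_core_snp_sites < 0:
        raise ValueError("inputs must be non-negative")
    return rate_per_site_per_year * n_core_snp_sites


def years_spanned_by_threshold(threshold_snps, snps_per_year_value):
    """Rough divergence time implied by a pairwise SNP threshold.

    Assumes both strains accumulate SNPs independently since their common
    ancestor, so pairwise distance grows at 2 x the per-lineage rate.
    """
    if snps_per_year_value <= 0:
        raise ValueError("rate must be positive")
    return threshold_snps / (2 * snps_per_year_value)


# Hypothetical inputs, for illustration only.
toy_rate = snps_per_year(1e-4, 5000)

# 0.74 SNPs/year is the S. Gallinarum estimate quoted in the response.
years_at_5_snps = years_spanned_by_threshold(5, 0.74)
```

      Under that assumption, a 5-SNP threshold at 0.74 SNPs per year corresponds to a few years of divergence between two strains.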

      Then, we adopted the newly established SNP distance threshold (n=5) to update Figure 3a and New Supplementary Figure 8. The heatmap on the far right of New Figure 3a illustrates the SNP distances among 45 newly isolated S. Gallinarum strains from two locations in Zhejiang Province (Taishun and Yueqing). New Supplementary Figure 8 simulates potential transmission events between the bvSP strains isolated from Zhejiang Province (n=95) and those from China with available provincial information (n=435). These analyses collectively demonstrate the localized transmission pattern of bvSP within China. Our analysis using the newly established SNP threshold indicates that the 45 strains isolated from Taishun and Yueqing exhibit a highly localized transmission pattern, with pairs of strains exhibiting potential transmission events below the set threshold occurring exclusively within a single location. Subsequently, we conducted the SNP distance-based tracing analysis for the 95 strains from Zhejiang Province and those from China with available provincial information (n=435) (New Supplementary Figure 8, New Supplementary Table S8). Under the SNP distance threshold (n=5), we identified a total of 91 potential transmission events, all of which occurred exclusively within Zhejiang Province. No inter-provincial transmission events were detected. Based on these findings, we revised the methods and conclusions in the manuscript accordingly. We believe that the updated version well addresses your concerns.

      Nevertheless, the final revised and updated results do not change the conclusions presented in our original manuscript. Instead, applying a more stringent SNP distance threshold allows us to provide solid evidence supporting the localized transmission pattern of S. Gallinarum in China. 

      (4) The HGT definition is non-standard, and phylogeny (vertical inheritance) is not controlled for.  

      The cited method: 

      'In this study, potentially recently transferred ARGs were defined as those with perfect identity (more than 99% nucleotide identity and 100% coverage) in distinct plasmids in distinct host bacteria using BLASTn (E-value ≤ 10⁻⁵)' 

      This clearly does not apply here, as the application of distinct hosts and plasmids cannot be used. Subsequent analysis using this method is likely invalid, and some of it (e.g. Figure 6c) is statistically very poor. 

      Thank you for raising this important question. In our study, horizontal gene transfer (HGT) is defined as the transfer of genetic information between different organisms, a process that facilitates the spread of antibiotic resistance genes (ARGs) among bacteria. This definition of HGT is consistent with that used in previous studies (Evol Med Public Health. 2015; 2015(1):193–194; ISME J. 2024 Jan 8;18(1):wrad032). In Salmonella, the transfer of antimicrobial resistance genes via HGT is not solely dependent on plasmids; other mobile genetic elements (MGEs), such as transposons, integrons, and prophages, also play significant roles. This has also been documented in our previous work (mSystems. 2023 Dec 21;8(6):e0088323). Given the involvement of various MGEs in the horizontal transfer of ARGs, we propose that the criteria for evaluating horizontal transfer via plasmids can also be applied to ARGs mediated by other MGEs.

      In this study, we adopted stricter criteria than those used by Xiaolong Wang et al. Specifically, we defined two ARGs as identical only if they exhibited 100% nucleotide identity and 100% coverage. To address concerns regarding the potential influence of vertical inheritance in our analysis, we have made the following improvements. In the revised manuscript, we provide a more detailed table that includes the co-localization analysis of each ARG with mobile genetic elements (New Supplementary Table 9). For prophages and plasmids, we required that ARGs be located directly within these elements. In contrast, for transposons and integrons, we considered ARGs to be associated if they were located within a 5 kb region upstream or downstream of these elements (Nucleic Acids Res. 2022 Jul 5;50(W1):W768-W773). 
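      The co-localization rule described above can be written as a small interval check. This is a sketch under stated assumptions, not the authors' code: the record layout (dictionaries with `contig`, `start`, `end`, `type`) and all names are hypothetical, a plasmid is modelled as an interval spanning its contig, and an ARG that overlaps the 5 kb flanking window of a transposon or integron is counted as associated.

```python
FLANK_BP = 5_000  # flanking window for transposons and integrons
CONTAINED_TYPES = {"prophage", "plasmid"}     # ARG must lie inside
FLANKED_TYPES = {"transposon", "integron"}    # ARG may lie within 5 kb


def arg_on_mge(arg, mge):
    """Return True if the ARG is co-localized with the MGE.

    arg: dict with 'contig', 'start', 'end'.
    mge: dict with 'contig', 'start', 'end', 'type'.
    """
    if arg["contig"] != mge["contig"]:
        return False
    if mge["type"] in CONTAINED_TYPES:
        return mge["start"] <= arg["start"] and arg["end"] <= mge["end"]
    if mge["type"] in FLANKED_TYPES:
        window_start = mge["start"] - FLANK_BP
        window_end = mge["end"] + FLANK_BP
        return arg["end"] >= window_start and arg["start"] <= window_end
    return False


# Hypothetical coordinates.
arg = {"contig": "c1", "start": 10_000, "end": 11_000}
near_tn = {"contig": "c1", "start": 14_000, "end": 15_000, "type": "transposon"}
far_tn = {"contig": "c1", "start": 17_000, "end": 18_000, "type": "transposon"}
phage = {"contig": "c1", "start": 9_000, "end": 12_000, "type": "prophage"}
partial_phage = {"contig": "c1", "start": 10_500, "end": 12_000, "type": "prophage"}
```

      With draft assemblies, an ARG and its MGE may sit on different contigs, so a check of this kind can miss true associations; the response notes the same limitation for short-read data.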

      In the revised manuscript, we first categorized a total of 621 ARGs carried by 436 bvSP isolates collected in China according to the aforementioned criteria and found that 415 ARGs were located on MGEs. After excluding the ARGs not associated with MGEs, we recalculated the overall HGT frequency of 10 types of ARGs in China, the horizontal ARGs transfer frequency in three key regions, and the horizontal ARGs transfer frequency within a single region (New Supplementary Table 7). Based on the results, we updated relevant sections of the manuscript and remade Figure 6. The updated manuscript describes the results of this section as follows:

      “Horizontal transfer of resistome occurs widely in localized bvSP

      Horizontal transfer of the resistome facilitates the acquisition of AMR among bacteria, which may record the distinct acquisition event in the bacterial genome. To compare these events in a geographic manner, we further investigated the HGT frequency of each ARG carried by bvSP isolated from China and explored the HGT frequency of resistome between three defined regions. Potentially horizontally transferred ARGs were defined as those with perfect identity (100% identity and 100% coverage) and were located on MGEs across different strains (Fig. 6a). We first categorized a total of 621 ARGs carried by 436 bvSP isolates collected in China and found that 415 ARGs were located on MGEs. After excluding the ARGs not associated with MGEs, our findings reveal that horizontal gene transfer of ARGs is widespread among Chinese bvSP isolates, with an overall transfer rate of 92%. Specifically, 50% of the ARGs exhibited an HGT frequency of 100%, indicating that these ARGs might underwent extensive frequent horizontal transfer events (Fig. 6b). It is noteworthy that certain resistance genes, such as tet(A), aph(3'')-Ib, and aph(6)-Id, appear to be less susceptible to horizontal transfer.

      However, different regions generally exhibited a considerable difference in resistome HGT frequency. Overall, bvSP from the southern areas in China showed the highest HGT frequency (HGT frequency=95%). The HGT frequencies for bvSP within the eastern and northern regions of China are lower, at 92% and 91%, respectively (Fig. 6c). For specifical ARG type, we found tet(A) is more prone to horizontal transfer in the southern region, and this proportion was considerably lower in the eastern region. Interestingly, certain ARGs such as aph(6)-Id, undergo horizontal transfer only within the eastern and northern regions of China (Fig. 6d). Notably, as a localized transmission pathogen, resistome carried by bvSP exhibited a dynamic potential among inter-regional and local demographic transmission, especially from northern region to southern region (HGT frequency=93%) (Fig. 6e, Supplementary Table 7).”

      We also modified the current version of the pipeline used to calculate the HGT frequency of resistance genes. In the revised pipeline, users are required to provide a file specifying the locations of mobilome elements on the genome before calculating the HGT frequency of the target ARGs. The specific code and data used in the calculation have been uploaded to https://github.com/tjiaa/Cal_HGT_Frequency.

      However, we also acknowledge that the current in silico method has some limitations. This approach relies heavily on prior information in existing resistome/mobilome databases. Additionally, the characteristics of second-generation sequencing data make it challenging to locate gene positions precisely. Using complete genome assemblies might be a crucial approach to address this issue effectively. In the revised manuscript, we have also provided a more detailed explanation of the implications of the current pipeline.

      Regarding your second concern, "some of it (e.g., Figure 6c) is statistically very poor," the horizontal ARG transfer frequency calculation for each region was based on the proportion of horizontal transfer events of ARGs in that region to the total possible transfer events. As a result, we are unable to calculate the statistical significance between the two regions. Our aim with this approach is to provide a rough estimate of the extent of horizontal ARG transfer within the S. Gallinarum population in each region. In future studies, we will refine our conclusions by developing a broader range of evaluation methods to ensure more comprehensive assessment and validation.
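      One possible reading of "the proportion of horizontal transfer events to the total possible transfer events" is sketched below. This is an assumption made for illustration and may differ from the authors' pipeline: here, every pair of MGE-associated copies of one ARG found in two different strains counts as a possible event, and a pair counts as a transfer event when the two sequences are identical. The function name and toy data are hypothetical.

```python
from itertools import combinations


def hgt_frequency(copies):
    """Fraction of cross-strain pairs of one ARG with identical sequence.

    copies: list of (strain_id, sequence) for MGE-associated copies.
    Returns None when no cross-strain pair exists.
    """
    pairs = [
        (a, b) for a, b in combinations(copies, 2)
        if a[0] != b[0]  # only compare copies from different strains
    ]
    if not pairs:
        return None
    identical = sum(1 for a, b in pairs if a[1] == b[1])
    return identical / len(pairs)


# Toy copies of a single ARG (sequences shortened for illustration).
copies = [
    ("s1", "ATGC"),
    ("s2", "ATGC"),
    ("s3", "ATGA"),
    ("s1", "ATGA"),
]
frequency = hgt_frequency(copies)
```

      Because a value computed this way is a single proportion per region, comparing regions statistically would need an extra step such as resampling strains, which this sketch does not attempt.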

      (5) Associations between lineages, resistome, mobilome, etc do not control for the effect of genetic background/phylogeny. So e.g. the claim 'the resistome also demonstrated a lineage-preferential distribution' is not well-supported. 

      Thank you for your comments. We acknowledge that the associations between lineages and the mobilome/resistome may be influenced by the genetic background or phylogeny of the strains. For instance, our conclusion regarding the lineage-preferential distribution of the resistome was primarily based on New Figure 4a, where L3 is clearly shown to carry the most ARGs. Furthermore, we observed that L3b tends to harbor bla<sub>TEM-1B</sub>, sul2, and tet(A) more frequently than other lineages. However, we recognize that this evidence is insufficient to support a definitive conclusion of “demonstrated a lineage-preferential distribution”. Therefore, we have re-examined the current manuscript and described these findings as a potential association between the mobilome/resistome and lineages.

      (6) The invasiveness index is not well described, and the difference in means is not biologically convincing as although it appears significant, it is very small. 

      Thank you for pointing this out. For the invasiveness index mentioned in the manuscript, we used the method described in previous studies (PLoS Genet. 2018 May 8;14(5), Nat Microbiol. 2021 Mar;6(3):327-338). Specifically, Salmonella’s ability to cause intestinal or extraintestinal infections in hosts is related to the degree of genome degradation. We evaluated the potential for extraintestinal infection by 45 newly isolated S. Gallinarum strains (L2b and L3b) using a model that quantitatively assesses genome degradation. We analyzed samples using the 196 top predictor genes, employing a machine-learning approach that utilizes a random forest classifier and delta-bitscore functional variant-calling. This method evaluated the invasiveness of S. Gallinarum towards the host, and the distribution of invasiveness index values for each region was statistically tested using an unpaired t-test. The code used for calculating the invasiveness index is available at https://github.com/Gardner-BinfLab/invasive_salmonella. In the revised manuscript, we added a more detailed description of the invasiveness index calculation in the Methods section as follows:

      Lines 592-603: “Specifically, Salmonella’s ability to cause intestinal or extraintestinal infections in hosts is related to the degree of genome degradation. We evaluated the potential for extraintestinal infection by 45 newly isolated S. Gallinarum strains (L2b and L3b) using a model that quantitatively assesses genome degradation. We analyzed each sample using the 196 top predictor genes for measuring the invasiveness of S. Gallinarum, employing a machine-learning approach that utilizes a random forest classifier and deltabitscore functional variant-calling. This method evaluated the invasiveness of S. Gallinarum towards the host, and the distribution of invasiveness index values for each region was statistically tested using unpaired t-test. The code used for calculating the invasiveness index is available at: https://github.com/Gardner-BinfLab/invasive_salmonella.”

      Regarding the second question, 'the difference in means is not biologically convincing as although it appears significant, it is very small,' we believe that this difference is biologically meaningful. In our previous work, we infected chicken embryos with different lineages of S. Gallinarum (Natl Sci Rev. 2023 Sep 2;10(10):nwad228). The virulence of thirteen strains of Salmonella Gallinarum, comprising five from lineage L2b and eight from lineage L3b, was evaluated in 16-day-old SPF chicken embryos through inoculation into the allantoic cavity. Controls included embryos that were inoculated with phosphate-buffered saline (PBS). The embryos were incubated in a thermostatic incubator maintained at 37.5°C with a relative humidity ranging from 50% to 60%. Prior to inoculation, the viability of the embryos was assessed by examining the integrity of their venous system and their movements; any dead embryos were excluded from the study. Overnight cultures resuspended in PBS at a concentration of 1000 CFU per 100 μL were administered to the embryos. Mortality was recorded daily for a period of five days, concluding upon the hatching of the chicks.

      It is generally accepted that strains with higher invasive capabilities are more likely to cause chicken embryo mortality. Our experimental results showed that L2b, which exhibits higher invasiveness, had a slightly higher capacity to cause chicken embryo death (Author response image 5).

      Author response image 5.

      The survival curves of chicken embryos infected with bvSP isolates from S. Gallinarum L2b and S. Gallinarum L3b. Embryos inoculated with phosphate-buffered saline (PBS) served as controls.

      (7) 'In more detail, both the resistome and mobilome exhibited a steady decline until the 1980s, followed by a consistent increase from the 1980s to the 2010s. However, after the 2010s, a subsequent decrease was identified.' 

      Where is the data/plot to support this? Is it a significant change? Is this due to sampling or phylogenetics? 

      Thank you for highlighting these critical points. The description in this statement is based on New Supplementary Figure 11. On the right side of New Supplementary Figure 11, we presented the average number of Antimicrobial Resistance Genes (ARGs) and Mobile Genetic Elements (MGEs) carried by S. Gallinarum isolates from different years, and we described the overall trend across these years. However, we realized that this statement might overinterpret the data. Given that this sentence does not impact our emphasis on the overall increasing trends observed in the resistome and mobilome, as well as their potential association, we decided to remove it in the revised manuscript.

      The revised paragraph would read as follows:

      Lines 261-268: “Variations in regional antimicrobial use may result in uneven pressure for selecting AMR. The mobilome is considered the primary reservoir for spreading resistome, and a consistent trend between the resistome and the mobilome has been observed across different lineages, from L1-L3c. We observed an overall gradual rise in the resistome quantity carried by bvSP across various lineages, correlating with the total mobilome content (S11 Fig). Furthermore, we investigated the interplay between particular mobile elements and resistome types in bvSP.”

      (8) It is not clear what the burden of disease this pathogen causes in the population, or how significant it is to agricultural policy. The article claims to 'provide valuable insights for targeted policy interventions.', but no such interventions are described. 

      Thank you for your constructive suggestions. Salmonella Gallinarum is an avian-specific pathogen that induces fowl typhoid, a severe systemic disease characterized by high mortality rates in chickens, thereby posing a significant threat to the poultry industry, particularly in developing countries (Rev Sci Tech. 2000 Aug;19(2):405-24). In our previous research, we conducted a comprehensive meta-analysis of 201 publications encompassing over 900 million samples to investigate the global impact of S. Gallinarum (Sci Data. 2022 Aug 13;9(1):495). Our findings estimated that the global prevalence of S. Gallinarum is 8.54% (with a 95% confidence interval of 8.43% to 8.65%), with notable regional variations in incidence rates.

      Our previous analysis focused on the prevalence of S. Gallinarum (including biovars SP and SG) across six continents. The results revealed that all continents, except Oceania, exhibited positive prevalences of S. Gallinarum. Asia had the highest prevalence at 17.31%, closely followed by Europe at 16.03%. In Asia, the prevalence of biovar SP was higher than that of biovar SG, whereas in Europe, biovar SG was observed to be approximately two hundred times more prevalent than biovar SP. In South America, the prevalence of S. Gallinarum was higher than that of biovar SP, at 10.06% and 13.20% respectively. By contrast, the prevalence of S. Gallinarum was comparatively low in North America (4.45%) and Africa (1.10%) (Author response image 6).

      Given the significant economic losses caused by S. Gallinarum to the poultry industry and the potential risk of escalating antimicrobial resistance, more targeted policy interventions are urgently needed. Further elaboration on this implication is provided in the revised “Discussion” section as follows:

      Lines 401-416: “In summary, the findings of this study highlight that S. Gallinarum remains a significant concern in developing countries, particularly in China. Compared to other regions, S. Gallinarum in China poses a notably higher risk of AMR, necessitating the development of additional therapies, e.g., vaccines, probiotics, and bacteriophage therapy, in response to the government's policy aimed at reducing antimicrobial use (J Infect Dev Ctries. 2014 Feb 13;8(2):129-36). Furthermore, given the dynamic nature of S. Gallinarum risks across different regions, it is crucial to prioritize continuous monitoring in key areas, particularly in China's southern regions where extensive poultry farming is located. Lastly, from a One-Health perspective, controlling AMR in S. Gallinarum should not solely focus on local farming environments, with improved overall welfare on poultry and farming style. The breeding pyramid of industrialized poultry production should be targeted on the top, with enhanced and accurate detection techniques (mSphere. 2024 Jul 30;9(7):e0036224). More importantly, comprehensive efforts should be made to reduce antimicrobial usage overall and mitigate potential AMR transmission from environmental sources or other hosts (Vaccines (Basel). 2024 Sep 18;12(9):1067; Vaccines (Basel). 2023 Apr 18;11(4):865; Front Immunol. 2022 Aug 11:13:973224).”

      Author response image 6.

      A comparison of the global prevalence of S. Gallinarum across continents.

      (9) The abstract mentions stepwise evolution as a main aim, but no results refer to this. 

      Thank you for raising this issue. In the revised manuscript, we have changed “stepwise evolution” to simply “evolution” to ensure a more accurate and precise description.

      (10) The authors attribute changes in population dynamics to normalisation in China-EU relations and hen fever. However, even if the date is correct, this is not a strongly supported causal claim, as many other reasons are also possible (for example other industrial processes which may have changed during this period). 

      Thank you for raising this critical issue. In the revised manuscript, we conducted a more stringent BEAST analysis for each lineage, as described earlier. This led to some changes in the inferred evolutionary timelines. Consequently, we have removed the corresponding statement from the “Results” section. Instead, we now only provide a discussion of historical events, supported by literature, that could have facilitated the intercontinental spread of L2b and L3b in the “Discussion” section. We believe these revisions have made the manuscript more rigorous and precise.

      Lines 332-342: “The biovar types of S. Gallinarum have been well-defined as bvSP, bvSG, and bvSD historically (J Vet Med B Infect Dis Vet Public Health. 2005 Jun;52(5):214-8). Among these, bvSP can be further subdivided into five lineages (L1, L2a, L2b, L3b, and L3c) using hierarchical Bayesian analysis. Different sublineages exhibited preferential geographic distribution, with L2b and L3b of bvSP being predominant global lineage types with a high risk of AMR. The historical geographical transmission was verified using a spatiotemporal Bayesian framework. The result shows that L3b was initially spread from China to Europe in the 18th-19th century, which may be associated with the European hen fever event in the mid-19th century (Burnham GP. 1855. The history of the hen fever: a humorous record). L2b, on the other hand, appears to have spread to Europe via South America, potentially contributing to the prevalence of bvSP in the United States.”

      (11) No acknowledgment of potential undersampling outside of China is made, for example, 'Notably, all bvSP isolates from Asia were exclusively found in China, which can be manually divided into three distinct regions (southern, eastern, and northern).'.

      Perhaps we just haven't looked in other places?

      We appreciate the reviewer's observation regarding the sampling distribution of isolates in this study. We acknowledge that while the isolates were collected from 15 different countries, a significant proportion originated from China (Author response image 1). This focus is due to several reasons:

      (1) Once a globally prevalent pathogen throughout the 20th century, S. Gallinarum was listed by the World Organization for Animal Health (WOAH) due to its economic importance. After 30 years of implementing the National Poultry Improvement Plan in the US, it was almost eradicated in high-income countries, and interestingly, it became an endemic pathogen with sporadic outbreaks in most low- or middle-income countries like China and Brazil. Given the vast expanse of China's land area and the country's economic factors, implementing the same measures remains a challenging endeavour.

      (2) S. Gallinarum is an avian-specific pathogen, particularly affecting chickens, and its distribution is closely linked to chicken meat production in different countries. In some high chicken-producing developing countries, such as China and Brazil, there are more frequent reports of fowl typhoid. Data from the United States Department of Agriculture (USDA) on annual chicken meat production for 2023/2024 show that the global distribution of S. Gallinarum aligns closely with the overall chicken meat production of these countries (https://fas.usda.gov/data/production/commodity/0115000).  

      (3) Our primary objective was to investigate the localized resistome adaptation of S. Gallinarum in regions. Being a region with significant disease burden, China has reported numerous outbreaks (Sci Data. 2022 Aug 13;9(1):495; Sci Data. 2024 Feb 27;11(1):244) and a high AMR prevalence of this serovar (Natl Sci Rev. 2023 Sep 2;10(10):nwad228; mSystems. 2023 Dec 21;8(6):e0088323), making it an excellent example for understanding localized resistance mechanisms. 

      Nevertheless, a search of nearly a decade of literature on PubMed and a summary of the S. Gallinarum genomes available in public databases indicate that the dataset used is the most complete. Furthermore, focusing on a specific region within China allowed us to conduct a detailed and thorough analysis. However, we fully agree that expanding the study to include more isolates from other countries would enhance the generalizability of our findings, and we are actively collecting additional S. Gallinarum genome data. In the revised manuscript, we modified this sentence to indicate that this phenomenon is only observed in the current dataset, thereby avoiding an overly absolute statement:

      Lines 131-135: “For the bvSP strains from Asia included in our dataset, we found that all originated from China. To further investigate the distribution of bvSP across different regions in China, we categorized them into three distinct regions: southern, eastern, and northern (Supplementary Table 3)”.

      (12) Many of the conclusions are highly speculative and not supported by the data. 

      Thank you for your comment. We have carefully revised the manuscript to address your concerns. We hope that the changes made in the revised version meet your expectations and provide a clearer and more accurate interpretation of our findings.

      (13) The figures are not always the best presentation of the data: 

      a. Stacked bar plots in Figure 1 are hard to interpret, the total numbers need to be shown.

      Panel C conveys little information. 

      b. Figure 4B: stacked bars are hard to read and do not show totals. 

      c. Figure 5 has no obvious interpretation or significance. 

      Thank you for your comments. We have revised the figures to improve the clarity and presentation of the data.

      In summary, the quality of analysis is poor and likely flawed (although there is not always enough information on methods present to confidently assess this or provide recommendations for how it might be improved). So, the stated conclusions are not supported. 

      Thank you for your valuable feedback. We have carefully revised the manuscript to address your concerns. We hope that the updated figures and tables, and new data in the revised version meet your expectations and provide more appropriate interpretation of our findings.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      This reviewer enjoyed reading this well-written manuscript. The authors are encouraged to address the following comments and revise the manuscript accordingly. 

      (1) Title: The authors use avian-restrict Salmonella to refer to Salmonella Gallinarum. Please consider using Salmonella Gallinarum in the title. Also, your analysis relates to resistome and mobilome. Would it make sense to add mobilome in the manuscript? 

      Thank you for your guidance. In the revised manuscript, we have changed the title to “Avian-specific Salmonella enterica Serovar Gallinarum transition to endemicity is accompanied by localized resistome and mobilome interaction”. We believe that this revised title more accurately reflects the content of our study.

      (2) Abstract: This study uses 45 isolates from your labs. However, you failed to include these 45 isolates in the Abstract. Also, please clarify the sources of these isolates (from dead chickens, or dead chicken embryos? You wrote in two different ways in this manuscript). Also, I am not entirely convinced how the results from these 45 isolates will support the overall conclusion of this work. 

      Thank you for your thorough review and constructive comments on the manuscript. In the revised version, we have added a description of 45 newly isolated S. Gallinarum strains in the Abstract to provide readers with a clearer understanding of the dataset used in this study.

      Lines 36-41: “Using the most comprehensive whole-genome sequencing dataset of Salmonella enterica serovar Gallinarum (S. Gallinarum) collected from 16 countries, including 45 newly recovered samples from two related local regions, we established the relationship among avian-specific pathogen genetic profiles and localization patterns.”

      Furthermore, the newly isolated S. Gallinarum strains were obtained from dead chicken embryos. We think your second concern may arise from the following description in the manuscript: “All 734 samples of dead chicken embryos were collected from Taishun and Yueqing in Zhejiang Province, China. After the thorough autopsy, the liver, intestines, and spleen were extracted and added separately into 2 mL centrifuge tubes containing 1 mL PBS. The organs were then homogenized by grinding.” In fact, all the collected dead chicken embryos were aged 19 to 20 days. At this developmental stage, collecting the liver, intestines, and spleen for isolation and cultivation of S. Gallinarum is possible. To avoid any confusion, we have included a more detailed description of the dead chicken embryos in the revised manuscript as follows:

      Lines 447-451: “All 734 samples of dead chicken embryos aged 19 to 20 days were collected from Taishun and Yueqing in Zhejiang Province, China. After a thorough autopsy, the liver, intestines, and spleen were extracted and added separately into 2 mL centrifuge tubes containing 1 mL PBS. The organs were then homogenized by grinding.”

      Regarding your concern about the statement, “I am not entirely convinced how the results from these 45 isolates will support the overall conclusion of this work,” we would like to clarify the significance of these new isolates. Our research first identified distinct characteristics in the 45 newly isolated S. Gallinarum strains from Taishun and Yueqing, Zhejiang Province. Specifically, we found that most of the strains from Yueqing belonged to sequence type ST92, whereas the majority from Taishun were ST3717. Additionally, there were significant differences between these geographically close strains in terms of SNP distance and predicted invasion capabilities. These findings suggest that S. Gallinarum may exhibit localized transmission patterns, which forms the basis of the scientific question and hypothesis we originally aimed to address. Furthermore, in our previous work, we collected 325 S. Gallinarum strains. By incorporating the newly isolated 45 strains, we aim to provide a more comprehensive view of the population diversity, transmission pattern and potential risk of S. Gallinarum. We will continue to endeavour to understand the global genomic and population diversity in this field.

      Finally, we revised the sentences that could potentially raise concerns for readers: 

      Lines 175-177: “To investigate the dissemination pattern of bvSP in China, we obtained forty-five newly isolated bvSP from 734 samples (6.1% overall isolation rate) collected from diseased chickens at two farms in Yueqing and Taishun, Zhejiang Province.”  >  “To investigate the dissemination pattern of bvSP, we obtained forty-five newly isolated bvSP from 734 samples (6.1% overall isolation rate) collected from diseased chickens at two farms in Yueqing and Taishun, Zhejiang Province.”

      (3) The manuscript uses nomenclature and classification into different sublineages. Did the authors establish the approaches for defining these sublineages in this group or did you follow the accepted standards? 

      Thank you very much for raising this important issue. The biovar types of Salmonella Gallinarum have historically been well-defined as S. Gallinarum biovar Pullorum (bvSP), S. Gallinarum biovar Gallinarum (bvSG), and S. Gallinarum biovar Duisburg (bvSD) (J Vet Med B Infect Dis Vet Public Health. 2005 Jun;52(5):214-8). However, there seems to be no widespread consensus on the population nomenclature for the key biovar bvSP. In a previous study, Zhou et al. classified bvSP into six lineages: L1, L2a, L2b, L3a, L3b, and L3c (Natl Sci Rev. 2023 Sep 2;10(10):nwad228). However, our more comprehensive analysis of S. Gallinarum using a larger dataset and hierarchical Bayesian clustering revealed that L3a, previously considered a distinct lineage, is actually a sublineage of L3c. Upon further review of our initial manuscript, we realized that the original submission did not strictly follow the lineage order proposed by Zhou et al. To avoid confusion in the typing system, we have adjusted the lineage nomenclature in the revised manuscript to reflect the corrected order (see Author response table 1).

      (4) This reviewer is convinced with the analysis approaches and conclusion of this work.

      In the meantime, the authors are encouraged to discuss the application of the conclusion of this study: a) can the data be somehow used in the prediction model? b) would the conclusion from S. Gallinarum have generalized application values for other pathogens. 

      Thank you for your constructive comments on the manuscript. 

      a) can the data be somehow used in the prediction model?

      We believe that genomic data can be effectively used for constructing prediction models; however, the success of such models largely depends on the specific traits being predicted. In this study, we utilized a random forest prediction model based on 196 top genes (PLoS Genet. 2018 May 8;14(5)) to predict the invasiveness of 45 newly isolated strains. In relation to the antimicrobial resistance (AMR) issue discussed in this paper, we also conducted relevant analyses. For instance, we explored the use of image-based models to predict whether a genome is resistant to specific antibiotics (Comput Struct Biotechnol J. 2023 Dec 29:23:559-565). We are confident that the incorporation of newly generated data will facilitate the development of future predictive models, and we plan to pursue further research in this area.

      b) would the conclusion from S. Gallinarum have generalized application values for other pathogens.

      This might be explained from two perspectives. First, the key role of the mobilome in facilitating the spread of the resistome, as emphasized in this study, has also been confirmed in research on other pathogens (mBio. 2024 Oct 16;15(10):e0242824). Thus, we believe that the pipeline we developed to assess the horizontal transfer frequency of different resistance genes across regions applies to various pathogens. On the other hand, due to distinct evolutionary histories, different pathogens exhibit varying levels of adaptation to their environments. In this study, we found that S. Gallinarum tends to spread in a highly localized manner; however, this conclusion may not necessarily hold for other pathogens.

      Reviewer #2 (Recommendations for the authors): 

      The authors would need to: 

      (1) Address my concerns about genomic analyses listed in the public review. 

      Thank you for your valuable feedback. We have carefully reviewed your concerns and made the necessary revisions to address the points raised about genomic analyses in the public review. We sincerely hope that these modifications meet your expectations and provide more robust analysis. We appreciate your thoughtful input and remain open to further suggestions to improve the manuscript.

      (2) Add more detail on the genomic methods and their outputs, as suggested above. 

      We have added further details to clarify the methodologies and outputs as mentioned above. Specifically, we expanded the description of the data processing, and the bioinformatic tools used for analysis. To ensure clarity, we also included an expanded discussion of the key outputs, highlighting their implications. We hope these revisions meet your expectations.

      (3) Critically rewrite their introduction to make it clear what problem they are trying to address. 

      Thank you for your guidance. In the revised manuscript, we have made the necessary modifications to the Introduction section to more clearly articulate the problem we aim to address.

      (4) Critically rewrite their conclusions so they are supported by the data they present, and make it clear when claims are more speculative. 

      Thank you for your guidance. In the revised manuscript, we have made the recommended modifications to the relevant sections of the conclusion as outlined above.

      More minor issues I identified: 

      (1) Typo in the title 'avian-restrict'. 

      Done.

      Line 1: “Avian-specific Salmonella enterica Serovar Gallinarum transition to endemicity is accompanied by localized resistome and mobilome interaction.”

      (2) 'By utilizing the pipeline we developed' -- a pipeline has not been introduced at this point. 

      In the revised manuscript, we have removed this section from the 'Abstract'.

      Lines 46-48: “Notably, the mobilome-resistome combination among distinct lineages exhibits a geographical-specific manner, further supporting a localized endemic mobilome-driven process.”

      (3) 'has more than 90% serovars' -- doesn't make sense. 

      Revised.

      Lines 82-83: “Salmonella, a pathogen with distinct geographical characteristics, has more than 90% of its serovars frequently categorized as geo-serotypes.”

      (4) 'horrific mortality rates that remain a disproportionate burden'. 

      Revised.

      Lines 83-87: “Among the thousands of geo-serotypes, Salmonella enterica Serovar Gallinarum (S. Gallinarum) is an avian-specific pathogen that causes severe mortality, with particularly detrimental effects on the poultry industry in low- and middle-income countries.”

      (5) What is the rate, what is a comparison, how is it disproportionate? 

      Thank you for your valuable feedback. It is challenging to accurately estimate the specific prevalence of S. Gallinarum, particularly due to the lack of comprehensive data in many countries. Numerous cases likely go unreported. However, S. Gallinarum is more commonly detected in low- and middle-income countries. Here, we provide three lines of evidence supporting this observation. First, in our previous research, we conducted a comprehensive meta-analysis of 201 studies, involving over 900 million samples, to evaluate the global impact of S. Gallinarum (Sci Data. 2022 Aug 13;9(1):495). The estimated prevalence in 17 countries showed that Bangladesh had the highest rate (25.75%) of S. Gallinarum infections. However, for biovar Pullorum (bvSP), Argentina (20.69%) and China (18.18%) reported the highest prevalence rates. Second, previous studies have also reported that S. Gallinarum predominantly occurs in low- and middle-income countries (Vet Microbiol. 2019 Jan:228:165-172; BMC Microbiol. 2024 Oct 18;24(1):414). Finally, S. Gallinarum was once a globally prevalent pathogen in the 20th century. Following the implementation of eradication programs in most high-income countries, it was listed by the World Organization for Animal Health and subsequently became an endemic pathogen with sporadic outbreaks. However, similar eradication efforts are challenging to implement in low- and middle-income countries, leading to a disproportionately higher incidence of S. Gallinarum in these regions.

      In the revised manuscript, we have rephrased this sentence to enhance its accuracy:

      Lines 83-87: “Among the thousands of geo-serotypes, Salmonella enterica serovar Gallinarum (S. Gallinarum) is an avian-specific pathogen that causes severe mortality, with particularly detrimental effects on the poultry industry in low- and middle-income countries.”

      (6) 'we collected the most comprehensive set of 580 S. Gallinarum isolates', -> 'we collected the most comprehensive set S. Gallinarum isolates, consisting of 580 genomes'. 

      Revised.

      Lines 97-100: “To fill the gaps in understanding the evolution of S. Gallinarum under regional-associated AMR pressures and its adaptation to endemicity, we collected the most comprehensive set of S. Gallinarum isolates, consisting of 580 genomes, spanning the period from 1920 to 2023.”

      (7) Sequence reads are not available, and use a non-standard database. The eLife policy states: 'Sequence reads and assembly must be included for reference genomes, while novel short sequences, including epitopes, functional domains, genetic markers and haplotypes should be deposited, together with surrounding sequences, into Genbank, DNA Data Bank of Japan (DDBJ), or EMBL Nucleotide Sequence Database (ENA). DNA and RNA sequencing data should be deposited in NCBI Trace Archive or NCBI Sequence Read Archive (SRA).' So the sequences assemblies and reads should ideally be mirrored appropriately. 

      Thank you for your valuable suggestion regarding submitting the genome data for the newly isolated 45 S. Gallinarum strains. The genome data have been deposited in the NCBI Sequence Read Archive (SRA) under two BioProjects. The “SRA Accession number” for each strain has been added to New Supplementary Table 1. We believe this will ensure that the data are more readily accessible to a broader audience of researchers for download and analysis. We have revised the corresponding paragraph in the manuscript as follows:

      Lines 606-608: “For the newly isolated 45 strains of Salmonella Gallinarum, genome data have been deposited in NCBI Sequence Read Archive (SRA) database. The “SRA Accession” for each strain are listed in Supplementary Table 1.”

      (8) You should state at the start of the results which data is public, and how much is newly sequenced. 

      Revised.

      Lines 109-112: “To understand the global geographic distribution and genetic relationships of S. Gallinarum, we assembled the most comprehensive S. Gallinarum WGS dataset (n=580), comprising 535 publicly available genomes and 45 newly sequenced genomes.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Cell metabolism exhibits a well-known behavior in fast-growing cells, which employ seemingly wasteful fermentation to generate energy even in the presence of sufficient environmental oxygen. This phenomenon is known as Overflow Metabolism or the Warburg effect in cancer. It is present in a wide range of organisms, from bacteria and fungi to mammalian cells.

      In this work, starting with a metabolic network for Escherichia coli based on sets of carbon sources, and using a corresponding coarse-grained model, the author applies some well-based approximations from the literature and algebraic manipulations. These are used to successfully explain the origins of Overflow Metabolism, both qualitatively and quantitatively, by comparing the results with E. coli experimental data.

      By modeling the proteome energy efficiencies for respiration and fermentation, the study shows that these parameters are dependent on the carbon source quality constants K_i (p.115 and 116). It is demonstrated that as the environment becomes richer, the optimal solution for proteome energy efficiency shifts from respiration to fermentation. This shift occurs at a critical parameter value K_A(C).

      This counterintuitive result qualitatively explains Overflow Metabolism.

      Quantitative agreement is achieved through the analysis of the heterogeneity of the metabolic status within a cell population. By introducing heterogeneity, the critical growth rate is assumed to follow a Gaussian distribution over the cell population, resulting in accordance with experimental data for E. coli. Overflow metabolism is explained by considering optimal protein allocation and cell heterogeneity.
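      The heterogeneity argument summarized above can be illustrated with a minimal sketch. It assumes that each cell switches from respiration to fermentation once the growth rate exceeds that cell's own critical growth rate, and that critical growth rates are normally distributed across the population. Under those assumptions, the fraction of fermenting cells at a given growth rate is the Gaussian cumulative distribution function. The function name and the numerical values are hypothetical and are not taken from the manuscript, whose own equations may differ in form.

```python
import math


def fraction_fermenting(growth_rate, mean_critical, sd_critical):
    """Fraction of cells whose critical growth rate lies below growth_rate.

    This is the Gaussian CDF evaluated at growth_rate, written with math.erf.
    """
    z = (growth_rate - mean_critical) / (sd_critical * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Hypothetical parameters (per hour), for illustration only
MEAN_CRITICAL = 0.75
SD_CRITICAL = 0.10

# The fermenting fraction rises smoothly with growth rate instead of
# jumping from 0 to 1 at a single threshold.
fractions = [
    fraction_fermenting(rate, MEAN_CRITICAL, SD_CRITICAL)
    for rate in (0.45, 0.65, 0.75, 0.85, 1.05)
]
```

      A single sharp threshold would predict an abrupt onset of fermentation, whereas a distribution of thresholds gives the gradual, sigmoidal onset that is compared with the E. coli data.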

      The obtained model is extensively tested through perturbations: 1) Introduction of overexpression of useless proteins; 2) Studying energy dissipation; 3) Analysis of the impact of translation inhibition with different sub-lethal doses of chloramphenicol on Escherichia coli; 4) Alteration of nutrient categories of carbon sources using pyruvate. All model perturbation results are corroborated by E. coli experimental results.

      We appreciate the reviewer's highly positive comments and the accurate summary of our manuscript.

      Strengths:

      In this work, the author employs modeling methods typical of Physics to address a problem in Biology, standing at the interface between these two scientific fields. This interdisciplinary approach proves to be highly fruitful and should be further explored in the literature. The use of Escherichia coli as an example ensures that all hypotheses and approximations in this study are well-founded in the literature. Examples include the approximation for the Michaelis-Menten equation (line 82), Eq. S1, proteome partition in Appendix 1.1 (lines 68-69), and a stable nutrient environment in Appendix 1.1 (lines 83-84). The section "Testing the model through perturbation" heavily relies on bacterial data. The construction of the model and its agreement with experimental data are convincingly presented.

      We appreciate the reviewer's highly positive comments. We have incorporated many of the reviewer's insightful suggestions and added citations in the appropriate contexts, which have significantly improved our manuscript.

      Weaknesses:

      In Section Appendix 6.4, the author explores the generalization of results from bacteria to cancer cells, adapting the metabolic network and coarse-grained model accordingly. It is argued that as a consequence, all subsequent steps become immediately valid. However, I remain unconvinced, considering the numerous approximations used to derive the equations, which the literature demonstrates to be valid primarily for bacteria. A more detailed discussion about this generalization is recommended. Additionally, it is crucial to note that the experimental validation of model perturbations heavily relies on E. coli data.

      We appreciate the reviewer's insightful suggestions. We apologize for not clearly illustrating the generalization of results from bacteria to cancer cells in the previous version of our manuscript. Indeed, in our earlier version, there was no experimental validation of model results related to cancer cells.

      Following the reviewer’s suggestions, we have now added Fig. 5 and Appendix-fig. 5, fully expanded the previous Appendix 6.4 into Appendix 9 in our current version, and added a new section entitled “Explanation of the Crabtree effect in yeast and the Warburg effect in cancer cells” in our main text to provide a detailed discussion of the generalization from bacteria to yeast and cancer cells. Through the derivations shown in Appendix 9 (Eqs. S180-S189), we arrived at Eq. 6 (or Eq. S190 in Appendix 9) to facilitate the comparison of our model results with experimental data in yeast and cancer cells. This comparison is presented in Fig. 5, where we demonstrate that our model can quantitatively explain the data for the Crabtree effect in yeast and the Warburg effect in cancer cells (related experimental data references: Shen et al., Nature Chemical Biology 20, 1123–1132 (2024); Bartman et al., Nature 614, 349-357 (2023)). These additions have significantly strengthened our manuscript.

      Reviewer #2 (Public Review):

      Summary

      This paper has three parts. The first part applied a coarse-grained model with proteome partition to calculate cell growth under respiration and fermentation modes. The second part considered single-cell variability and performed population averaging to acquire an ensemble metabolic profile for acetate fermentation. The third part used the model and simulations to compare against experimental data in the literature and obtained substantial consistency.

      We thank the reviewer for the accurate summary and positive comments on our manuscript.

      Strengths and major contributions

      (i) The coarse-grained model considered specific metabolite groups and their interrelations and acquired an analytical solution for this scenario. The "resolution" of this model is in between Flux Balance Analysis/whole-cell simulation and proteome partition analysis.

      (ii) The author considered single-cell level metabolic heterogeneity and calculated the ensemble average with explicit calculation. The results are consistent with known fermentation and growth phenomena qualitatively and can be quantitatively compared to experimental results.

      We appreciate the reviewer’s highly positive comments.

      Weaknesses

      (i) If I am reading this paper correctly, the author's model predicts binary (or "digital") outcomes of single-cell metabolism, that is, after growth rate optimization, each cell will adopt either "respiration mode" or "fermentation mode" (as illustrated in Appendix - Figure 1C, D). Due to variability in enzyme activity k_i^{cat} and critical growth rate λ_C, each cell under the same nutrient condition could have either respiration or fermentation, but the choice is binary.

      The binary choice at the single-cell level is inconsistent with our current understanding of metabolism. If a cell only uses fermentation mode (as shown in Appendix - Figure 1C), it could generate enough energy but not be able to have enough metabolic fluxes to feed into the TCA cycle. That is, under pure fermentation mode, the cell cannot expand the pool of TCA cycle metabolites and hence cannot grow.

      This caveat also appears in the model in Appendix (S25) that assumes J_E = r_E*J_{BM} where r_E is a constant. From my understanding, r_E can be different between respiration and fermentation modes (at least for real cells) and hence it is inappropriate to conclude that cells using fermentation, which generates enough energy, can also generate a balanced biomass.

      We thank the reviewer for raising this question. Indeed, regarding energy biogenesis between respiration and fermentation, our model predicts binary outcomes at the single-cell level. However, this outcome does not hinder cell growth, as there are three independent possible fates for the carbon source (e.g., glucose) in metabolism: fermentation, respiration for energy biogenesis, and biomass generation. Each fate is associated with a distinct fraction of the proteome, with no overlap between them (see Appendix-figs. 1 and 5). Consequently, in a purely fermentative mode, a cell can still use the proteome dedicated to the biomass generation pathway to produce biomass precursors via the TCA cycle.

      The classification of the carbon source’s fates into three independent pathways was initially introduced by Chen and Nielsen (Chen and Nielsen, PNAS 116, 17592-17597 (2019)). We apologize for the oversight in not citing their paper in this context in the previous version of our manuscript (although it was cited elsewhere). We have now included the citation in all appropriate places.

      To illustrate this issue more clearly, we explicitly present the proteome allocation results for optimal growth in a fermentation mode below, where the proteome efficiency (i.e., the proteome energy efficiency in our previous version) in fermentation is higher than in respiration (i.e., ε<sub>f</sub> > ε<sub>r</sub>). We use the model shown in Fig. 1B as an example, with the relevant equations being Eqs. S26 and S28 in Appendix 2.1. By substituting Eq. S28 into Eq. S26, we arrive at Eq. 3 (or Eq. S29 in Appendix 2.1), which we restate here as Eq. R1:

      For a given nutrient condition, i.e., for a specific value of κ<sub>A</sub> at the single-cell level, the values of are determined (see Eqs. S20, S27, S31 and S32), while  ϕ and φ<sub>max</sub> are constants (see Eq. S33 and Appendix 1.1). Therefore, if , then , since all coefficients are positive (i.e., ) and takes non-negative values. Hence, the solution for optimal growth is (see Eqs. S35-S36 in Appendix 2.2):

      Here, the result signifies a pure fermentation mode with no respiration flux for energy biogenesis. Then, by combining Eq. R2 with Eqs. S28 and S30 from Appendix 2.1, we obtain the optimal proteome allocation results for this case:

      where , while κ<sub>A</sub> and take given values (see Eqs. S20 and S27). In Eq. R3, φ<sub>3</sub> corresponds to the fraction of the proteome devoted to carrying the carbon flux from Acetyl-CoA (the entry point of Pool b, see Fig. 1B and Appendix 1.2) to α-Ketoglutarate (the entry point of Pool c), with all of these being enzymes within the TCA cycle. The optimal growth solution is , which demonstrates that in a pure fermentation mode, the optimal growth condition includes the presence of enzymes within the TCA cycle capable of carrying the flux required for biomass generation.

      Regarding Eq. S25, J<sub>E</sub> represents the energy demand for cell proliferation, expressed as the stoichiometric energy flux in ATP. Although the influx of carbon sources (e.g., glucose) varies significantly between fermentation and respiration modes, J<sub>BM</sub> and J<sub>E</sub> are the biomass and energy fluxes used to build cells, respectively. In bacteria, whether in fermentation or respiration mode, the proportion of maintenance energy used for protein degradation is roughly negligible (see Locasale and Cantley, BMC Biol 8, 88 (2010)). Consequently, the energy demand represented by J<sub>E</sub> scales approximately linearly with the biomass production rate J<sub>BM</sub> (related experimental data reference: Ebenhöh et al., Life 14, 247 (2024)), regardless of the energy biogenesis mode. Therefore, r<sub>E</sub> can be regarded as roughly constant for bacteria. However, in eukaryotic cells such as yeast and mammalian cells, the proportion of maintenance energy is much more significant (see Locasale and Cantley, BMC Biol 8, 88 (2010)). Therefore, we have explicitly considered the contribution of maintenance energy in these cases and have extended the previous Appendix 6.4 into Appendix 9 in the current version.

      (ii) The minor weakness of this model is that it assumes a priori that each cell chooses its metabolic strategy based on energy efficiency. This is an interesting assumption but there is no known biochemical pathway that directly executes this mechanism. In evolution, growth rate is more frequently considered for metabolic optimization. In Flux Balance Analysis, one could have multiple objective functions including biomass synthesis, energy generation, entropy production, etc. Therefore, the author would need to justify this assumption and propose a reasonable biochemical mechanism for cells to sense and regulate their energy efficiency.

      We thank the reviewer for raising this question and apologize for not explaining this point clearly enough in the previous version of our manuscript. Just as the reviewer mentioned, growth rate should be considered for metabolic optimization under the selection pressure of the evolutionary process. In fact, in our model, the sole optimization objective is exactly the cell growth rate. The determination of whether to use fermentation or respiration based on proteome efficiency (i.e., the proteome energy efficiency in our previous version) is not an a priori assumption in our model; rather, it is a natural consequence of growth rate optimization, as we detail below. 

      For a given nutrient condition with a determined value of κ<sub>A</sub>, as we have explained in the aforementioned responses, the constraint on the fluxes is summarized in Eq. 3 and is restated as Eq. R1. Mathematically, we can obtain the solution for the optimal growth strategy by combining Eq. R1 (i.e., Eq. 3) with the optimization on cell growth rate λ, and the solution can be obtained as follows: If the proteome efficiency in fermentation is larger than that in respiration, i.e., ε<sub>f</sub> > ε<sub>r</sub>, then from Eq. R1, we obtain , since the values of ε<sub>r</sub>, ε<sub>f</sub>, Ψ, ϕ and φ<sub>max</sub> are all fixed for a given κ<sub>A</sub>, with ε<sub>r</sub>, ε<sub>f</sub>, Ψ, ϕ, φ<sub>max</sub> > 0. Hence, (since ), and note that . Therefore is the solution for optimal growth, where the growth rate can take the maximum value of . Similarly, for the case where the proteome efficiency in respiration is larger than that in fermentation (i.e., ε<sub>r</sub> > ε<sub>f</sub>), is the solution for optimal growth. With this analysis, we have demonstrated that the choice between fermentation and respiration based on proteome efficiency is a natural consequence of growth rate optimization.
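
      The corner-solution argument above can be checked numerically. The following is a minimal sketch and not the manuscript's code: it assumes a toy linear proteome budget, J_r/ε_r + J_f/ε_f + Ψ·λ = φ_max, together with an energy demand J_r + J_f = ϕ·λ. This form is an assumption chosen to mirror the structure described for Eq. R1 and may differ from the actual equation. The function names and all parameter values are hypothetical.

```python
def growth_rate(x, eps_r, eps_f, psi, phi, phi_max):
    """Growth rate when a fraction x of the energy flux comes from fermentation.

    Assumed toy constraints (illustrative only, not taken from the manuscript):
        J_r + J_f = phi * lam
        J_r / eps_r + J_f / eps_f + psi * lam = phi_max
    Writing J_f = x * phi * lam and J_r = (1 - x) * phi * lam and solving for lam
    gives the expression returned below.
    """
    proteome_cost_per_energy = (1.0 - x) / eps_r + x / eps_f
    return phi_max / (psi + phi * proteome_cost_per_energy)


def optimal_fraction(eps_r, eps_f, psi=1.0, phi=1.0, phi_max=0.5, steps=101):
    """Scan x over [0, 1] and return the fermentation fraction that maximizes growth."""
    grid = [i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda x: growth_rate(x, eps_r, eps_f, psi, phi, phi_max))
```

      Under these assumed constraints the growth rate is monotonic in x, so the scan always lands on one of the two endpoints: pure fermentation when ε_f exceeds ε_r, and pure respiration in the opposite case. No mixed strategy is optimal unless the two efficiencies are exactly equal.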

      We have now revised the related content in our manuscript to clarify this point.

      My feeling is that the mathematical structure of this model could be correct, but the single-cell interpretation for the ensemble averaging has issues. Each cell could potentially adopt partial respiration and partial fermentation at the same time and have temporal variability in its metabolic mode as well. With the modification of the optimization scheme, the author could have a revised model that avoids the caveat mentioned above.

      We thank the reviewer for raising this question. In fact, in the above two responses, we have addressed the issues raised here, clarifying that the binary mode between respiration and fermentation does not hinder cell growth and that the sole optimization objective is the cell growth rate, as the reviewer suggested. Regarding temporal variability, due to factors such as cell cycle stages and the intrinsic noise arising from stochastic processes, temporal variability in the fermentation or respiration mode is indeed likely. However, at any given moment at the single-cell level, a binary choice between fermentation and respiration is what our model predicts for the optimal growth strategy. 
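
      How a binary single-cell choice can still yield a smooth population-level response can be sketched as follows. This is an illustration under stated assumptions, not the manuscript's implementation: each cell is assumed to ferment whenever the growth rate exceeds its own critical growth rate λ_C, and λ_C is assumed to be Gaussian across the population, as described in the summary of the model. The function name and the parameters `mu_c` and `sigma_c` are hypothetical.

```python
import math


def fermenting_fraction(lam, mu_c, sigma_c):
    """Fraction of cells whose critical growth rate lies below lam.

    Assumes lam_C ~ Normal(mu_c, sigma_c**2) across the population, so the
    fraction is the Gaussian cumulative distribution function evaluated at lam.
    """
    z = (lam - mu_c) / (sigma_c * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))
```

      Each cell contributes a step function of the growth rate, but averaging those steps over a Gaussian spread of thresholds gives a sigmoidal curve whose width is set by `sigma_c`. In the limit of vanishing `sigma_c` the curve reverts to the digital response of a homogeneous population.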

      Discussion and impact for the field

      Proteome partition models and Flux Balance Analysis are both commonly used mathematical models that emphasize different parts of cellular physiology. This paper has ingredients for both, and I expect after revision it will bridge our understanding of the whole cell.

      We appreciate the reviewer’s very positive comments. We have followed many of the good suggestions raised by the reviewer, and our revised manuscript is much improved as a result.

      Reviewer #3 (Public Review):

      Summary:

      In the manuscript "Overflow metabolism originates from growth optimization and cell heterogeneity" the author Xin Wang investigates the hypothesis that the transition into overflow metabolism at large growth rates actually results from an inhomogeneous cell population, in which every individual cell either performs respiration or fermentation.

      We thank the reviewer for carefully reading our manuscript and the accurate summary.

      Weaknesses:

      The paper has several major flaws. First, and most importantly, it repeatedly and wrongly claims that the origins of overflow metabolism are not known. The paper is written as if it is the first to study overflow metabolism and provide a sound explanation for the experimental observations. This is obviously not true and the author actually cites many papers in which explanations of overflow metabolism are suggested (see e.g. Basan et al. 2015, which even has the title "Overflow metabolism in E. coli results from efficient proteome allocation"). The paper should be rewritten in a more modest and scientific style, not attempting to make claims of novelty that are not supported. In fact, all hypotheses in this paper are old. Also the possibility that cell heterogeneity explains the observed 'smooth' transition into overflow metabolism has been extensively investigated previously (see de Groot et al. 2023, PNAS, "Effective bet-hedging through growth rate dependent stability") and the random drawing of kcat-values is an established technique (Beg et al., 2007, PNAS, "Intracellular crowding defines the mode and sequence of substrate uptake by Escherichia coli and constrains its metabolic activity"). Thus, in terms of novelty, this paper is very limited. It reinvents the wheel and it is written as if decades of literature debating overflow metabolism did not exist.

      We thank the reviewer for both the critical and constructive comments. Following the reviewer’s suggestion, we have revised our manuscript to adopt a more modest style. However, we respectfully disagree with the criticism regarding the novelty of our study, as detailed below.

      First, while many explanations for overflow metabolism have been proposed, we have cited these in both the previous and current versions of our manuscript. We apologize for not emphasizing the distinctions between these previous explanations and our study in the main text of our earlier version, though we did provide details in Appendix 6.3. In fact, most of these explanations (e.g., Basan et al., Nature 528, 99-104 (2015); Chen and Nielsen, PNAS 116, 17592-17597 (2019); Majewski and Domach, Biotechnol. Bioeng. 35, 732-738 (1990); Niebel et al., Nat. Metab. 1, 125-132 (2019); Shlomi et al., PLoS Comput. Biol. 7, e1002018 (2011); Varma and Palsson, Appl. Environ. Microbiol. 60, 3724-3731 (1994); Vazquez et al., BMC Syst. Biol. 4, 58 (2010); Vazquez and Oltvai, Sci. Rep. 6, 31007 (2016); Zhuang et al., Mol. Syst. Biol. 7, 500 (2011)) heavily rely on the assumption that cells optimize their growth rate for a given rate of carbon influx under each nutrient condition (or certain equivalents) to explain the growth rate dependence of fermentation flux. However, this assumption—that cell growth rate is optimized for a given rate of carbon influx—is questionable, as the given factors in a nutrient condition are the identity and concentration of the carbon source, rather than the carbon influx itself.

      Consequently, in our model, we purely optimize cell growth rate without imposing a special constraint on carbon influx. Our assumption that the given factors in a nutrient condition are the identity and concentration of the carbon source aligns with the studies by Molenaar et al. (Molenaar et al., Mol. Syst. Biol. 5, 323 (2009)), where they specified an identical assumption on page 5 of their Supplementary Information (SI); Scott et al. (Scott et al., Science 330, 1099-1102 (2010)), where the growth rate formula was derived for a culturing condition with a given nutrient quality; and Wang et al. (Wang et al., Nat. Comm. 10, 1279 (2019)), our previous study on microbial growth. Among these three studies, only Molenaar et al. addresses overflow metabolism. However, Molenaar et al. did not consider cell heterogeneity, resulting in their model predictions on the growth rate dependence of fermentation flux being a digital response, which is inconsistent with experimental data.

      Furthermore, prevalent explanations such as those by Basan et al. (Basan et al., Nature 528, 99-104 (2015)) and Chen and Nielsen (Chen and Nielsen, PNAS 116, 17592-17597 (2019)) suggest that overflow metabolism originates from the proteome efficiency in fermentation always being higher than in respiration. However, Shen et al. (Shen et al., Nature Chemical Biology 20, 1123–1132 (2024)) recently discovered that the proteome efficiency measured at the cell population level in respiration is higher than in fermentation for many yeast and cancer cells, despite the presence of fermentation fluxes through aerobic glycolysis. This finding clearly contradicts the studies by Basan et al. (2015) and Chen and Nielsen (2019). 

      Nevertheless, our model may resolve this puzzle by incorporating two important features. First, in our model, the proteome efficiency (i.e., the proteome energy efficiency in our previous version) in respiration is larger than that in fermentation when nutrient quality is low (Eqs. S174-S175 in Appendix 9). Second, and crucially, due to the incorporation of cell heterogeneity in our model, there could be a proportion of cells with higher proteome efficiency in fermentation than in respiration, even when the overall proteome efficiency at the cell population level is higher in respiration than in fermentation. As shown in the newly added Fig. 5A-B, our model results can quantitatively illustrate the experimental data from Shen et al., Nature Chemical Biology 20, 1123–1132 (2024).
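
      The second feature can be made concrete with a small calculation. The sketch below is illustrative only and rests on simplifying assumptions that are not taken from the manuscript: the single-cell proteome efficiencies ε_r and ε_f are treated as independent Gaussian variables, and all numerical values are arbitrary. The function name is hypothetical.

```python
import math


def prob_fermentation_favoured(mean_r, sd_r, mean_f, sd_f):
    """P(eps_f > eps_r) for independent Gaussian eps_r and eps_f (assumed).

    The difference eps_f - eps_r is then Gaussian with mean (mean_f - mean_r)
    and variance (sd_r**2 + sd_f**2), so the probability is a Gaussian CDF.
    """
    diff_mean = mean_f - mean_r
    diff_sd = math.sqrt(sd_r ** 2 + sd_f ** 2)
    return 0.5 * (1.0 + math.erf(diff_mean / (diff_sd * math.sqrt(2.0))))
```

      For example, calling `prob_fermentation_favoured(1.0, 0.2, 0.8, 0.2)` describes a population whose mean efficiency is higher in respiration, yet the returned probability is strictly between 0 and 0.5, so a minority of cells would still favour fermentation. This is the sense in which a population-level measurement favouring respiration can coexist with a nonzero fermentation flux.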

      Finally, regarding the criticism of the novelty of our hypothesis: As specified in our main text, cell heterogeneity has been widely reported experimentally in both microbes (e.g., Ackermann, Nat. Rev. Microbiol. 13, 497-508 (2015); Bagamery et al., Curr. Biol. 30, 4563-4578 (2020); Balaban et al., Science 305, 1622-1625 (2004); Nikolic et al., BMC Microbiol. 13, 1-13 (2013); Solopova et al., PNAS 111, 7427-7432 (2014); Wallden et al., Cell 166, 729-739 (2016)) and tumor cells (e.g., Duraj et al., Cells 10, 202 (2021); Hanahan and Weinberg, Cell 144, 646-674 (2011); Hensley et al., Cell 164, 681-694 (2016)). However, to the best of our knowledge, cell heterogeneity has not yet been incorporated into theoretical models for explaining overflow metabolism or the Warburg effect. The reviewer mentioned the study by de Groot et al. (de Groot et al., PNAS 120, e2211091120 (2023)) as studying overflow metabolism similarly to our work. We have carefully read this paper, including the main text and SI, and found that it is not directly relevant to either overflow metabolism or the Warburg effect. Instead, their model extends the work of Kussell and Leibler (Kussell and Leibler, Science 309, 2075-2078 (2005)), focusing on bet-hedging strategies of microbes in changing environments.

      Regarding the criticism that random drawing of kcat-values is an established technique (Beg et al., PNAS 104, 12663-12668 (2007)), we need to stress that the distribution noise on kcat-values considered in our model is fundamentally different from that in Beg et al. In Beg et al., their model involved 876 reactions (see Dataset 1 in Beg et al.), of which only 109 had associated biochemical experimental data. Thus, their distribution of kcat-values pertains to different enzymes within the same cell. In contrast, we have the mean of the kcat-values from experimental data for each relevant enzyme, with the distribution of kcat-values representing the same enzyme in different cells.

      Moreover, the manuscript is not clearly written and is hard to understand. Variables are not properly introduced (the M-pools need to be discussed; fluxes (J_E), "energy coefficients" (eta_E), etc. need to be more explicitly explained). What is "flux balance at each intermediate node"? How is the "proteome efficiency" of a pathway defined? The paper continues to speak of energy production. This should be avoided. Energy is conserved (1st law of thermodynamics) and can never be produced. A scientific paper should strive for scientific correctness, including precise choice of words.

      We thank the reviewer for the constructive comments. Following these, we have provided more explicit information and revised our manuscript to enhance readability. In our initially submitted version, the phrase "energy production" was borrowed from Nelson et al. (Nelson et al., Lehninger principles of biochemistry, 2008) and Basan et al. (Basan et al., Nature 528, 99-104 (2015)), and we chose to follow this terminology. We appreciate the reviewer’s suggestion and have now revised the wording to use more appropriate expressions.

      The statement that the "energy production rate ... is proportional to the growth rate" is, apart from being incorrect (it should be 'ATP consumption rate' or similar, see above), a non-trivial claim. Why should this be the case? Such statements must be supported by references. The observation that the catabolic power indeed appears to increase linearly with growth rate was made, based on chemostat data for E. coli and yeast, in a recent preprint (Ebenhöh et al., 2023, bioRxiv, "Microbial pathway thermodynamics: structural models unveil anabolic and catabolic processes").

      We thank the reviewer for the insightful suggestions. Following these, we have revised our manuscript and cited the suggested reference (i.e., Ebenhöh et al., Life 14, 247 (2024)).

      All this criticism does not preclude the possibility that cell heterogeneity plays a role in overflow metabolism. However, according to Occam's razor, first the simpler explanations should be explored and refuted before coming up with a more complex solution. Here, it means that the authors first should argue why simpler explanations (e.g. the 'Membrane Real Estate Hypothesis', Szenk et al., 2017, Cell Systems; maximal Gibbs free energy dissipation, Niebel et al., 2019, Nature Metabolism; Saadat et al., 2020, Entropy) are not considered, resp. in what way they are in disagreement with observations, and then provide some evidence of the proposed cell heterogeneity (are there single-cell transcriptomic data supporting the claim?).

      We thank the reviewer for raising these questions and providing valuable insights. Regarding the shortcomings of simpler explanations, as explained above, most proposed explanations (including the references mentioned by the reviewer: Szenk et al., Cell Syst. 5, 95-104 (2017); Niebel et al., Nat. Metab. 1, 125-132 (2019); Saadat et al., Entropy 22, 277 (2020)) rely heavily on the assumption that cells optimize their growth rate for a given rate of carbon influx under each nutrient condition (or its equivalents). However, this assumption is questionable, as the given factors in a nutrient condition are the identities and concentrations of the carbon sources, rather than the carbon influx itself.

      Specifically, Szenk et al. is a perspective paper, and the original “membrane real estate hypothesis” was proposed by Zhuang et al. (Zhuang et al., Mol. Syst. Biol. 7, 500 (2011)). Zhuang et al. specified in Section 7 of their SI that their model’s explanation of the experimental results shown in Fig. 2C of their manuscript relies on the assumption of restrictions on carbon influx. In Niebel et al. (Niebel et al., Nat. Metab. 1, 125-132 (2019)), the Methods section specifies that the glucose uptake rate was considered a given factor for a growth condition. In Saadat et al. (Saadat et al., Entropy 22, 277 (2020)), Appendix A notes that their model results depend on minimizing carbon influx for a given growth rate, which is equivalent to the assumption mentioned above (see Appendix 6.3 in our manuscript for details). 

      Regarding the experimental evidence for our proposed cell heterogeneity, Bagamery et al. (Bagamery et al., Curr. Biol. 30, 4563-4578 (2020)) reported non-genetic heterogeneity in two subpopulations of Saccharomyces cerevisiae cells upon the withdrawal of glucose from exponentially growing cells. This strongly indicates the coexistence of fermentative and respiratory modes in S. cerevisiae cultured in a glucose medium (refer to Fig. 1E in Bagamery et al.). Nikolic et al. (Nikolic et al., BMC Microbiol. 13, 1-13 (2013)) reported a bimodal distribution in the expression of the acs gene (encoding acetyl-CoA synthetase, which assimilates acetate) in an E. coli cell population growing on glucose as the sole carbon source within the region of overflow metabolism (see Fig. 5 in Nikolic et al.), indicating the cell heterogeneity we propose. For cancer cells, Duraj et al. (Duraj et al., Cells 10, 202 (2021)) reported a high level of intra-tumor heterogeneity in glioblastoma using optical microscopy images, where 48%-75% of the cells use fermentation and the remainder use respiration (see Fig. 1C in Duraj et al.), which aligns with the cell heterogeneity picture of aerobic glycolysis predicted by our model.

      We have now added related content to the discussion section to strengthen our manuscript.

      Reviewer #1 (Recommendations For The Authors): 

      Some minor corrections:

      (1) Adjusted the reference: (García-Contreras et al., 2012)

      (2) Corrected line 255: Removed the duplicate "the genes"

      We thank the reviewer for the suggestions and have implemented each of them to revise our manuscript. The reference in the form of García-Contreras et al., 2012, although somewhat unusual, is actually correct, so we have kept it unchanged.

      General comment to the author:

      Considering that this work exists at the interface between Physics and Biology, where a significant portion of the audience may not be familiar with the mathematical manipulations performed, it would enhance the paper's readability to provide more explicit indications in the text. For example, in line 91, explicitly define phi_A as phi_R; or in line 115, explain the K_i parameter in the text for better readability.

      We thank the reviewer for the suggestion. Following this, we have now provided more explicit information for the definition of mathematical symbols to enhance readability.

      Reviewer #2 (Recommendations For The Authors):

      The current form of this manuscript is difficult to read for general readers. In addition, the model description in the Appendix can be improved for biophysics readers to keep track of the variables. Here are my suggestions:

      a) In the main text, the author should give the definition of "proteome energy efficiency" explicitly both in English and mathematical formula - since this is the central concept of the paper. The biological interpretation of formula (4) should also be stated.

      We thank the reviewer for the suggestion. Following this, we have now added definitions and biological interpretations to fix these issues.

      b) I feel the basic model of the reaction network in the Appendix could be stated in a more concise way, by emphasizing whether a variable is extensive (exponential growing) or intensive (scale-invariant under exponential growth).

      From my understanding, this work assumes balanced exponential growth and hence there is a balanced biomass vector Y* (a constant unit vector with all components sum to 1) for each cell. The steady-state fluxes {J} are extensive and all have growth rate λ. The proteome partition and relative metabolite fractions are ratios of different components of Y* and hence are intensive.

      The normalized fluxes {J^(n)} (with respect to biomass) are a function of Y* and are all kept as constant ratios with each other. They are also intensive.

      The biomass and energy production are linear combinations of {J} and hence are extensive and follow exponential growth. The biomass and energy efficiency are ratios between flux and proteome biomass, and hence are intensive.

      We thank the reviewer for the insightful suggestion. Following this, we have now added the intensive and extensive information for all relevant variables in the newly added Appendix-table 3.

      c) In the Appendix, the author should have a table or list of important variables, with their definition, units, and physiological values under respiration and fermentation.

      We thank the reviewer for the very useful suggestion. Following this, we have now added Appendix-table 3 (pages 54-57 in the appendices) to illustrate the symbols used throughout our manuscript, as well as the model variables and parameter settings.   

      d) Regarding the single-cell variability, the author ignored recent experimental measurements on single-cell metabolism. This includes variability on ATP, NAD(P)H in E. coli, which will be useful background for the readers, see below.

      https://pubmed.ncbi.nlm.nih.gov/25283467/

      https://pubmed.ncbi.nlm.nih.gov/29391569/

      We thank the reviewer for the very useful suggestion. We have now cited these relevant studies in our manuscript.  

      e) The choice between 100% respiration and 100% fermentation is based on the optimization of proteome energy efficiency, while the intermediate strategies are not favored in this model. This is similar to a concept in control theory called the bang-bang principle. This can be added to the Discussion.

      We thank the reviewer for this suggestion. We have reviewed the concept and articles on the bang-bang principle. While the bang-bang principle is indeed relevant to binary choices, it is somewhat distant from the topic of metabolic strategies related to optimal growth. The elementary flux mode (see Müller et al., J. Theor. Biol. 347, 182-190 (2014); Wortel et al., FEBS J. 281, 1547-1555 (2014)) is more pertinent to this topic, as it may lead to diauxic growth (another binary metabolic strategy) in microbes grown on a mixture of two carbon sources from Group A (see Wang et al., Nat. Comm. 10, 1279 (2019)). Therefore, we have cited and mentioned only the elementary flux mode (Müller et al., J. Theor. Biol. 347, 182-190 (2014); Wortel et al., FEBS J. 281, 1547-1555 (2014)) in the introduction and discussion sections of our manuscript.

    1. Author response:

      (1) General Statements

      We thank all three reviewers for their constructive comments and suggestions. We also thank reviewers #2 and #3 for considering our work to be timely and of interest to the field, not only for basic researchers, but also for translational scientists and industry. We are now providing additional results to further support our hypothesis and hope that all reviewers will find that our manuscript is now ready for publication. 

      (2) Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): 

      The manuscript by Coquel et al. investigates the effects of BKC and IBC, two compounds found in Psoralea corylifolia, on DNA replication and the response to DNA damage, and explores their potential use in cancer treatment. These compounds have been previously shown to affect different cellular pathways and the authors use transformed cancer cells of different origins and a non-transformed cell line to question if their combination is toxic in cancer versus non-cancer cells. They propose that BKC inhibits DNA polymerases while IBC targets CHK2. Their results show that both compounds do affect DNA replication, inducing replication stress and affecting double strand break repair. They also show that their combined use increases their toxicity in a synergistic manner.

      However, there are some major conclusions that are still not very well supported by the data: first, the differential effect on cancer and non-transformed cells; second, the direct link of BKC to the inhibition of DNA polymerases; and third, it is unclear if CHK2 is the relevant target for IBC in this context. 

      Regarding these points the authors should address the following issues: 

      (1) Most of the experiments use BJ fibroblasts as a control cell line. In order to evaluate if these compounds are preferentially toxic for cancer cells, the use of more than one non-transformed cell line is necessary. In addition, BJ cells are fibroblasts while most of the cancer cell lines employed are of epithelial origin. The authors could use MCF10 and RPE cells (both of epithelial origin) as control cell lines to complement the results and better support this claim. 

      We have now monitored the effect of IBC and BKC on the proliferation of MCF-7, MCF-10A and RPE-1 cells using the WST-1 assay and obtained similar results as for BJ and MCF-7 cells. These results are now included in the revised manuscript as Fig. S1A and S1B.

      (2) In order to explore what are the targets of BKC and IBC Cellular Thermal Shift Assays (CETSA) could be used. Either by doing an unbiased mass spectrometry analysis of proteins stabilized by these compounds or by a direct analysis of candidate proteins by western blot (a similar approach has been used for IBC to show that it inhibits SIRT2 in Ren et al., 2024 Phytotherapy Res).

      We thank this Reviewer for suggesting the use of the CETSA assay. We have now performed  CETSA on MCF-7 cells and found that IBC stabilizes CHK2 but not CHK1, to the same extent as the commercial CHK2 inhibitor BML-277 used here as a positive control. These results are now shown in new Fig. 4G and 4H.

      (3) For BKC in vitro polymerase assays could be carried out to show the direct inhibition of the DNA polymerase delta, for instance. 

      We have used high-speed Xenopus egg extracts to replicate ssDNA in vitro (Fig. S2C). This assay differs from the in vitro replication assay using low-speed Xenopus egg extracts (Fig. 2H) in that it only monitors elongation by replicative DNA polymerases (Pol δ and ε) and not earlier steps such as origin licensing and activation. The combined use of both low-speed and high-speed extracts strongly supports the view that BKC inhibits replicative DNA polymerases.

      To confirm this result, we have also used CETSA to monitor BKC binding to different subunits of DNA Pol δ and Pol ε in MCF-7 cells and in Xenopus egg extracts (Fig. 3C-D, Fig. S3). We found that BKC binds POLD1 and POLE, the catalytic subunits of Pol δ and ε respectively, but not the accessory subunit POLD3 nor PCNA. Together with our docking results and DNA fiber experiments, these data strongly support the view that BKC is a potent inhibitor of DNA Pol δ and Pol ε.

      (4) In addition, the authors could analyze the integrity of replication forks by PCNA immunofluorescence analysis. The colocalization of PCNA and POLD or POLE subunits could also support the role of DNA polymerases as targets of BKC. 

      Our molecular docking results also show that BKC occupies the catalytic sites of DNA Pol δ and ε, which may not affect their subcellular localization and/or PCNA binding. Since our DNA replication assays, CETSA and DNA fiber analyses strongly support the view that BKC inhibits replicative DNA polymerases, we have not performed this additional experiment.

      (5) In the case of IBC and the inhibition of CHK2, the authors should check the effect of IBC on the phosphorylation of BRCA1 on S988. The changes in CHK2 phosphorylation in Figure 3B are not convincing. The experiment should be repeated and the average of at least three experiments needs to be quantified. 

      We now provide evidence that IBC inhibits BRCA1 phosphorylation on S988. Western blots and quantification for three biological replicates are shown in Fig. 4C and Fig. S4H. Densitometric quantification of CHK2 phosphorylation on S516 from 3 biological replicates, along with statistical analysis, is now shown in Fig. S4G.

      (6) To prove that CHK2 is the relevant target for IBC the authors could test if ATM and CHK2 knockout cells are more resistant to this compound, since it would prevent the phosphorylation of CHK2. 

      We have performed siRNA transfection targeting CHK2. The transfected cells died after 72 hours in culture, so we have been unable to determine whether CHK2-KD cells have increased resistance to IBC.  

      In addition to these experiments, I would suggest some other major improvements in the manuscript: 

      (1) The concentration of both compounds should be provided in molar units throughout the paper.

      Thanks for pointing this out, we now use molar units throughout the paper.

      (2) The authors do not clearly indicate the concentration that is employed in the different experiments, making it difficult to assess the results. For instance, Figure 2 does not include the concentration in the legend or in the text. Time and concentration need to be clearly shown for each experiment. 

      The experimental conditions and inhibitor concentrations are now clearly indicated for each experiment.

      (3) Some experiments are only repeated once (fiber assays) or twice (cell cycle analysis by flow cytometry). These experiments need to be repeated 3 times and the proper statistical analysis performed (comparison of the medians). 

      Superplots with biological replicates for all DNA fiber assays are now displayed. The number of biological replicates is now indicated in the legends and appropriate statistical analyses are used.
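
      The replicate-level comparison described above can be sketched as follows. This is only an illustration of summarizing each biological replicate by its median before comparing conditions, as the reviewer requested. The fork-speed values and the function names (`replicate_medians`, `summarize`) are hypothetical and are not taken from the manuscript.

```python
from statistics import mean, median, stdev


def replicate_medians(speeds_by_replicate):
    """Return one median fork speed per biological replicate."""
    return [median(speeds) for speeds in speeds_by_replicate]


def summarize(speeds_by_replicate):
    """Summarize a condition by the mean and SD of its replicate medians."""
    medians = replicate_medians(speeds_by_replicate)
    return {
        "medians": medians,
        "mean_of_medians": mean(medians),
        "sd_of_medians": stdev(medians),
    }


# Hypothetical fork speeds (kb/min): three biological replicates per condition.
control = [[1.0, 1.2, 1.4], [0.9, 1.1, 1.3], [1.1, 1.3, 1.5]]
treated = [[0.5, 0.6, 0.7], [0.4, 0.55, 0.7], [0.5, 0.65, 0.8]]

control_summary = summarize(control)
treated_summary = summarize(treated)

# Ratio of treated to control, computed on replicate medians rather than pooled fibers.
speed_ratio = treated_summary["mean_of_medians"] / control_summary["mean_of_medians"]
print(f"treated/control fork speed ratio: {speed_ratio:.2f}")
```

      Comparing replicate medians treats each biological replicate, not each fiber, as the unit of observation, which avoids inflating significance when many fibers are pooled from a few experiments.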

      Other minor points or suggestions: 

      (1) Analyzing fork asymmetry would further support the direct effect of BKC on DNA polymerases. 

      The effect of BKC on fork asymmetry is now shown in Fig. 2F. 

      (2) A dose dependent analysis of BKC on the speed of DNA replication would also support this point. 

      Superplots of DNA fiber assays showing the effect of different concentrations of BKC on fork speed from three biological replicates are now included in Fig. 2E.

      (3) Page 7: BKC reduces fork speed ...two-fold. This sentence is not very clear, it would be better to say that speed is half of the control. 

      This sentence was changed to “BKC reduced fork speed by a factor of two relative to untreated cells”.

      (4) Figure 4G and S4D show contradictory results regarding the induction of Rad51 foci by IBC treatment. This needs to be clarified. 

      Figure 4G and S4D (now Fig. 5G and S5D) do not show contradictory results. In both cases, IBC treatment impaired the induction of RAD51 foci by IR or bleomycin.  

      (5) Page 12, Figure S5C is called for but it does not exist (probably meaning Figure S5B). 

      We apologize for this error, which has now been corrected.  

      Reviewer #1 (Significance): 

      The work by Coquel et al. aims at elucidating the use of BKC and IBC as a combined therapy to induce cell death in cancer cells by targeting DNA replication and CHK2. Both BKC and IBC have been previously shown to affect the proliferation of cancer cells. BKC has been shown to induce S phase arrest in an ATR dependent manner in MCF7 cells (Li et al., 2016 Front Pharm), while IBC induces cell death in MDA-MB-231 cells (Wu et al., 2022 Molecules). In this regard, the more interesting contribution of the manuscript is the potential identification of the targets of these compounds in cancer cells. The inhibition of CHK2 by IBC is quite compelling although it needs to be further proven. In contrast, the hypothesis that BKC inhibits DNA polymerases remains highly speculative. The results offer a limited advance in the knowledge of the mechanism of action of these two compounds. Focusing on the action of IBC on CHK2 would increase the impact of the results. In this sense a very recent report has been published showing that IBC inhibits SIRT2 (Ren et al., 2024 Phyto Res), showing that IBC can affect multiple enzymes and processes. This should be taken into account for a further analysis of its mechanism of action. 

      In addition to the identification of the targets of BKC and IBC, the authors also focus on their combination for cancer treatment. This is based on the idea that blocking the DSB repair and inducing replication stress at the same time is an efficient approach to induce cancer cell death. This is not a new concept, since the loss of ATM sensitizes cancer cells to the inhibition of the replication stress response and several combination therapies have been put forward with the idea of generating replication stress and preventing the subsequent repair of the double strand breaks induced in these cells. Thus, the novelty here is limited, especially considering that the effect of BKC on DNA replication has already been described. Further, since its mechanism of action is unclear, it is difficult to ascribe the observed synergy to the speculated hypothesis. A deeper analysis of IBC as a CHK2 inhibitor would be more interesting, and the potential combination with other chemotherapy agents such as replication stress inhibitors, HU or DNA damaging agents. Also, the lack of a good control of non-transformed cells also reduces the relevance of the work. 

      In its current state, the interest of the manuscript is limited. The mechanistical advance is not strong enough and is not completely supported by the data, and the use of these compounds as a combination therapy does not provide new insights in cancer treatment. In my opinion, focusing on the inhibition of CHK2 by IBC and its potential use would broaden the impact of the results beyond the mere analysis of the action of these compounds. 

      We thank this reviewer for his/her constructive and insightful comments. We have followed his/her advice and focused our analysis on the action of IBC on CHK2. Using CETSA, we confirmed that IBC binds CHK2 to the same extent as BML-277 inhibitor, but does not bind CHK1. We also show that IBC inhibits BRCA1 phosphorylation on S988 and CHK2 phosphorylation on S516. Together with the results presented in the initial version of the manuscript, these data support the view that CHK2 is a key IBC target. We have also applied CETSA to DNA polymerases and confirmed that BKC directly targets DNA Polδ and ε. Although it is unlikely that IBC and BKC will ever be used in combination therapies, the synergistic effect that we measured on cancer cells in vivo and in vitro indicates that IBC sensitizes cancer cells to endogenous replication stress and to exogenous sources of DNA damage, which could be used to replace BKC in combination therapies. For instance, our data indicate that IBC can be used in combination with drugs such as etoposide, doxorubicin or cyclophosphamide to potentiate their effect on drug-resistant lymphoma cell lines (DLBCL). As requested by this Reviewer, we have modified the discussion section to put more emphasis on IBC and CHK2 inhibitors and we hope that he/she will now find this revised version suitable for publication.

      Reviewer #2 (Evidence, reproducibility and clarity): 

      In the manuscript by Coquel et al., the authors report their findings on the effect of 2 natural compounds from Psoralea corylifolia plant extracts on cancer cells. They show that these compounds, bakuchiol (BKC) and isobavachalcone (IBC), inhibit proliferation of cancer cells and tumor development in xenografted mice, particularly when used in combination. They further show that BKC inhibited DNA polymerases and induced replication stress, and show evidence that IBC inhibits Chk2 kinase activity and downstream double-strand break repair. Based on their findings, the authors conclude that Chk2 inhibition and DNA replication inhibition represent a potential synergistic strategy to selectively target cancer cells.

      Major: 

      (1) The data showing IBC is a Chk2 inhibitor is weak and more rigorous investigation is needed to establish this compound as a Chk2 inhibitor. 

      As indicated in our response to Reviewer #1, we have now analyzed the binding of IBC to CHK2 using the Cellular Thermal Shift Assay (CETSA) in MCF-7 cells. Our data clearly show that IBC binds to CHK2 but not CHK1. These results are now shown in Fig. 4G and 4H.

      For one, the authors mention they screened 43 cell cycle-related kinases in vitro, but only show data for 8 kinases in their kinase activity screens. Of these 8 kinases, Chk2 is the most strongly inhibited, but there are no data shown for the other 35 kinases. 

      Data for all the protein kinases tested in the in vitro assay are now presented in Fig. S4D and S4E.  

      Additionally, the purpose of the CHK2 mutants should be discussed in the text. 

      The CHK2(I157T) mutation is linked to an increased risk of breast and colorectal cancers. CHK2(R145W) is associated with Li-Fraumeni Syndrome. Both mutations do not affect the basal kinase activity of CHK2. This information is now indicated in the legend of Fig. S4D. 

      Secondly, the western blot in Fig 3B, appears to show a very modest effect of IBC on Chk2 autophosphorylation and not that different from the effect of IBC on Akt phosphorylation in Fig S3a. Yet, the authors claim that IBC inhibits Chk2 but not Akt. To strengthen these blots, a known Chk2 inhibitor, such as the one shown in Fig 4 (BML-277) should be included as a positive control for pChk2 similarly to what was shown for Akt with MK-2206. 

      We have now replaced the western blot in Fig. 3B (now Fig. 4B) with another biological replicate. Quantifications and statistical analyses of biological replicates are shown in Fig. S4G. Overall, we observed a 50% reduction of CHK2 auto-phosphorylation in MCF7 cells treated with IBC, and a 20% reduction in AKT phosphorylation (Fig. S4A). There was no additional reduction in AKT phosphorylation when cells were treated with IBC in combination with MK-2206, compared to cells treated with MK-2206 alone. We now include the CHK2 inhibitor BML-277 as a positive control alongside IBC to monitor CHK2 and CHK1 auto-phosphorylation in Fig. 4B, S4G, 4D and S4I, respectively.

      Western blots showing a loss of phosphorylation of additional Chk2 targets is also needed. The manuscript mentions Brca1 S988 as a Chk2 substrate important for DSB repair. Showing the effect of IBC on this phosphorylation site would strengthen the conclusions. 

      We now provide evidence that IBC inhibits BRCA1 phosphorylation at S988. Western blots and quantification for three biological replicates are shown in Fig. 4C and S4H. 

      (2) The authors claim that the combination of IBC and BKC inhibit cell growth in a synergistic manner and that the "effect is more pronounce on cancer cells than on non-cancer cells." However, only 1 non-malignant cell line was used, and it was a fibroblast line. To make this claim, the authors need to show the effect in additional non-malignant cells, preferably with epithelial cell types. 

      We have now monitored cell proliferation using the WST-1 assay in two additional non-malignant cell lines, namely MCF-10A and RPE-1 cells. Cells were treated with IBC/BKC and their growth was compared to that of MCF-7 cells. These experiments yielded similar results to those obtained with BJ fibroblasts. These new data are now included in the revised version as Fig. S1A and S1B. 
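
      For readers unfamiliar with how synergy is distinguished from additivity, the following minimal sketch uses the Bliss independence model. This response does not state which synergy model was used in the manuscript, so Bliss independence is an assumption chosen for illustration, and the function names and inhibition values are hypothetical.

```python
def bliss_expected(fa, fb):
    """Expected fractional inhibition of a combination under Bliss independence.

    fa and fb are the fractional inhibitions (0 to 1) of each compound alone.
    """
    for f in (fa, fb):
        if not 0.0 <= f <= 1.0:
            raise ValueError("fractional inhibition must lie in [0, 1]")
    return fa + fb - fa * fb


def bliss_excess(fa, fb, fab_observed):
    """Observed minus expected inhibition; positive values suggest synergy."""
    return fab_observed - bliss_expected(fa, fb)


# Hypothetical growth inhibition values for two compounds alone and combined.
expected = bliss_expected(0.2, 0.3)
excess = bliss_excess(0.2, 0.3, 0.7)
print(f"expected: {expected:.2f}, excess over Bliss: {excess:.2f}")
```

      A positive excess indicates that the combination inhibits growth more than independent action would predict. Running the same calculation on non-malignant lines is one way to compare the strength of the effect across cell types.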

      Minor: 

      (1) Densitometry data for all western blots should be shown with mean+/- stdev of independent western blots. 

      Densitometry data for all western blots with biological replicates are now shown in supplementary figures.

      (2) In Figure 1B the statistical test used to analyze cell number was not stated. 

      The statistical test is now indicated in Fig. 1B.

      (3) In Figure 2A, the DAPI image for BKC is the merged image and should be replaced with just DAPI. 

      This error has now been corrected.

      (4) In Figure 2B, the y-axis label says "yH2AX foci (MFI)". MFI and foci are not the same thing, and for yH2AX, the signal is often not focal. MFI of yH2AX is an appropriate measurement for replication stress, it's just not appropriate to equate MFI to foci. 

      We apologize for this labeling error, which has now been corrected.

      (5) For the 53BP1 MFI and Rad51 MFI shown in Fig 4 and Fig S4, it is more appropriate to show the number of foci/cell as these are better indicators of breaks and repair sites. MFI is influenced by expression levels of the proteins and not necessarily the break/repair. 

      The numbers of 53BP1 and RAD51 foci are now shown.

      (6) The data in Figures 5B and 5C are very difficult to read. Perhaps color-code the lines/symbols.

      We have now colored the graph to increase its readability. 

      Reviewer #2 (Significance): 

      The findings reported in this manuscript are timely, of interest to the field, and are mostly well-supported by the experimental data. However, there are a few concerns that need to be addressed.

      We are grateful to Reviewer #2 for his/her positive assessment of our manuscript. We hope that we have adequately addressed all of his/her specific concerns and that he/she will agree with the need to put more emphasis on IBC and CHK2 inhibition as requested by Reviewer #1.

      Reviewer #3 (Evidence, reproducibility and clarity): 

      The manuscript: "Synergistic effect of inhibiting CHK2 and DNA replication on cancer cell growth" successfully demonstrates that the compounds BKC and IBC found in Psoralea corylifolia act synergistically to inhibit cancer cell proliferation, using a wide range of well-chosen methodologies. Moreover, the authors characterized the mechanisms of action of both drugs, which result in inhibition of cell proliferation. The use of multiple cell lines and the mice models makes the study robust and complete. The manuscript presents a well written study that offers new insights and contributions to the field. 

      A few suggestions to improve the study: 

      (1) Given that both compounds BKC and IBC have already been previously described in the literature, it would be helpful for the reader to have them described better at the beginning of the study. 

      Thanks for pointing this out. We have now better described BKC and IBC at the beginning of the results section, as well as in the discussion. We agree that this could be helpful to readers.

      (2) Addition of western blot quantifications over the number of experimental repeats is important specifically for Fig. 2C and Fig. 3C where partial effect of treatment on a signal level is reported. 

      The densitometry analysis of data shown in Fig. 2C and biological replicates are now shown in Fig. S2B. Quantification for Fig. 3C (now Fig. 4D) is shown in Fig. S4I.

      (3) The quantification of mean intensity for 53BP1 and RAD51 foci should be exchanged with the quantification of number of foci per cell. While the quantification of gH2AX signal intensity is a correct representation of induction of this signal upon damage, foci formed by protein recruitment to DNA damage sites should be quantified by counting the number of foci, rather than signal in the whole cell/nucleus. These proteins exist before damage and are re-located in response to the damage. 

      Quantification of 53BP1 and RAD51 foci is now expressed as the number of foci per cell. 

      (4) Materials & Methods section is missing the methods for the experiment described in Fig. 1B. In summary, after addressing our few concerns, we believe the manuscript should be accepted for publication. 

      The WST-1 assay used for cell number quantification is now described under “Reagents” in the Materials & Methods section.

      Reviewer #3 (Significance):

      The manuscript presents a well written study that offers new insights and contributions to the field. Although the inhibitors described have been known in science, the authors present convincingly their mode of action, which is either better characterized (for BKC) or inhibiting a different than previously suggested enzyme (for IBC). Authors also nicely pinpoint and explain the narrow window of concentrations when these two compounds act synergistically rather than additively. The analyses in multiple cell lines, mouse models and in combination with other cancer treatments, makes this study of interest not only for fundamental researchers but also for translational scientists and industry.

      My field of expertise: DNA replication and replication stress across model systems. 

      We are grateful to Reviewer #3 for his/her very positive assessment of our work and we hope that he/she will find this revised version suitable for publication.

    1. Author response:

      Reviewer #1: 

      Summary:

      Ngo et al. use several computational methods to determine and characterize structures defining the three major states sampled by the human voltage-gated potassium channel hERG: the open, closed, and inactivated state. Specifically, they use AlphaFold and Rosetta to generate conformations that likely represent key features of the open, closed, and inactivated states of this channel. Molecular dynamics simulations confirm ion conduction for structural models of the open but not the inactivated state. Moreover, in silico drug docking experiments show differential binding of drugs to the conformations of the three states, with the inactivated one being preferentially bound by many of them. Docking results are then combined with a Markov model to get state-weighted binding free energies that are compared with experimentally measured ones.

      Strengths:

      The study uses state-of-the art modeling methods to provide detailed insights into the structure-function relationship of an important human potassium channel. AlphaFold modeling, MD simulations, and Markov modeling are nicely combined to investigate the impact of structural changes in the hERG channel on potassium conduction and drug binding.

      We appreciate the reviewer’s recognition of our integration of state-of-the-art computational methods, including AlphaFold2, Rosetta, MD simulations, and Markov modeling. We are pleased that the reviewer found our approach to investigating the structure-function relationship of the hERG channel insightful.

      Weaknesses:

      (1) The selection of inactivated conformations based on AlphaFold modeling seems a bit biased. The authors base their selection of the "most likely" inactivated conformation on the expected flipping of V625 and the constriction at G626 carbonyls. This follows a bit of the "Streetlight effect". It would be better to have selection criteria that are independent of what they expect to find for the inactivated state conformations. Using cues that favour sampling/modeling of the inactivated conformation, such as the deactivated conformation of the VSD used in the modeling of the closed state, would be more convincing. There may be other conformations that are more accurately representing the inactivated state. I see no objective criteria that justify the non-consideration of conformations from cluster 3 of the inactivated state modeling. I am not sure whether pLDDT is a good selection criterion. It reports on structural confidence, but that may not relate to functional relevance.

      We acknowledge the concern regarding the selection criteria for the inactivated state models. In the revised manuscript version, we plan to broaden our selection approach and explicitly include conformations from different clusters beyond those highlighted in the initial submission (e.g., from cluster 3). We will also incorporate structural metrics that do not solely depend on the known channel inactivation hallmarks or rely on the pLDDT scores to further justify our chosen representative inactivated state models.

      (2) The comparison of predicted and experimentally measured binding affinities lacks an appropriate control. Using binding data from open-state conformations only is not the best control. A much better control is the use of alternative structures predicted by AlphaFold for each state (e.g. from the outlier clusters or not considered clusters) in the docking and energy calculations. Using these docking results in the calculations would reveal whether the initially selected conformations (e.g. from cluster 2 for the inactivated state) are truly doing a better job in predicting binding affinities. Such a control would strengthen the overall findings significantly.

      We agree that a more rigorous control for our drug-binding predictions is desirable. To address this, we will include molecular docking simulations and associated drug binding affinity estimations for more hERG channel models, including alternate conformations from the initial clustering that were not chosen as the final models. This will allow us to test whether our inactivated state structure from cluster 2 indeed outperforms or differs significantly from other possible inactivated hERG channel conformations in reproducing experimental drug potencies.
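
      To make the notion of a state-weighted binding free energy concrete, here is a minimal sketch. The exact weighting scheme used in the manuscript is not given in this response, so the formula below is an assumption: the apparent association constant is taken as the occupancy-weighted sum of per-state association constants, giving ΔG_eff = -RT ln Σ p_i exp(-ΔG_i / RT). The state names, free energies, occupancies, and temperature are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def state_weighted_dg(dg_by_state, prob_by_state, temperature=310.0):
    """Combine per-state binding free energies (kcal/mol) using state occupancies.

    Assumes the apparent association constant is the occupancy-weighted sum of
    the per-state association constants (an illustrative assumption).
    """
    if set(dg_by_state) != set(prob_by_state):
        raise ValueError("states in dg_by_state and prob_by_state must match")
    if not math.isclose(sum(prob_by_state.values()), 1.0, abs_tol=1e-6):
        raise ValueError("state occupancies must sum to 1")
    rt = R_KCAL * temperature
    weighted_ka = sum(
        p * math.exp(-dg_by_state[state] / rt) for state, p in prob_by_state.items()
    )
    return -rt * math.log(weighted_ka)


# Hypothetical docking free energies and Markov-model state occupancies.
dg = {"open": -8.0, "inactivated": -10.0, "closed": -7.0}
occupancy = {"open": 0.3, "inactivated": 0.6, "closed": 0.1}

dg_eff = state_weighted_dg(dg, occupancy)
print(f"state-weighted binding free energy: {dg_eff:.2f} kcal/mol")
```

      Under this assumption, swapping in free energies docked against an alternative inactivated-state conformation changes only the `dg` entries, so the control proposed by the reviewer amounts to recomputing `dg_eff` for each candidate model and comparing it with the experimental value.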

      (3) Figures where multiple datapoints are compared across states generally lack assessment of the statistical significance of observed trends (e.g., Figure 3d).

      (4) Figure 3 and Figures S1-S4 compare structural differences between states. However, these differences are inferred from the initial models. The collection of conformations generated via the MD runs allow for much more robust comparisons of structural differences.

      We will incorporate statistical analyses and measures of uncertainty for key comparisons. In Figures 3 and S1-S4 the consensus structural hERG channel models for open, inactivated and closed states are being compared, i.e. one representative model for each state. We believe this is a valid comparison, and the statistical analysis of the observed trends based on those models (e.g., in the bar plot of Figure 3d) alone might not be possible. However, we agree with the reviewer that instead of relying solely on those initial static models, we will also draw on the ensemble of states sampled during the MD simulations to quantify structural differences between different putative hERG channel states. Specifically, we will present ensemble-averaged measurements and highlight how these distributions differ significantly between states.

      Reviewer #2:

      Summary:

      Ngo et al. use AlphaFold2 and Rosetta to model closed, open, and inactive states of the human ion channel hERG. Subsequent MD simulations and comparisons with experiments support the plausibility of their models.

      Strengths:

      This is thorough work studied from many different angles. It provides a self-consistent picture of how conformational changes in hERG may affect its function and binding to different targets.

      We are grateful for the reviewer’s recognition of the thoroughness and multi-faceted nature of our study.

      Weaknesses:

      Though this work claims the methodologies can be generalized to other systems, it is not obvious how. Many modeling choices seem arbitrary and also seem to have required extensive expert knowledge of the system. This limits the applicability of the modeling strategy.

      We appreciate the reviewer’s comment on the generalizability of our approach. In the revision, we will more explicitly discuss the rationale behind the modeling choices and the extent to which they reflect system-specific knowledge. We will clarify how the strategies we developed (e.g., iterative refinement with AlphaFold2 and Rosetta, followed by MD simulation validation) can be adapted to other ion channels or related proteins. We will also outline a more generalizable workflow, specifying which steps require system-specific information and which steps are broadly applicable.

      Reviewer #3:

      Summary:

      The authors use Alphafold2, Rosetta, and Molecular Dynamics to model structures of the hERG K channel in open, inactive, and closed states. Experimental CryoEM data for open hERG (Wang and Mackinnon 2017), and closed EAG (Mandala and Mackinnon, 2002) were used as the main templates for channel models presented here. Given the importance of hERG as a safety pharmacology target, the identification of a robust simulation method to assess drug block is an important addition to the field.

      Strengths

      The key findings here are new inactivated and closed hERG channel conformations and hERG channel conformations with drugs docked in the inner vestibule below the selectivity filter. Amino acid pathways and interaction networks for different states are also presented.

      The inactive state and drug block models are carefully correlated with experimental data for the inactivated state of hERG (Lau et al, 2024) and with experimental free energy data for drug binding and have overall good agreement.

      It is remarkable that using cytoplasmic domain structures of hERG as a starting point revealed inactivation state structures in the hERG selectivity filter in Figures 2,3.

      We thank the reviewer for highlighting the novelty and importance of our work, particularly regarding the identification of new inactivated and closed hERG channel conformations and the modeling of drug block. We are also pleased that the reviewer found the correlation with experimental data to be strong and the structural insights to be valuable.

      Weaknesses

      Figure 6, if each data point is for a different drug, then perhaps identify each point.

      Thank you so much for this suggestion. Please note that Table 3 contains drug-specific data plotted in Figure 6 including drug names. We will provide a reference to Table 3 in the revised Figure 6 caption. We will also revise Figure 6 (and any similar figures) to clearly identify each data point with the corresponding drug and/or include a corresponding key in the Figure legend. This will make it easier to correlate each data point’s binding prediction with the experimental datasets.

      The PAS domain was not included in the models as stated in Methods page 14 but the PAS does appear in some of the templates used as starting points for models in Figure 1 a,b,c. Perhaps mentioning that the PAS was not included in some (all?) of the final models should be moved into the main text and discussed.

      The drug block of 1b channels (which do not contain PAS) has been reported to be slightly different than that for 1a channels (which contain PAS) and for 1a/1b channels (see London et al., 1997; https://doi.org/10.1161/01.RES.81.5.870 and Abi-Gerges et al., 2011; DOI: 10.1111/j.1476-5381.2011.01378.x) and this should be discussed since the models presented here appear to be performed in the absence of the PAS.

      It also appears that the N-linker region (between PAS and the S1) and distal C region of hERG (post CNBHD-COOH) are not included in models, please state this if correct, and discuss.

      We appreciate the reviewer’s insightful comment regarding the PAS domain and the potential influence of other regions, such as the N-linker and distal C-region, on hERG channel drug binding and state transitions.

      The PAS domain did appear in the starting templates used for initial structural modeling (as shown in Figure 1a, b, c), but it was not included in the final models used for subsequent analyses. Similarly, the N-linker and the distal C-region were also omitted from the final models. These omissions were primarily due to constraints of the hardware used for AlphaFold structural modeling: including these additional protein regions would exceed the memory capacity of the graphical processing unit (GPU) cards on our available intramural, external, and cloud high-performance computing resources, leading to failures during the protein structure prediction step.

      The PAS domain of hERG 1a isoform, even if not serving as a direct drug-binding site, can influence the gating kinetics of hERG channels as the reviewer pointed out. By altering the probability and duration with which those ion channels occupy specific conformational states, it can indirectly affect how well drugs bind. For example, if the presence of the PAS domain shifts channel gating so that more channels enter (and remain in) the inactivated state, drugs with a higher affinity for that state would appear to bind more potently, as observed in electrophysiological experiments. It is also plausible that the PAS domain could exert allosteric effects that alter the conformational landscape of the ion channel during gating transitions, potentially impacting drug accessibility or binding stability. This is an intriguing hypothesis and an important avenue for future research.

      With access to more powerful computational resources, it would be valuable to explore the full-length hERG 1a channel, including the PAS domain and associated regions, to assess their potential contributions to drug binding and gating dynamics. We will incorporate a discussion of these points into the main text, acknowledging the limitations of our current models, citing the references provided by the reviewer, and highlighting the need for future studies to explore these protein regions in greater detail.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      The study by Fang et al. reports a 3D MERFISH method that enables spatial transcriptomics for tissues up to 200um in thickness. MERFISH, as well as other spatial transcriptomics technologies, has been mainly used for thin (e.g., 10um) tissue slices, which limits the dimension of spatial transcriptomics technique. Therefore, expanding the capacity of MERFISH to thick tissues represents a major technical advance to enable 3D spatial transcriptomics. Here the authors provide detailed technical descriptions of the new method, troubleshooting, optimization, and application examples to demonstrate its technical capacity, accuracy, sensitivity, and utility. The method will likely have a major impact on future spatial transcriptomics studies to benefit diverse biomedical fields.

      Strengths: 

      The study was well-designed, executed, and presented. Extensive protocol optimization and quality assessments were carried out and conclusions are well supported by the data. The methods were sufficiently detailed, and the results are solid and compelling. 

      Response: We thank the reviewer for the positive comments on our manuscript.  

      Weaknesses: 

      The biological application examples were limited to cell type/subtype classification in two brain regions. Additional examples of how the data could be used to address important biological questions will enhance the impact of the study. 

      We appreciate the reviewer's suggestion that demonstrating the broader applications of our thick-tissue 3D MERFISH method to address important biological questions would enhance the impact of our study. In line with the reviewer's feedback, we have included discussions on how this method could be applied to address various biological questions in the summary (last) paragraph of our manuscript. These discussions highlight the versatility and utility of our approach in studying diverse biological processes beyond cell type classification. 

      However, the goal of this work is to develop a method and establish its validity. While we are interested in applying it to addressing important biological questions in the future, we consider these applications beyond the scope of this work. 

      Reviewer #2 (Public Review): 

      Summary: 

      In their preprint, Fang et al present data on extending a spatial transcriptomics method, MERFISH, to 3D using a spinning disc confocal. MERFISH is a well-established method, first published by Zhuang's lab in 2015 with multiple follow-up papers. In the last few years, MERFISH has been used by multiple groups working on spatial transcriptomics, including approximately 12 million cell maps measured in the mouse brain atlas project. Variants of MERFISH were used to map epigenetic information complementary to gene expression and RNA abundance. However, MERFISH was always limited to thin ~10um sections to this date.

      The key contribution of this work by Fang et al. was to perform the optimization required to get MERFISH working in thick (100-200um) tissue sections. 

      Major strengths and weaknesses: 

      Overall the paper presents a technical milestone, the ability to perform highly multiplexed RNA measurements in 3D using MERFISH protocol. This is not the first spatial transcriptomics done in thick sections. Wang et al. 2018 - StarMAP used thick sections (150 um), and recently, Wang 2021 (EASI-FISH, not cited) performed serial HCR FISH on 300um sections. Data so far suggest that MERFISH has better sensitivity than in situ sequencing approaches (StarMAP) and has built-in multiplexing that EASI-FISH lacks. Therefore, while there is an innovation in the current work, i.e., it is a technically challenging task, the novelty, and overall contribution are modest compared to recently published work.  

      The authors could improve the writing and the manuscript text that places their work in the right context of other spatial transcriptomics work. Out of the 25 citations, 12 are for previous MERFISH work by Zhuang's lab, and only one manuscript used a spatial transcriptomics approach that is not MERFISH. Furthermore, even this paper (Wang et al, 2018) is only discussed in the context of neuroanatomy findings. The fact that Wang et al. were the first to measure thick sections is not mentioned in the manuscript. The work by Wang et al. 2021 (EASI-FISH) is not cited at all, as well as the many other multiplexed FISH papers published in recent years that are very relevant. For example, a key difference between seqFISH+ and MERFISH was the fact that only seqFISH+ used a confocal microscope, and MERFISH has always been relying on epi. As this is the first MERFISH publication to use confocal, I expect citations to previous work in seqFISH and better discussions about differences. 

      We thank the reviewer for recognizing our work as a technical milestone. Since the aim of this work is to build upon the strengths of MERFISH and address some of its limitations, we primarily cited previous MERFISH papers to clarify the specific improvements made in this work. Given the rapid growth of the spatial omics field, it has become impractical to comprehensively cite all method development papers. Instead, we cited a 2021 review article in the first sentence of the originally submitted manuscript and limited all discussions afterwards to MERFISH. In light of this reviewer’s suggestion to more broadly cite spatial transcriptomics work, we added two additional review articles on spatial omics. Spatial omics methods primarily include two categories: 1) imaging-based methods and 2) next-generation-sequencing based methods. The 2021 review article [Zhuang, Nat Methods 18, 18–22 (2021)] included in the originally submitted manuscript is focused on imaging-based methods. The additional 2021 review article [Larsson et al., Nat Methods 18, 15–18 (2021)] that we have now included in the revised manuscript is focused on next-generation-sequencing based methods. We also added a more recent review article published in 2023 [Bressan et al., Science 381:eabq4964 (2023)], which covers both categories of methods and includes more recent technology developments. All three review articles are now cited in parallel in the first introductory paragraph of the manuscript.

      Although we presented our work as an advance in MERFISH specifically, we do consider the reviewer’s suggestion of citing the 2018 STARmap paper [Wang et al., Science 361, eaat5961 (2018)] in the introduction part of our manuscript reasonable. This STARmap paper was already cited in the results part of our originally submitted manuscript, and we have now described this work in the introduction part of our revised manuscript (third paragraph), as this paper was the first to demonstrate 3D in situ sequencing in thick tissues. In addition, we thank the reviewer for bringing to our attention the EASI-FISH paper [Wang et al, Cell 184, 6361-6377 (2021)], which reported a method for thick-tissue FISH imaging and demonstrated imaging of 24 genes using multiple rounds of multi-color FISH imaging. We also recently became aware of a paper reporting 3D imaging of thick samples using PHYTOMap [Nobori et al, Nature Plants 9, 1026-1033 (2023)]. This paper, published a few days after we submitted our manuscript to eLife, demonstrated imaging of 28 genes in thick plant samples using multiple rounds of multicolor FISH and the probe targeting and amplification methods previously developed for in situ sequencing. We also included these two papers in the introduction section of our revised manuscript (third paragraph). In addition, we expanded the discussion paragraph (last paragraph) of the manuscript to discuss these thick-tissue imaging methods in more detail, and in the same paragraph, we also included discussions of two recent bioRxiv preprints on thick-tissue transcriptomic imaging [Gandin et al., bioRxiv, doi:10.1101/2024.05.17.594641 (2024); Sui et al., bioRxiv, doi:10.1101/2024.08.05.606553 (2024)].

      However, we do not consider our use of confocal imaging in this work an advance in MERFISH because confocal microscopy, like epi-fluorescence imaging, is a commonly used approach that could be applied to MERFISH of thin tissues directly without any alteration of the protocol. Confocal imaging has been broadly used for both DNA and RNA FISH before any genome-scale imaging was reported. Confocal and epi-imaging geometries have their distinct advantages, and which of these imaging geometries to use is the researcher’s choice depending on instrument availability and experimental needs. Thus, we do not find it necessary to cite specific papers just for using confocal imaging in spatial transcriptomic profiling. Our real advance related to confocal imaging is the use of machine learning to increase the imaging speed. Without this improvement, 3D imaging of thick tissue using confocal would take a long time and likely degrade image quality due to photobleaching of out-of-focus fluorophores before they are imaged. We thus cited several papers that used deep learning to improve imaging quality and/or speed [Laine et al., International Journal of Biochemistry & Cell Biology 140:106077 (2021); Ouyang et al., Nat Biotechnol 36:460–468 (2018); Weigert et al., Nat Methods 15:1090–1097 (2018)] in our original submission. Our unique contribution is the combination of machine learning with confocal imaging for 3D multiplexed FISH imaging of thick tissue samples, which had not been demonstrated previously.

      To get MERFISH working in 3D, the authors solved a few technical problems. To address reduced signal-to-noise due to thick samples, Fang et al. used non-linear filtering (i.e., deep learning) to enhance the spots before detection. To improve registrations, the authors identified an issue specific to their Z-Piezo that could be improved and replaced with a better model. Finally, the author used water immersion objectives to mitigate optical aberrations. All these optimization steps are reasonable and make sense. In some cases, I can see the general appeal (another demonstration of deep learning to reduce exposure time). Still, in other cases, the issue is not necessarily general enough (i.e., a different model of Piezo Z stage) to be of interest to a broad readership. There were a few additional optimization steps, i.e., testing four concentrations of readout and encoder probes. So while the preprint describes a technical milestone, achieving this milestone was done with overall modest innovation. 

      We appreciate the reviewer's recognition of the technical challenges we have overcome in developing this 3D thick-tissue MERFISH method. To achieve high-quality thick-tissue MERFISH imaging, we had to overcome multiple different challenges. We agree with the reviewer that the solutions to some of the above challenges are intellectually more impressive than the remaining ones that required relatively more mundane efforts. However, all of these are needed to achieve the overall goal, a goal that is considered a milestone by the reviewer. We believe that the impact of a method should be evaluated based on its capabilities, potential applications, and its adaptability for broader adoption. In this regard, we anticipate that our reported method will be a valuable and impactful contribution to the field of spatial biology.

      Data and code sharing - the only link in the preprint related to data sharing sends readers to a deleted Dropbox folder. Similarly, the GitHub link is a 404 error. Both are unacceptable. The author should do a better job sharing their raw and processed data. Furthermore, the software shared should not be just the MERlin package used to analyze but the specific code used in that package.  

      We shared the data through Dropbox as a temporary data-sharing approach for the review process, because of the potential need to revise and/or add data during the paper revision process. We have now made all data publicly available at Dryad (https://doi.org/10.5061/dryad.w0vt4b922).

      The GitHub link that we provided for the MERlin package was valid and when we clicked on it, it took us to the correct GitHub site. However, to make the code a permanent record, we also deposited the code to Zenodo (https://zenodo.org/records/13356944). Moreover, following the suggestion by the reviewer, in addition to the MERlin v2.2.7 package itself, we have also shared the specific code to utilize this package for analyzing the data taken in this work at Dryad (https://doi.org/10.5061/dryad.w0vt4b922). 

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) It will be good to expand the application section to demonstrate the utility of 3D MERFISH to address diverse types of biological questions for the two brain regions examined. At present, it only examined the localization of various cell clusters in the tissues. Can it be used to examine both short and long-range interactions, for example? 

      We appreciate the reviewer's feedback and agree that demonstrating the broader applications of our 3D thick-tissue MERFISH imaging method in addressing diverse biological questions would enhance the impact of our study.  

      In line with the reviewer’s comments, one of the analyses we performed in the manuscript was examining short-range interactions based on soma contact between adjacent neurons in the two brain regions studied (see third-to-last and second-to-last paragraphs of the Main text). This analysis provided insights into the spatial organization of inhibitory neurons and potential interactions between the same type of interneurons in these brain regions. 

      Although long-range interactions, for example synaptic interactions between neurons, would be of great interest, our current 3D MERFISH measurements do not allow such interactions to be determined. Future research to enable measurements of synaptic interactions between molecularly defined neuronal subtypes would be interesting, but we consider this to be out of the scope of the current study.

      (2) For the nearest neighbor distance analysis in Figure 3, the method seems to be missing. Please add details about this analysis to allow better understanding. It is counterintuitive that the cell subtypes showed tight local distribution (Figure 3 - supplement 3), but the nearest neighbor distances with subtypes are not different from those between subtypes. Please explain. 

      We apologize for omitting the nearest-neighbor distance analysis from the Materials and Methods section. We have added a detailed description of this analysis to the Materials and Methods section of the revised manuscript (last subsection of Materials and Methods).

      Regarding the comment “It is counterintuitive that the cell subtypes showed tight local distribution (Figure 3 - supplement 3), but the nearest neighbor distances with subtypes are not different from those between subtypes”, this is not necessarily counterintuitive given how we defined nearest-neighbor distances between neurons of the same subtype and nearest-neighbor distances between neurons of different subtypes. Here is how we performed this analysis for interneurons. First, we determined the nearest-neighbor neuron for each interneuron and classified the interneuron as either having another interneuron of the same type as its nearest neighbor or having a different type of interneuron or an excitatory neuron as its nearest neighbor. We then determined the distributions of the nearest-neighbor distances for these two classes and compared these distributions. When a neuronal subtype forms a tight spatial cluster, such as the type-A cluster shown in the schematic below, the distances between nearest-neighbor A-A pairs are indeed small. However, the distance between a type-A neuron and a neuron of a different type (for example, type-B) is not necessarily larger than that between two type-A neurons, if the nearest neighbor of this type-A neuron is a type-B neuron. These nearest-neighbor A-B pairs are likely formed between type-A neurons at the edge of the cluster and type-B neurons near the edge of the type-A cluster. If the distance of an A-B pair is not comparable to those of nearest-neighbor A-A pairs, the pair is unlikely to be a nearest-neighbor pair by our definition as described above.

      Author response image 1.
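
      The classification described above can be illustrated with a minimal Python sketch. This is not the code deposited with the manuscript: the function name, the tuple layout of each cell, and the toy coordinates are all assumptions made for illustration, and a brute-force search stands in for whatever neighbor search was actually used.

```python
import math

def nearest_neighbor_distances(cells, query_types):
    """Split nearest-neighbor distances into same-type and different-type lists.

    cells: list of (x, y, z, subtype) tuples (hypothetical layout).
    query_types: set of subtypes whose cells are used as query points,
                 e.g. the interneuron subtypes.
    """
    same, different = [], []
    for i, (xi, yi, zi, type_i) in enumerate(cells):
        if type_i not in query_types:
            continue
        best_distance, best_type = math.inf, None
        # Brute-force search over all other cells for the single nearest neighbor.
        for j, (xj, yj, zj, type_j) in enumerate(cells):
            if i == j:
                continue
            d = math.dist((xi, yi, zi), (xj, yj, zj))
            if d < best_distance:
                best_distance, best_type = d, type_j
        if best_type is None:
            continue  # only one cell in the data set
        # Classify the query cell by the subtype of its nearest neighbor.
        (same if best_type == type_i else different).append(best_distance)
    return same, different

# Toy data: a tight type-A cluster with a type-B cell just beyond its edge.
toy_cells = [
    (0, 0, 0, "A"),
    (10, 0, 0, "A"),
    (20, 0, 0, "A"),
    (28, 0, 0, "B"),
    (100, 0, 0, "B"),
]
same_type, different_type = nearest_neighbor_distances(toy_cells, {"A"})
print(same_type, different_type)
```

      In this toy case the only A-B nearest-neighbor pair is formed by the type-A cell at the edge of the cluster, and its distance is comparable to the A-A distances, which is the situation depicted in the schematic.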

      Reviewer #2 (Recommendations For The Authors): 

      (1) The scholarship in this work is lacking. All of the non-MERFISH parts of the field of spatial transcriptomics are ignored. The work needs to be discussed in the context of the literature. 

      We thank the reviewer for this suggestion and have included discussions of other spatial omics work, and other thick-tissue multiplexed imaging work in the Introduction and discussion section of the manuscript. Please see details in our response to the Public Review  portion of this reviewer’s comments.  

      (2) The data/code sharing links are broken and need to be fixed. 

      Response: We shared the data through Dropbox as a temporary data-sharing approach for the review process, because of the potential need to revise and/or add data during the paper revision process. We have now made all data publicly available at Dryad (https://doi.org/10.5061/dryad.w0vt4b922).

      The GitHub link that we provided for the MERlin package was valid and when we clicked on it, it took us to the correct GitHub site. However, to make the code a permanent record, we also deposited the code to Zenodo (https://zenodo.org/records/13356944). Moreover, following the suggestion by the reviewer, in addition to the MERlin MERFISH decoding package itself, we have also shared the specific code to utilize this package for analyzing the data taken in this work at Dryad (https://doi.org/10.5061/dryad.w0vt4b922) to ensure that the readers can fully reproduce the results presented in our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Kume et al examined the role of the protein Semaphorin 4a in steady-state skin homeostasis and how this relates to skin changes seen in human psoriasis and imiquimod-induced psoriasis-like disease in mice. The authors found that human psoriatic skin has reduced expression of Sema4a in the epidermis. While Sema4a has been shown to drive inflammatory activation in different immune populations, this finding suggested Sema4a might be important for negatively regulating Th17 inflammation in the skin. The authors go on to show that Sema4a knockout mice have skin changes in key keratinocyte genes, increased gdT cells, and increased IL-17 similar to differences seen in non-lesional psoriatic skin, and that bone marrow chimera mice with WT immune cells and Sema4a KO stromal cells develop worse IMQ-induced psoriasis-like disease, further linking expression of Sema4a in the skin to maintaining skin homeostasis. The authors next studied downstream pathways that might mediate the homeostatic effects of Sema4a, focusing on mTOR given its known role in keratinocyte function. As with the immune phenotypes, Sema4a KO mice had increased mTOR activation in the epidermis in a similar pattern to mTOR activation noted in non-lesional psoriatic skin. The authors next targeted the mTOR pathway and showed rapamycin could reverse some of the psoriasis-like skin changes in Sema4a KO mice, confirming the role of increased mTOR in contributing to the observed skin phenotype.

      Strengths:

      The most interesting finding is the tissue-specific role for Sema4a, where it has previously been considered to play a mostly pro-inflammatory role in immune cells, this study shows that when expressed by keratinocytes, Sema4a plays a homeostatic role that when missing leads to the development of psoriasis-like skin changes. This has important implications in terms of targeting Sema4a pharmacologically. It also may yield a novel mouse model to study mechanisms of psoriasis development in mice separate from the commonly used IMQ model. The included experiments are well-controlled and executed rigorously.

      Weaknesses:

      A weakness of the study is the lack of tissue-specific Sema4a knockout mice (e.g. in keratinocytes only). The authors did use bone marrow chimeras, but only in one experiment. This work implies that psoriasis may represent a Sema4a-deficient state in the epidermal cells, while the same might not be true for immune cells. Indeed, in their analysis of non-lesional psoriasis skin, Sema4a was not significantly decreased compared to control skin, possibly due to compensatory increased Sema4a from other cell types. Unbiased RNA-seq of Sema4a KO mouse skin for comparison to non-lesional skin might identify other similarities besides mTOR signaling. Indeed, targeting mTOR with rapamycin reverses some of the skin changes in Sema4a KO mice, but not skin thickness, so other pathways impacted by Sema4a may be better targets if they could be identified. Utilizing WT→KO chimeras in addition to global KO mice in the experiments in Figures 6-8 would more strongly implicate the separate role of Sema4a in skin vs immune cell populations and might more closely mimic non-lesional psoriasis skin.

      We sincerely appreciate your summary and for pointing out the strengths and weaknesses of our study. Although we were unfortunately unable to perform all these experiments due to limitations in our resources, we fully agree with the importance of studying tissue-specific Sema4A KO mice. As an alternative, we compared the IL-17A-producing potential of skin T cells between WT→KO mice and KO→KO mice following 4 consecutive days of IMQ treatment using flow cytometry. The results were comparable between the two groups. Additionally, we performed RNA-seq on the epidermis of WT and Sema4A KO mice. While we did not find similarities between Sema4A KO skin and non-lesional psoriasis except for S100a8 expression, we will further investigate, as a future project, the mechanisms by which Sema4A KO skin mimics non-lesional psoriasis skin.

      Although targeting mTOR with rapamycin did not reverse the epidermal thickness in Sema4A KO mice, rapamycin was effective in reducing epidermal thickness in a murine psoriasis model induced by IMQ in Sema4A KO mice. These results suggest potential clinical relevance for treating active, lesional psoriatic skin changes, which would be of interest to clinicians. Thank you once again for your valuable insights.

      Reviewer #2 (Public Review):

      Summary:

      Kume et al. found for the first time that Semaphorin 4A (Sema4A) was downregulated in both mRNA and protein levels in L and NL keratinocytes of psoriasis patients compared to control keratinocytes. In peripheral blood, they found that Sema4A is not only expressed in keratinocytes but is also upregulated in hematopoietic cells such as lymphocytes and monocytes in the blood of psoriasis patients. They investigated how the down-regulation of Sema4A expression in psoriatic epidermal cells affects the immunological inflammation of psoriasis by using a psoriasis mice model in which Sema4A KO mice were treated with IMQ. Kume et al. hypothesized that down-regulation of Sema4A expression in keratinocytes might be responsible for the augmentation of psoriasis inflammation. Using bone marrow chimeric mice, Kume et al. showed that KO of Sema4A in non-hematopoietic cells was responsible for the enhanced inflammation in psoriasis. The expression of CCL20, TNF, IL-17, and mTOR was upregulated in the Sema4AKO epidermis compared to the WT epidermis, and the infiltration of IL-17-producing T cells was also enhanced.

      Strengths:

      Decreased Sema4A expression may be involved in psoriasis exacerbation through epidermal proliferation and enhanced infiltration of Th17 cells, which helps understand psoriasis immunopathogenesis.

      Weaknesses:

      The mechanism by which decreased Sema4A expression may exacerbate psoriasis is unclear as yet.

      We greatly appreciate your summary and thoughtful feedback on the strengths and weaknesses of our study. In response, we have included the results of additional experiments on IL-23-mediated psoriasis-like dermatitis, which showed that epidermal thickness was significantly greater in KO mice compared to WT mice. When we analyzed the T cells infiltrating the ears using flow cytometry, the proportion of IL-17A-producing Vγ2 and DNγδ T cells within the CD3 fraction of the epidermis was significantly higher in Sema4A KO mice, consistent with the results from IMQ-induced psoriasis-like dermatitis. Furthermore, we examined STAT3 expression in the epidermis of WT and Sema4A KO mice using Western blot analysis, and the results were comparable between the two groups. However, the mechanism by which decreased Sema4A expression may exacerbate psoriasis remains unclear. We have added some explanations and presumptions to the limitations section. Thank you once again for your valuable insights.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Figure 1C

      What statistics were used? The supplemental notes adjusted the P value, what correction for multiple comparisons was utilized? Could the authors instead show logFC for the DEGs between Ctl and L in each cluster? This might be best demonstrated with a volcano plot, highlighting SEMA4A, and other genes known to be DE in psoriasis.

      We apologize for not including the detailed analysis methods in the original manuscript submission. We analyzed the scRNA-seq data using Cellxgene VIP with Welch’s t-test. Multiple comparisons were controlled using the Benjamini-Hochberg procedure, setting the false discovery rate (FDR) at 0.05. These details are now explained in the MATERIALS AND METHODS section of the resubmitted manuscript. We also added a volcano plot (log2FC versus -log10 p-value) for the DEGs in keratinocytes between Ctl and L to Figure 1-figure supplement 1D. The log2FC values for SEMA4A in keratinocytes, dendritic cells, and macrophages were -0.07, 0.00, and -0.05, respectively. Although the log2FC is low in keratinocytes, the adjusted p-value (padj) for SEMA4A is 2.83×10⁻³⁹, indicating a statistically significant difference.

      Page 8 Line 111 in the resubmitted manuscript:

      “The adjusted p-value (padj) for SEMA4A in keratinocytes between Ctl and L was 2.83×10⁻³⁹, indicating a statistically significant difference despite not being visually prominent in the volcano plot, which shows comprehensive differential gene expression in keratinocytes (Figure 1C; Figure 1-figure supplement 1D).”

      Page 54: In the Figure legend of Figure 1-figure supplement 1D in the resubmitted manuscript:

      “(D) The volcano plot displays changes in gene expression in psoriatic L compared to Ctl.”

      Page 30 Line 481 in the resubmitted manuscript: In the “Data processing of single-cell RNA-sequencing and bulk RNA-sequencing” section.

      “The data was integrated into an h5ad file, which can be visualized in Cellxgene VIP (K. Li et al., 2022). We then performed differential analysis between two groups of cells to identify differential expressed genes using Welch’s t-test. Multiple comparisons were controlled using the Benjamini-Hochberg procedure, with the false discovery rate set at 0.05 and significance defined as padj < 0.05.”
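
      For readers unfamiliar with the adjustment step quoted above, the Benjamini-Hochberg procedure can be sketched in a few lines of Python. This is an illustrative sketch only and not the Cellxgene VIP implementation; the function name and the example p-values are assumptions, and the raw p-values are taken as given (for instance from Welch’s t-test) rather than computed here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical raw p-values for five genes.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.9]
padj = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in padj]  # significance defined as padj < 0.05
print(padj, significant)
```

      In this example the third and fourth genes have raw p-values below 0.05 but adjusted p-values above it, which is why significance is defined on padj rather than on the raw p-value.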

      Figure 2B

      The results narrative notes WT->WT is comparable to KO->WT. No statistics are given for this comparison. It appears the difference is less than the other comparisons, but still may be significant. Also, in the supplemental for Figure 2B, there appear to be missing columns for the 4 BM chimera groups (columns for WT and KO, but not 4 columns for each donor: recipient pair).

      We sincerely apologize for any confusion. We presented the results of the chimeric mice in Figure 3, and Figure 3-source data 1 shows the 4 BM chimera groups. In Figure 3B, the p-value for the comparison between WT->WT mice and KO->WT mice was 0.7988, as indicated in Figure 3-source data 1.

      Figure 3B

      While ear skin is not easily obtainable at day 0 for comparison, why not also include back skin at Wk 8? If the back skin epidermis is thicker like the ear skin, it supports the ear skin conclusion and adds a more consistent comparison. If the back skin epidermis is not thicker, what would be the author's explanation as to the why only ear skin epidermis is thicker in KO mice at 8 weeks?

      We appreciate and completely agree with the reviewer’s insightful comment. We have added images and dot plots of the back skin at Week 8 in Figure 4B. Since the back skin epidermis is thicker, similar to the ear skin, these results support the conclusion drawn from the ear skin data. Regarding Figure 4C, which shows the expression of Sema4a in the epidermis and dermis of 8-week-old WT mouse ear, we have modified the sentence in the manuscript to ‘the epidermis of WT ear at Week 8’ for clarification.

      Page 12 Line 180 in the resubmitted manuscript:

      “While epidermal thickness of back skin was comparable at birth (Figure 4B), on week 8, epidermis of Sema4AKO back and ear skin was notably thicker than that of WT mice (Figure 4B), suggesting that acanthosis in Sema4AKO mice is accentuated post-birth.”

      Page 47: In the Figure legend of Figure 4B in the resubmitted manuscript:

      “(B) Left: representative Hematoxylin and eosin staining of Day 0 back and Wk 8 back and ear. Scale bar = 50 μm. Right: Epi and Derm thickness in Day 0 back (n = 5) and Wk 8 back (n = 5) and ear (n = 8).”

      Figures 3C&D, Figures 4 D-F

      The figures might be easier to read if some of the data is moved to supplemental, especially in Figure 4, which has 36 panels just in D-F. Conversely, the dLN data is important in establishing the skin microenvironment as important in the accumulation of γδ cells and IL-17 production in the setting of Sema4a KO, so this might be more impactful if moved to the main figure.

      We appreciate and agree with your comments. As recommended, we have moved data from Figure 3C and 4D-F to the supplemental section. The dLN data have been moved to the main figure as Figure 4E. This has improved the readability of the figures.

      Figure 5 and Figure 6 might work better if combined. The differences in keratinocytes in psoriasis are well-known, so the novelty is how Sema4a KO skin appears to share similar differences. This would be easier to see if compared side-by-side in the same figure. Also, there is an opportunity to show this more rigorously by performing RNA-seq on WT vs Sema4a KO skin. Showing a larger set of DEGs that trend similarly between Ctl/NL psoriasis and WT/Sema4a KO skin in a heatmap would bolster the conclusion that Sema4a deficiency contributes to a psoriasis-like skin defect.

      We appreciate your valuable suggestion. Following your recommendation, we have combined Figures 5 and 6 to facilitate a side-by-side comparison. This highlights the similarities between Sema4AKO skin and psoriasis, making it easier to observe differences in keratinocytes. Additionally, we performed RNA-seq on WT and Sema4a KO epidermis (n = 3 per group). We analyzed the raw count data using iDEP 2.0 (Ge S.X., BMC Bioinformatics, 2018), setting the minimal counts per million to 0.5 in at least one library. Differential gene expression analysis was conducted using DESeq2, with an FDR cutoff of 0.1 and a minimum fold change of 2. As a result, we identified 46 upregulated and 70 downregulated genes in Sema4AKO mice compared to WT mice (see the volcano plot and heat map). However, except for S100a8, we did not observe significant expression changes in non-lesional psoriasis-related genes between WT and Sema4AKO mice. In the future, we aim to identify subtle stimuli that could cause gene expression changes between these groups, and we would like to perform additional RNA-seq experiments.
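
      As an illustration of the thresholds described above (FDR cutoff of 0.1, minimum fold change of 2), the sketch below classifies genes as upregulated or downregulated from a table of log2 fold changes and adjusted p-values. The rows are hypothetical, and the field names (`log2fc`, `padj`) are assumptions rather than the actual iDEP or DESeq2 output columns.

```python
import math

FDR_CUTOFF = 0.1        # adjusted p-value threshold stated in the response
MIN_FOLD_CHANGE = 2.0   # minimum fold change stated in the response

# Hypothetical results table (KO vs WT); not data from the study.
results = [
    {"gene": "GeneA", "log2fc": 1.8, "padj": 0.01},
    {"gene": "GeneB", "log2fc": -1.2, "padj": 0.04},
    {"gene": "GeneC", "log2fc": 0.4, "padj": 0.001},  # change too small
    {"gene": "GeneD", "log2fc": 2.5, "padj": 0.30},   # fails the FDR cutoff
]


def classify(row):
    """Label a gene 'up', 'down', or 'ns' (not significant)."""
    passes = (
        row["padj"] < FDR_CUTOFF
        and abs(row["log2fc"]) >= math.log2(MIN_FOLD_CHANGE)
    )
    if not passes:
        return "ns"
    return "up" if row["log2fc"] > 0 else "down"


up = [r["gene"] for r in results if classify(r) == "up"]
down = [r["gene"] for r in results if classify(r) == "down"]
```

      A fold change of 2 corresponds to an absolute log2 fold change of 1, which is why the comparison is made on the log2 scale.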

      Author response image 1.

      Author response image 2.

      Page 48: The Figure title of Figure 5 in the resubmitted manuscript:

      “Figure 5: Sema4AKO skin shares the features of human psoriatic NL.”

      SEMA4A is not significantly DE between Ctl and NL in the psoriasis RNA-seq data. If a lower expression of SEMA4A in psoriasis skin is a driving part of the phenotype, why is this not observed in the RNA-seq data? Presumably, this could be explained by infiltration of immune cells with increased SEMA4A expression, like in the scRNA-seq data in Figure 1. If so, might it be useful to analyze WT->KO chimera mice similarly to global KO mice in Figures 6-8? This might more accurately reflect what is happening in psoriasis, if epidermal SEMA4A expression is low, but immune expression is not. The KO data on their own nicely show a skin phenotype, but these additional experiments might more closely mimic psoriatic disease and increase the rigor and impact of the study.

      We really appreciate your insightful comments. Due to the limitations of the animal experimentation facility, we regret that we are unable to create additional chimeric mice. Although our analysis is limited, we compared IL-17A production from T cells of WT→KO mice and KO→KO mice following 4 consecutive days of IMQ treatment using flow cytometry (see Author response image 3 below; n = 6 for WT→KO, n = 4 for KO→KO). This comparison revealed that IL-17A production from T cells was comparable, regardless of whether they were derived from WT or Sema4AKO mice, when the skin constituent cells were derived from Sema4AKO. We appreciate the value of your advice, and agree that investigating keratinocyte differentiation and mTOR signaling in the epidermis, using either WT→KO chimeric mice or keratinocyte-specific Sema4A-deficient mice, is a crucial next step in our research.

      Author response image 3.

      Figure 8

      Rapamycin was able to partially reverse the psoriasis-like skin phenotype in Sema4a KO mice. Would rapamycin also be effective in the more severe disease induced by IMQ in Sema4a KO mice? While partially reducing the effect of Sema4a KO on steady-state skin with rapamycin strengthens the link to mTOR dysregulation, it did not change skin thickness. It's unclear if this would be useful clinically for patients with well-controlled psoriasis (NL skin). Would it be useful to reverse active, lesional psoriatic skin changes? Testing this might yield results more relevant to clinicians and patients.

      We are grateful for your valuable feedback. Rapamycin showed effectiveness in reducing epidermal thickness in a murine psoriasis model induced by IMQ in Sema4AKO mice. Rapamycin treatment downregulated the expression of Krt10, Krt14, and Krt16. We have added these results to Figure 7-figure supplement 2. These results suggest potential clinical relevance for treating active, lesional psoriatic skin changes and may be of interest to clinicians and patients.

      Page 17 Line 269 in the resubmitted manuscript:

      “Next, we investigated whether intraperitoneal rapamycin treatment effectively downregulates inflammation in the IMQ-induced murine model of psoriasis in Sema4AKO mice (Figure 7-figure supplement 2A). Rapamycin significantly reduced epidermal thickness compared to vehicle treatment (Figure 7-figure supplement 2B). Additionally, rapamycin treatment downregulated the expression of Krt10, Krt14, and Krt16 (Figure 7-figure supplement 2C). While the upregulation of Il17a in the Sema4AKO epidermis in the IMQ model was not clearly modified by rapamycin (Figure 7-figure supplement 2C), immunofluorescence revealed a decrease in the number of CD3 T cells in Sema4AKO epidermis by rapamycin (Figure 7-figure supplement 2D). In the naive states, mTORC1 primarily regulates keratinocyte proliferation, whereas mTORC2 is mainly involved in keratinocyte differentiation through Sema4A-related signaling pathways. Conversely, in the psoriatic dermatitis state, rapamycin downregulated both keratinocyte differentiation and proliferation markers. The observed similarities in Il17a expression following treatment with rapamycin and JR-AB2-011, regardless of additional IMQ treatment, suggest that Il17a production is not significantly dependent on Sema4A-related mTOR signaling.”

      Page 29 Line 461 in the resubmitted manuscript: In the “Inhibition of mTOR” section.

      “To analyze the preventive effectiveness of rapamycin in an IMQ-induced murine model of psoriatic dermatitis, Sema4AKO mice were administered either vehicle or rapamycin intraperitoneally from Day 0 to Day 17, and IMQ was topically applied to both ears for 4 days starting on Day 14. Then, on Day 18, ears were collected for further analysis.”

      Page 71: Figure 7-figure supplement 2 in the resubmitted manuscript:

      “Figure 7-figure supplement 2: Rapamycin treatment reduced the epidermal swelling observed in IMQ-treated Sema4AKO mice.

      (A) Experimental scheme. (B) The Epi thickness on Day 18. (n = 10 for Ctl, n = 12 for Rapamycin). (C) Relative expression of keratinocyte differentiation markers and Il17a in Sema4AKO Epi (n = 10 for Ctl, n = 12 for Rapamycin). (D) The number of T cells in the Epi (left) and Derm (right), under Ctl or rapamycin and IMQ treatments (n = 10 for Ctl, n = 12 for Rapamycin). Each dot represents the sum of numbers from 10 unit areas across 3 specimens. A-C: *p < 0.05, **p < 0.01. NS, not significant.”

      Reviewer #2 (Recommendations For The Authors):

      (1) To know whether the decrease of Sema4A in the epidermis of psoriasis patients is a result or a cause of psoriasis, it is necessary to show how the expression of Sema4A in epidermal cells is regulated. Shouldn't the degree of change in the expression of essential molecules (which is the cause of psoriasis) be more pronounced in L than in NL?

      We surveyed transcription factors of human Sema4A using GeneCards and found that NF-κB is the transcription factor most frequently associated with psoriasis. Wang et al. (Arthritis Res Ther. 2015) indicated NF-κB-dependent modulation of Sema4A expression in synovial fibroblasts of rheumatoid arthritis. However, since NF-κB expression is reportedly upregulated in psoriasis lesions, other transcription factors may function as key modulators of Sema4A expression in the epidermis.

      Although the molecules causing psoriasis remain to be elucidated, we investigated the correlation between the expression of psoriasis-related essential molecules in keratinocytes—such as S100A7A, S100A7, S100A8, S100A9, and S100A12—and SEMA4A expression in L and NL samples using qRT-PCR. We could not identify a correlation between these molecules and SEMA4A expression. We added a note to the limitations section to acknowledge that we were not able to reveal how Sema4A expression is regulated and that we could not determine the relationships between Sema4A expression and the essential molecules upregulated in psoriatic keratinocytes.

      Page 21 Line 328 in the resubmitted manuscript:

      “We were not able to reveal how Sema4A expression is regulated. Although we showed that downregulation of Sema4A is related to the abnormal cytokeratin expression observed in psoriasis, we could not determine the relationships between Sema4A expression and the essential molecules upregulated in psoriatic keratinocytes.”

      (2) Using bone marrow chimeric mice, it has already been reported that hematopoietic cells contain keratinocyte stem cells. Therefore, their interpretation is not supported by the results of their bone marrow chimeric mice experiment, and it is essential to generate keratinocyte-specific Sema4A knockout mice and perform similar experiments to support their interpretation.

      We value the reviewer’s insightful comment. We have assessed the expression of Sema4a in the epidermis of WT→KO chimeric mice using qRT-PCR. Our findings indicate that Sema4a expression levels in the epidermis of these mice are minimal (cycle threshold values of Sema4a ranged from 31.9 to not detected in WT→KO chimeric mice, whereas they ranged from 24.5 to 26.2 in WT→ WT mice). Consequently, we believe that the impact of keratinocyte stem cells derived from WT-hematopoietic cells is limited in this model. We appreciate this opportunity to clarify our results and will consider the generation of keratinocyte-specific Sema4A knockout mice for future experiments to further substantiate our interpretation.

      Page 11 Line 159 in the resubmitted manuscript:

      “Since it has already been reported that bone marrow cells contain keratinocyte stem cells (Harris et al., 2004; Wu, Zhao, & Tredget, 2010), we confirmed that epidermis of mice deficient in non-hematopoietic Sema4A (WT→KO) showed no obvious detection of Sema4a, thereby ruling out the impact of donor-derived keratinocyte stem cells infiltrating the host epidermis (Figure 3-figure supplement 1A).”

      Page 60: In the Figure legend of Figure 3-figure supplement 1A in the resubmitted manuscript:

      “(A) Sema4a expression in the Epi of WT→ WT mice and WT→ KO mice (n = 8 for WT→ WT, n = 7 for WT→ KO).”

      (3) Since Sema4A KO mice already have immunological and epidermal cell characteristics similar to psoriasis, albeit weak, it is possible that the nonspecific stimulus of simply topical IMQ may have appeared to exacerbate psoriasis. It is advisable to confirm whether a more psoriasis-specific stimulus, IL-23 administration, would produce similar results.

      Thank you for your suggestion. Following your advice, we have analyzed IL-23-mediated psoriasis-like dermatitis. To induce the model, 20 μl of phosphate-buffered saline containing 500 ng of recombinant mouse IL-23 was injected intradermally into both ears for 4 consecutive days. Unlike with the application of IMQ, there was no significant difference in ear thickness. However, H&E staining revealed that the epidermal thickness was significantly greater in KO mice compared to WT mice. Although a longer period of IL-23 induction might result in more pronounced ear swelling, we conducted this experiment over the same duration as the IMQ application experiment to maintain consistency. When we analyzed the T cells infiltrating the ears using flow cytometry, the proportion of IL-17A-producing Vγ2 and DNγδ T cells in the CD3 fraction in the epidermis was significantly higher in Sema4A KO mice, consistent with the results from IMQ-induced psoriasis-like dermatitis.

      The lack of significant difference in ear thickness changes with IL-23 administration might be due to IL-23 administration not reflecting upstream events of IL-23 production.

      We consider that in psoriasis, the expression of Sema4A in keratinocytes is likely more important than in T cells. Therefore, it makes sense that the phenotype difference was more pronounced with IMQ, which likely has a greater effect on keratinocytes compared to IL-23.

      Page 9 Line 137 in the resubmitted manuscript:

      “Though the imiquimod model is a well-established and valuable murine psoriatic model (van der Fits et al., 2009), the vehicle of imiquimod cream can activate skin inflammation that is independent of toll-like receptor 7, such as inflammasome activation, keratinocyte death and interleukin-1 production (Walter et al., 2013). This suggests that the imiquimod model involves complex pathways. Therefore, we subsequently induced IL-23-mediated psoriasis-like dermatitis (Figure 2-figure supplement 2A), a much simpler murine psoriatic model, because IL-23 is thought to play a central role in psoriasis pathogenesis (Krueger et al., 2007; Lee et al., 2004). Although ear swelling on day 4 was comparable between WT mice and Sema4AKO mice (Figure 2-figure supplement 2B), the epidermis, but not the dermis, was significantly thicker in Sema4AKO mice compared to WT mice (Figure 2-figure supplement 2C). We found that the proportion of CD4 T cells among T cells was significantly higher in Sema4A KO mice compared to WT mice, while the proportion of Vγ2 and DNγδ T cells among T cells was comparable between them (Figure 2-figure supplement 2D). On the other hand, focusing on IL-17A-producing cells, the proportion of IL-17A-producing Vγ2 and DNγδ T cells in CD3 fraction in the epidermis was significantly higher in Sema4A KO mice, consistent with the results from imiquimod-induced psoriasis-like dermatitis (Figure 2-figure supplement 2E).”

      Page 24 Line 363 in the resubmitted manuscript: In the “Mice” section.

      “To induce IL-23-mediated psoriasis-like dermatitis, 20 μl of phosphate-buffered saline containing 500 ng of recombinant mouse IL-23 (BioLegend, San Diego, CA) was injected intradermally into both ears of anesthetized mice using a 29-gauge needle for 4 consecutive days.”

      Page 58: In the Figure legend of Figure 2-figure supplement 2 in the resubmitted manuscript:

      “IL-23-mediated psoriasis-like dermatitis is augmented in Sema4AKO mice.

      (A) An experimental scheme involved intradermally injecting 20 μl of phosphate-buffered saline containing 500 ng of recombinant mouse IL-23 into both ears of WT mice and KO mice for 4 consecutive days. Samples for following analysis were collected on Day 4. (B and C) Ear thickness (B) and Epi and Derm thickness (C) of WT mice and KO mice on Day 4 (n = 12 per group). (D and E) The percentages of Vγ3, Vγ2, DNγδ, CD4, and CD8 T cells (D) and those with IL-17A production (E) in CD3 fraction in the Epi (top) and Derm (bottom) of WT and KO ears (n = 5 per group). Each dot represents the average of 4 ear specimens. B-E: *p < 0.05, **p < 0.01. NS, not significant.”

      (4) How is STAT3 expression in the epidermis crucial in the pathogenesis of psoriasis in Sema4AKO mice?

      We appreciate your insightful comment. In our study, given the established role of activated STAT3 in psoriasis, we investigated both total STAT3 and phosphorylated STAT3 (p-STAT3) levels in the naive epidermis of WT and Sema4AKO mice (See the figure below). Our findings indicate that STAT3 activation does not occur in the epidermis of Sema4AKO mice. Therefore, we speculated that the hyperkeratosis observed in Sema4AKO mice is due to aberrant mTOR signaling rather than STAT3 activation. STAT3 may be relevant to other pathways independent of Sema4A signaling, or it may function as a complex with other molecules in the Sema4A signaling.

      Author response image 4.

    1. Author response:

      Response to Reviewer 1

      We will investigate the intracellular localization of ABCA1 in both EpH4 and EpH4-Snail cells. We will also examine the changes in ACAT expression levels within these cell lines.

      Response to Reviewer 2

      We will first investigate whether the chemoresistance exhibited by EpH4-Snail cells can be abolished not only through pharmacological inhibition of ABCA1 but also by knocking out the ABCA1 gene. Regarding causality, as demonstrated in Figure 2, we have already shown that reducing cholesterol levels in EpH4-Snail cells decreases ABCA1 expression. To further explore this relationship, we will assess whether increasing sphingomyelin levels by adding ceramide to the culture medium, thereby correcting the sphingomyelin-to-cholesterol ratio, would reduce ABCA1 expression. Furthermore, we will evaluate whether lowering cholesterol levels in EpH4-Snail cells via simvastatin treatment, along with normalization of the sphingomyelin-to-cholesterol ratio, attenuates their resistance to the anticancer drug nitidine chloride. Additionally, we will incorporate quantitative analyses for several experiments, as suggested in the reviewers’ comments, to enhance the robustness of our findings.

    1. Author response:

      We thank the reviewers for their support of this work and insightful recommendations for how to improve it. We have provided specific responses to each reviewer comment below. To summarize how we intend to address the requested revisions:

      Many of the reviewers’ comments requested additional technical or quality details about the DMS libraries or assays (e.g., number of cells tested, number of sequencing reads, assay replication, assay sensitivity, library balance), and we provide additional information and analyses that we can incorporate into the relevant portions of the text, supplementary tables, and supplementary figures to address these questions.

      Some comments asked to clarify nomenclature/wording or provide additional labels to images, and we will make these changes as requested.

      A few questions would require additional experimental data to address. Where experiments have already been performed, we will incorporate those results or cite relevant work previously reported in the literature.

      Reviewer 1:

      Summary

      Howard et al. performed deep mutational scanning on the MC4R gene, using a reporter assay to investigate two distinct downstream pathways across multiple experimental conditions. They validated their findings with ClinVar data and previous studies. Additionally, they provided insights into the application of DMS results for personalized drug therapy and differential ligand responses across variant types.

      Strengths

      They captured over 99% of variants with robust signals and investigated subtle functionalities, such as pathway-specific activities and interactions with different ligands, by refining both the experimental design and analytical methods.

      Weaknesses

      While the study generated informative results, it lacks a detailed explanation regarding the input library, replicate correlation, and sequencing depth for a given number of cells.

      Additionally, there are several questions that it would be helpful for authors to clarify.

      (1) It would be helpful to clarify the information regarding the quality of the input library and experimental replicates. Are variants evenly represented in the library? Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct? Finally, could the authors provide details on the correlation between experimental replicates under each condition?

      Are variants evenly represented in the library?

      We strive to achieve as evenly balanced a library as possible at every stage of the DMS process (e.g., from initial cloning in E. coli through integration into human cells). Below is a representative plot showing the number of barcodes per amino acid variant at each position in a given ~60 amino acid subregion of MC4R, which highlights how evenly variants are represented at the E. coli cloning stage.

      Author response image 1.

      We also make similar measurements after the library is integrated into HEK293T cell lines, and see similarly even coverage across all variants, as shown in the plot below.

      Author response image 2.

      Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct?

      We agree long-read sequencing would be an excellent way to confirm that our constructs contain a single intended variant. However, we elected for an alternate method (outlined in more detail in Jones et al. 2020) that leverages multiple layers of validation. First, the oligo chip-synthesized portions of the protein containing the variants are cloned into a sequence-verified plasmid backbone, which greatly decreases the chances of spuriously generating a mutation in a different portion of the protein. We then sequence both the oligo portion and random barcode using overlapping paired-end reads during barcode mapping to avoid sequencing errors and to help detect DNA synthesis errors. At this stage, we computationally reject any constructs that have more than one variant. Given this, the vast majority of remaining unintended variants would come from somatic mutations introduced by the E. coli cloning or replication process, which should be low frequency. We have used our in-house full plasmid sequencing method, OCTOPUS, to sample and spot check this for several other DMS libraries we have generated using the same cloning methods. We have found variants in the plasmid backbone in only ~1% of plasmids in these libraries. Our statistical model also helps correct for this by accounting for barcode-specific variation. Finally, we believe this provides further motivation for having multiple barcodes per variant, which dilutes the effect of any unintended additional variants.
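
      The computational rejection step described above (discarding constructs that carry more than one variant) can be pictured with the following minimal sketch. The wild-type fragment, barcode names, and construct sequences are hypothetical placeholders, and the comparison is a simple position-by-position amino acid check that assumes substitutions only; the actual barcode-mapping pipeline is described in Jones et al. 2020 and is not reproduced here.

```python
def count_substitutions(wild_type, construct):
    """Count positions where the construct differs from the wild-type sequence."""
    if len(wild_type) != len(construct):
        raise ValueError("sequences must have the same length")
    return sum(1 for wt_aa, aa in zip(wild_type, construct) if wt_aa != aa)


# Hypothetical wild-type fragment and barcode-to-construct mapping.
WT_FRAGMENT = "ACDEFGHIKL"
constructs = {
    "barcode_1": "ACDEFGHIKL",  # wild type, no variant
    "barcode_2": "ACDEYGHIKL",  # one substitution
    "barcode_3": "ACNEYGHIKL",  # two substitutions, rejected
}

# Keep only constructs carrying at most one variant.
kept = [
    barcode
    for barcode, seq in constructs.items()
    if count_substitutions(WT_FRAGMENT, seq) <= 1
]
```

      In this toy example, `barcode_3` is dropped because it differs from the wild-type fragment at two positions.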

      Finally, could the authors provide details on the correlation between experimental replicates under each condition?

      Certainly! In general, the Gs reporter had higher correlation between replicates than the Gq system (r ~ 0.5 vs r ~ 0.4). The plots below show two representative correlations at the RNA-seq stage of read counts for barcodes between the low a-MSH conditions. One important advantage of our statistical model is that it’s able to leverage information from barcodes regardless of the number of replicates they appear in.

      Author response image 3.
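
      For context, a replicate correlation of the kind reported above can be computed as a Pearson r over the read counts of barcodes shared between two replicates. The sketch below uses hypothetical barcode counts; restricting the comparison to shared barcodes and using raw rather than transformed counts are assumptions, not details confirmed by the response.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical read counts per barcode in two replicates.
replicate_1 = {"bc1": 8, "bc2": 12, "bc3": 3, "bc4": 20}
replicate_2 = {"bc1": 10, "bc2": 9, "bc3": 5, "bc5": 7}

# Compare only barcodes observed in both replicates.
shared = sorted(set(replicate_1) & set(replicate_2))
r = pearson_r(
    [replicate_1[b] for b in shared],
    [replicate_2[b] for b in shared],
)
```

      Barcodes seen in only one replicate (`bc4`, `bc5`) are excluded from this simple correlation, whereas the response notes that the authors' statistical model can still use them.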

      Since the functional readout of variants is conducted through RNA sequencing, it seems crucial to sequence a sufficient number of cells with adequate sequencing saturation. Could the authors clarify the coverage depth used for each RNA-seq experiment and how this depth was determined? Additionally, how many cells were sequenced in each experiment?

      This will be addressed by incorporating the following details into the manuscript:

      We seeded 17 million cells per replicate at the start of each assay and, with ~1.5-fold expansion over the course of the assay, harvested ~25.5 million cells per replicate for RNA extraction and sequencing. We found this sufficient to get at least ~30-60x cellular coverage per amino acid variant.

      Total mapped reads per replicate at RNA-seq stage

      - Gs/CRE: 9.1-18.2 million mapped reads, median=12.3

      - Gq/UAS: 8.6-24.1 million mapped reads, median=14.5

      - Gs/CRE+Chaperone: 6.4-9.5 million mapped reads, median=7.5

      Reads per barcode distribution

      - Median read counts of 8, 10, and 6 reads per sample per barcode for Gs/CRE, Gq/UAS, and Gs/CRE+Chaperone assays, respectively.

      Barcodes per variant distribution

      - As reported, the median number of barcodes per variant across samples (the “median of medians”) is 56 for Gs/CRE and 28 for Gq/UAS

      - Additionally, it is 44 for Gs/CRE+Chaperone
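
      The summary statistics above can be illustrated with a short sketch: the harvested cell count follows from the seeding density and the ~1.5-fold expansion, and the “median of medians” is the median, across samples, of each sample's median number of barcodes per variant. The per-sample barcode counts and variant names below are hypothetical.

```python
from statistics import median

# Cell numbers stated in the response.
seeded_cells = 17_000_000
expansion_factor = 1.5
harvested_cells = seeded_cells * expansion_factor  # ~25.5 million

# Hypothetical barcode counts: sample -> variant -> number of barcodes.
barcodes_per_variant = {
    "sample_1": {"var_1": 50, "var_2": 62, "var_3": 55},
    "sample_2": {"var_1": 48, "var_2": 60, "var_3": 58},
    "sample_3": {"var_1": 40, "var_2": 70, "var_3": 57},
}

# Median number of barcodes per variant within each sample.
per_sample_median = {
    sample: median(counts.values())
    for sample, counts in barcodes_per_variant.items()
}

# "Median of medians": the median of those per-sample medians.
median_of_medians = median(per_sample_median.values())
```

      Summarizing this way keeps a single unusually deep or shallow sample from dominating the reported barcodes-per-variant figure.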

      It appears that the frequencies of individual RNA-seq barcode variants were used as a proxy for MC4R activity. Would it be important to also normalize for heterogeneity in RNA-seq coverage across different cells in the experiment? Variability in cell representation (i.e., the distribution of variants across cells) could lead to misinterpretation of variant effects. For example, suppose barcode_a1 represents variant A and barcode_b1 represents variant B. If the RNA-seq results show 6 reads for barcode_a1 and 7 reads for barcode_b1, it might initially appear that both variants have similar effect sizes. However, if these reads correspond to 6 separate cells each containing 1 copy of barcode_a1, and only 1 cell containing 7 copies of barcode_b1, the interpretation changes significantly. Additionally, if certain variants occupy a larger proportion of the cell population, they are more likely to be overrepresented in RNA sequencing.

      We account for this heterogeneity in several ways. First, as shown above (Response to Reviewer 1, Question 1), we aim to have even representation of variants within our libraries. Second, we utilize compositional control conditions like forskolin or unstimulated conditions to obtain treatment-independent measurements of barcode abundance and, consequently, of mutant-vs-WT effects that are due to compositional rather than biological variability. We expect that variability observed under these controls is due to subtle effects of molecular cloning, gene expression, and stochasticity. Using these controls, we observe that mutant-vs-WT effects are generally close to zero in these normalization conditions (e.g., in untreated Gq, see Supplementary Figure 3) as compared to drug-treated conditions. For example, premature stops behave similarly to WT in normalization conditions. This indicates that mutant abundance is relatively homogeneous. Where there are barcode-dependent effects on abundance, we can use information from these conditions to normalize that effect. Finally, our mixed-effect model accounts for barcode-specific deviations from the expected mutant effect (e.g., a “high count” barcode consistently being high relative to the mean).

      Although the assay system appears to effectively represent MC4R functionality at the molecular level, we are curious about the potential disparity between the DMS score system and physiological relevance. How do variants reported in gnomAD distribute within the DMS scoring system?

      Figure 2D shows DMS scores (variant effect on Gs signaling) relative to human population frequency for all MC4R variants reported in gnomAD as of January 8, 2024.

      To measure Gq signaling, the authors used the GAL4-VPR relay system. Is there additional experimental data to support that this relay system accurately represents Gq signaling?

      The full Gq reporter uses an NFAT response element from the IL-2 promoter to regulate the expression of the GAL4-VPR relay. In this system, the activation of Gq signaling results in the activation of the NFAT response element, and this signal is then amplified by the GAL4-VPR relay. The NFAT response element has been previously well-validated to respond to the activation of Gq signaling (e.g., PMID: 8631834). We will add this reference to the text to further support the use of the Gq assay.

      Identifying the variants responsive to the corrector was impressive. However, we are curious about how the authors confirmed that the restoration of MC4R activity was due to the correction of the MC4R protein itself. Is there a possibility that the observed effect could be influenced by other factors affected by the corrector? When the corrector was applied to the cells, were any expected or unexpected differential gene expression changes observed?

      While we do not directly measure whether Ipsen-17 has effects on other signaling processes, previous work has shown that Ipsen-17 treatment does not indirectly alter signaling kinetics such as receptor internalization (Wang et al., 2014). Furthermore, our analysis methods inherently account for this by normalizing variant effects to WT signaling levels. Any observed rescue of a given variant inherently means that the variant is specifically more responsive to Ipsen-17 than WT, and the fact that different variants exhibit different levels of rescue is reassuring that the mechanism is on target to MC4R. Lastly, Ipsen-17 is known to be an antagonist of alpha-MSH activity and is thought to bind directly to the same site on MC4R (Wang et al., 2014).

      As mentioned in the introduction, gain-of-function (GoF) variants are known to be protective against obesity. It would be interesting to see further studies on the observed GoF variants. Do the authors have any plans for additional research on these variants?

      We agree this would be an excellent line of inquiry, but due to changes in company priorities we unfortunately do not have any plans for additional research on these variants.

      Reviewer 2:

      Overview

      In this manuscript, the authors use deep mutational scanning to assess the effect of ~6,600 protein-coding variants in MC4R, a G protein-coupled receptor associated with obesity. Reasoning that current deep mutational scanning approaches are insufficiently precise for some drug development applications, they focus on articulating new, more precise approaches. These approaches, which include a new statistical model and innovative reporter assay, enable them to probe molecular phenotypes directly relevant to the development of drugs that target this receptor with high precision and statistical rigor.

      They use the resulting data for a variety of purposes, including probing the relationship between MC4R's sequence and structure, analyzing the effect of clinically important variants, identifying variants that disrupt downstream MC4R signaling via one but not both pathways, identifying loss-of-function variants that are amenable to a corrector drug, and exploring how deep mutational scanning data could guide small molecule drug optimization.

      Strengths

      The analysis and statistical framework developed by the authors represent a significant advance. In particular, the study makes use of barcode-level internally replicated measurements to more accurately estimate measurement noise.

      The framework allows variant effects to be compared across experimental conditions, a task that is currently hard to do with rigor. Thus, this framework will be applicable to a large number of existing and future deep mutational scanning experiments.

      The authors refine their existing barcode transcription-based assay for GPCR signaling, and develop a clever new "relay" reporter system to boost signaling in a particular pathway. They show that these reporters can be used to measure both gain-of-function and loss-of-function effects, which many deep mutational scanning approaches cannot do.

      The use of systematic approaches to integrate and then interrogate high-dimensional deep mutational scanning data is a big strength. For example, the authors applied PCA to the variant effect results from reporters for two different MC4R signaling pathways and were able to discover variants that biased signaling through one or the other pathway. This approach paves the way for analyses of higher dimensional deep mutational scans.
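
      The idea of using PCA to find pathway-biased variants can be sketched in a few lines. The example below assumes each variant has one effect score per pathway reporter (the variant names and numbers are made up, and "pathway A" and "pathway B" are placeholders); it is a minimal illustration of the general technique, not the authors' pipeline. For two-dimensional data, the first principal axis captures effects shared by both pathways, while the score along the second axis measures how far a variant departs from that shared trend.

```python
import math

# Hypothetical variant effects (pathway A, pathway B), e.g. log fold change vs. wild type.
effects = {
    "var1": (-2.0, -2.1),
    "var2": (-1.0, -0.9),
    "var3": (0.0, 0.1),
    "var4": (1.0, 0.9),
    "var5": (-0.2, -2.0),  # loss mostly in pathway B: a candidate biased variant
}

def pca_2d(points):
    """Return the mean and the two principal axes of a list of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Orientation of the major axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))
    return (mx, my), pc1, pc2

mean, pc1, pc2 = pca_2d(list(effects.values()))

# Projection onto PC2: large absolute values suggest pathway-biased effects.
pc2_scores = {
    name: (x - mean[0]) * pc2[0] + (y - mean[1]) * pc2[1]
    for name, (x, y) in effects.items()
}
most_biased = max(pc2_scores, key=lambda name: abs(pc2_scores[name]))
```

      In this toy data set, the variant whose effect is concentrated in one pathway stands out on the second axis, which is the pattern the analysis described above is looking for.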

      The authors use the deep mutational scanning data they collect to map how different variants impact the ability of small-molecule agonists to activate MC4R signaling. This is an exciting idea, because developing small-molecule protein-targeting therapeutics is difficult, and this manuscript suggests a new way to map small-molecule-protein interactions.

      Weaknesses

      The authors derive insights into the relationship between MC4R signaling through different pathways and its structure. While these make sense based on what is already known, the manuscript would be stronger if some of these insights were validated using methods other than deep mutational scanning.

      Likewise, the authors use their data to identify positions where variants disrupt MC4R activation by one small molecule agonist but not another. They hypothesize these effects point to positions that are more or less important for the binding of different small molecule agonists. The manuscript would be stronger if some of these insights were explored further.

      Impact

      In this manuscript, the authors present new methods, including a statistical framework for analyzing deep mutational scanning data that will have a broad impact. They also generate MC4R variant effect data that is of interest to the GPCR community.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Previously, this group showed that Tgfbr1 regulates the reorganization of the epiblast and primitive streak into the chordo-neural hinge and tailbud during the trunk-to-tail transition. Gdf11 signaling plays a crucial role in orchestrating the transition from trunk to tail tissues in vertebrate embryos, including the reallocation of axial progenitors into the tailbud and Tgfbr1 plays a key role in mediating its signaling activity. Progenitors that contribute to the extension of the neural tube and paraxial mesoderm into the tail are located in this region. In this work, the authors show that Tgfbr1 also regulates the reorganization of the posterior primitive streak/base of allantois and the endoderm as well. 

      By analyzing the morphological phenotypes and marker gene expression in Tgfbr1 mutant mouse embryos, they show that it regulates the merger of somatic and splanchnic layers of the lateral plate mesoderm, the posterior streak derivative. They also present evidence suggesting that Tgfbr1 acts upstream of Isl1 (key effector of Gdf11 signaling for controlling differentiation of lateral mesoderm progenitors) and regulates the remodelling of the major blood vessels, the lateral plate mesoderm and endoderm associated with the trunk-to-tail transition. Through a detailed phenotypic analysis, the authors observed that, similarly to Isl1 mutants, the lack of Tgfbr1 in mouse embryos hinders the activation of hindlimb and external genitalia marker genes and results in a failure of lateral plate mesoderm layers to converge during tail development. As a result, they interpret that ventral lateral mesoderm, which generates the pericloacal mesenchyme and genital tuberculum, fails to specify.

      They also show defects in the morphogenesis of the dorsal aorta at the trunk/tail juncture, resulting in an aberrant embryonic/extraembryonic vascular connection. Endoderm reorganization defects following abnormal morphogenesis of the gut tube in the Tgfbr1 mutants cause failure of tailgut formation and cloacal enlargement. Thus, Tgfbr1 activity regulates the morphogenesis of the trunk/tail junction and the morphogenetic switch in all germ layers required for continuing post-anal tail development. Taken together with the previous studies, this work places Gdf11/8 - Tgfbr1 signaling at the pivot of trunk-to-tail transition and the authors speculate that critical signaling through Tgfbr1 occurs in the posterior-most part of the caudal epiblast, close to the allantois. 

      Strengths: 

      The data shown is solid with excellent embryology/developmental biology. This work demonstrates meticulous execution and is presented in a comprehensive and coherent manner. Although not completely novel, the results/conclusions add to the known function of Gdf11 signaling during the trunk-to-tail transition. 

      Weaknesses: 

      The authors rely on the expression of a small number of key regulatory genes to interpret the developmental defects. The alternative possibilities remain to be ruled out thoroughly. The manuscript is also quite descriptive and would benefit from more focused highlighting of the novelty regarding the absence of Tgfbr1 in the mouse embryo. They should also strengthen some of their conclusions with more details in the results.

      Although we used a limited number of key regulatory genes to interpret the phenotype, these genes were carefully chosen to focus on specific processes involving the lateral mesoderm, its derivatives, and the endoderm. In addition to these markers, we included references to other relevant markers that were previously analyzed and initially led us to examine the lateral plate mesoderm and tail gut in Tgfbr1 mutants. To strengthen our analysis, we have now incorporated additional data to clarify specific phenotypes. For instance, in situ hybridization (ISH) for Shh further confirms abnormalities at the caudal end of the endoderm in mutant embryos, while no endodermal defects are observed in the trunk region. We also included an analysis of the intermediate mesoderm, which shows abnormalities at the same level as those found in the lateral plate mesoderm and endoderm of Tgfbr1 mutants.

      It’s important to note that using additional markers to assess the epiblast/primitive streak of Tgfbr1 mutants at E7.5–E8.5, as suggested by a reviewer, is unlikely to yield new insights. At these early stages, Tgfbr1 mutant embryos do not display observable phenotypes in the main body axis. Data in this manuscript already demonstrate the absence of abnormalities at this stage, as shown in Figure 3 and Supplementary Figure 6. Additionally, certain genes whose expression becomes abnormal when the embryo would enter tail development remain unaffected in the trunk, indicating that trunk extension is not significantly impacted by Tgfbr1 deficiency. While transcriptomic analysis of these Tgfbr1 mutants could provide interesting insights, it would be more appropriate to focus on later developmental stages, which would be beyond the scope of the current study.

      The second major critique was that the manuscript is primarily descriptive. We disagree with this assessment. Several hypotheses were rigorously tested using genetic approaches, including Isl1 knockout experiments, cell tracing from the primitive streak with a newly generated Cre driver to activate a reporter from the ROSA26 locus, and assessment of extraembryonic endoderm fate in Tgfbr1 mutants by introducing the Afp-GFP transgene into the Tgfbr1 mutant background. Additionally, we conducted tracing analyses of tail bud cell contributions to the tail gut via DiI injection and embryo incubation. To address potential concerns regarding this experiment, we have included data showing the DiI position immediately after injection to confirm that it does not contact the tail gut. We also considered and accounted for potential DiI leakage into neuromesodermal progenitors to clarify the endodermal results.

      Our genetic and DiI experiments were specifically designed to differentiate between alternative hypotheses and to confirm hypotheses generated from other analyses. Additionally, improvements in some of the imaging data have helped address remaining concerns.

      Reviewer #1 (Recommendations For The Authors): 

      I have listed my suggestions as queries. The authors may perform experiments or clarify by editing the text to address them. 

      The authors state on Page 11 and elsewhere that the ventral lateral mesoderm is absent in the Tgfbr1 mutant. What is the basis for this conclusion? Are there specific markers for PCM or GT primordium? 

      The specific marker of PCM and GT primordium is Isl1. The absence of this marker in the Tgfbr1 mutants is shown in (Dias et al, 2020). The reference is introduced in the manuscript.

      A schematic illustrating the VLM and the expression patterns of Tgfbr1, Gdf11, etc., would be helpful. 

      Characterization of Gdf11 expression has been previously reported (e.g. McPherron et al 1999, cited in our manuscript). It is expressed in the region containing axial progenitors before the trunk to tail transition and is not expressed in the VLM. As for Tgfbr1, its expression is hard to detect, likely because it is ubiquitously expressed at a low level. We include in this document some pictures of an ISH, including a control using the Tgfbr1 mutants to illustrate that the staining resembling background actually represents Tgfbr1 expression. If the reviewers find it important, we can also incorporate these data into the manuscript. Under these circumstances, we feel that a schematic might not be very informative.

      Author response image 1.

      Image showing an example of an ISH procedure with a probe against Tgfbr1, showing widespread and low expression. The lower picture shows a ventral view of a stained wild type E10.5 embryo.

      Foxf1+ cells in the 'extended LPM' of Tgfbr1 mutants suggest fate transformation, or does it indicate the misexpression of marker gene otherwise suppressed by Tgfbr1 activity? The authors suggest that Foxf1+ cells are VLM progenitors from posterior PS trapped in the extended LPM. Do they continue to express PS markers? 

      The observation that both in wild type and Tgfbr1 mutant embryos Foxf1 expression in the trunk is restricted to the splanchnic LPM indicates that the absence of this marker in the somatic LPM is not the result of a suppression of its expression by Tgfbr1. In wild type embryos Foxf1 is also expressed in the posterior PS, regulated independently of its expression in the LPM (i.e. Shh-independent) and later in the pericloacal mesoderm (our supplementary figure 2). As Foxf1 expression in the posterior PS was not suppressed in the Tgfbr1 mutants, together with the absence of pericloacal mesoderm, we interpret that the Foxf1-positive cells in the two layers around the extended celomic cavity in the posterior end of the mutant embryos derived from the posterior PS, resulting from the absence of its normal progression through the embryonic tissues.

      We did not find expression of PS markers giving rise to paraxial mesoderm, like Tbxt, further suggesting that those cells could derive from the restricted set of cells within the posterior PS that contribute to the pericloacal mesoderm.

      For example, the misexpression of Apela is interpreted as mis-localized endoderm cells. They show scattered Keratin 8 misexpression to support the interpretation. It would be more convincing if the authors tested the expression of other endoderm markers. 

      As indicated in the manuscript, we suggest that these cells are endoderm progenitors (p. 13), like those present at the posterior end of the gut tube at E9.5 and E10.5, that are unable to incorporate into the gut tube. Apela is not a general endodermal marker: it is expressed in the foregut pocket and the nascent cells of the hindgut/tail gut, becoming downregulated as cells take on typical endodermal signatures. The presence of ectopic Apela expression in the extended LPM of the mutant embryos might indeed indicate the presence of progenitors that failed to undergo the differentiation-associated downregulation of Apela. This would also imply the absence of definitive endodermal markers.

      The Nodal signaling pathway in the anterior PS drives endoderm development. It acts through Alk7. Does Tgfbr1 (Alk5) mutation impact endoderm development, in general? It isn't easy to assess this from the Foxa2 in situ RNA hybridization shown in Figures 6A and B. It would be helpful for the readers if the authors clarified this point. 

      In the pictures shown in Figure 7D-D’ it is already shown that the endoderm is mostly preserved until the region of the trunk to tail transition. The presence of a rather normal endoderm in the embryonic trunk can also be seen with Shh, a figure added as Supplementary Fig.5.

      Reviewer #2 (Recommendations For The Authors): 

      The authors mention two interesting novel points which they should develop in the discussion, and probably also in the results. 

      (1) The authors speculate about the possible involvement of the posterior PS as a mediator of Gdf11/Tgfbr1 signaling activity. However, as mentioned in the manuscript, their experiments do not allow regional sublocalization within the PS... Here it would be important to assess/discuss in more detail which progenitors respond to this signaling activity and when they do it. At the very least, the authors should provide high-resolution spatiotemporal data of the expression of Tgfbr1 in the PS. 

      Tgfbr1 expression at this embryonic stage does not give clear differential patterns. The data reported for this expression in Andersson et al 2006 are of very low quality and we have not been able to reproduce the reported pattern. On the contrary, all our efforts over the years provided a very general staining that could even be interpreted as background. When we now included Tgfbr1 mutants as controls, it became clear that the ubiquitous and low-level signal observed in wild type embryos indeed represents the Tgfbr1 expression pattern: low level and ubiquitous. We are attaching a figure to this document illustrating these observations. If required, this can also be included in the manuscript as a supplementary figure.

      Also, the work of Wymeersch et al., 2019 regarding the lateral plate mesoderm progenitors (LPMPs) should be referred to and discussed here. 

      This was now added in the results (page 11) and in discussion (page 16). 

      For instance, are the LPMP transcriptomic differences detected between E7.5 and E8.5 caused by Tgfbr1 signaling activity? This question could be easily answered through a comparative bulk RNAseq analysis of the posterior-most region of the PS of mutant and WT embryos. The possible colocalization of Tgfb1 (Wymeersch et al., 2019) and Tgfbr1 in the LPMPs should also be addressed. 

      We agree with the suggestion that RNA-seq in the posterior PS of WT and mutant embryos might be informative. However, it is very likely that within the proposed timeframe (E7.5 to E8.5) there are no significant differences between the wild type and the Tgfbr1 mutant embryos, because there is no apparent axial phenotype in Tgfbr1 mutant embryos before the trunk to tail transition. Therefore, at this stage, we think that this experiment is out of the scope of the present manuscript.

      (2) The activity of Tgfbr1 during the trunk-to-tail transition is critical for the development of tail endodermal tissues. Here the authors suggest again the involvement of the posterior PS/allantois region, but a similar phenotype can also be observed for instance in the absence of Snai1 in the caudal epiblast (Dias et al., 2020)... It would be important to assess/discuss the origin of those morphogenetic problems in the gut. Is it due to the reallocation of NMC cells into the CNH? The tailbud-EMT process? LPMPs specification?... Regional mutations or gain of functions of Snai1 or Tgfbr1 in the caudal epiblast would help answer the question.  

      The endodermal phenotype in the Snai1 mutants is different from that observed in the Tgfbr1 mutants. As can be observed in Figures 3, 4 and 5 of Dias et al., the missing tailbud is replaced by a structure that extends the epiblast. As a consequence, the endoderm finishes at the base of that structure, even expanding to make a structure resembling the cloaca, which is different from what is seen in the Tgfbr1 mutants. In this case, the lack of tail gut is likely to result either from the lack of formation of the progenitors of the gut endoderm or from the dissociation of what would be the tail bud from the LPM. Actually, hindlimb/pericloacal mesoderm markers, like Tbx4, are preserved in the Snai1 mutant. As for the Snai1 gain-of-function experiment, also already reported in Dias et al 2020, the fate of these cells is not clear. The ISH for Foxa2 showed extra signals, but as it is not an exclusive marker for endoderm it is not possible to know whether any of these signals correspond to endodermal tissues.

      Regarding the development of tail endodermal tissues, the authors suggest that it occurs from a structure derived from the PS that is located posteriorly, in the tailbud, after the tip of the growing gut. This is an important and novel point as it suggests that the primordia of the endoderm is not wholly specified during gastrulation. So the observation should be well supported. How can Anastasiia et al. distinguish such "structure" from the actual developing gut? Does it have a distinct molecular signature or any morphological landmark that enables its separation from the actual gut? The data suggests that the region highlighted in Supplementary Figure 4Ab contains part of the actual gut tube (the same is suggested in Figure 5B). If the authors think otherwise, they must characterize that region of the tailbud by doing a thorough morphological and gene/protein expression analysis and assess its potency, via transplantation experiments. Also, the authors' claim mostly relies on the DiI experiments and those have three problems: #1 Anastasiia et al. assess "tail" endodermal growth at E9.5 when the correct stage to do it is after E10.5 (after tailbud formation). #2 Incongruencies, low number (only three embryos), and diversity in the results shown in Figure 8 and Supplementary Figure 4. For instance, despite similar staining at 0h, the extension and amount of DiI present in the gut tube after 20h varies significantly amongst the differently labeled embryos. A possible explanation lies in the abnormal leakiness of the DiI labelings and that is confirmed by the observations shown in Supplementary Figure 4M-O; the same for Supplementary Figure 4G, which shows a substantial amount of DiI in the neural tube. #3 The authors must provide high-quality data showing which tissues/regions were labelled at time 0h, including transversal and sagittal sections as they did for the 20h time-point. Additionally, it is important to re-orient the sagittal optical sections to a position that also shows the neural tube (like a mid-sagittal section) and include information concerning the AP/DV axis, as well as the location of the transversal optical sections in the sagittal image.

      As described in the reply to reviewer 1, Apela is expressed in the nascent tail gut endoderm but not in more anterior areas except for a foregut pocket, and becomes downregulated as the tube acquires endodermal signatures. Therefore, the structure to which the reviewer refers might indeed represent a group of progenitors that extend the tail gut. The observation that this property is seen only in the tail gut as it grows already separates this region of the gut, which in the end does not contribute to mature organs, from more anterior areas of the endoderm (essentially anterior to the cloaca) that will become a relevant tissue of the intestinal organs. Our DiI labelling experiment was aimed at testing whether this pool of cells contributes to the gut, but it does not allow us to determine the nature of those cells, a question that will require further research (discussed on p. 17) and that we think is beyond the scope of the present manuscript.

      Regarding labelling at E10.5, we agree that at E9.5 the tail bud is not completely formed in terms of NMCs; for example, the neuropore is not yet closed. However, we are more interested in regression of the epiblast, which is complete by E9.5. Injecting at E9.5 also has technical advantages for us: first, in our hands earlier embryos grow better in culture, and second, it is easier to inject into the tailbud at E9.5 because it is slightly bigger than at E10.5. Therefore, injecting at E9.5 is less prone to technical artifacts due to injection inaccuracy and compromised growth in culture.

      We agree that the injected DiI could also leak into NMPs, which might be located in the same area. However, while this could result in labeling of the neural tube, it would not affect the interpretation of the finding of labeled cells in the tail gut. Indeed, the presence of this label in the gut epithelium indicates the presence of progenitors in the injected region of the tail gut. We added some considerations of this possible leakage to the results section of the manuscript (p. 15). We thank the reviewer for drawing our attention to this issue.

      We also now provide high quality data showing labelled tissue at 0h in Supplementary figure 8A-c’, higher magnification images in Fig. 8, and reoriented optical sections in Fig.6 and in Supplementary Fig. 7, including axis and location of the sections as suggested by the reviewer.

      Minor concerns/comments: 

      (1) The abstract is quite long, though this might be fine for this journal. 

      (2) In relation to the comment on the abstract, the manuscript needs an initial Figure describing the events that are described in the introduction. Otherwise, the manuscript will only be accessible to mouse embryologists.

      We have a figure summarizing the results at the end of the manuscript, and we think that including a similar figure at the beginning might be redundant. What we could do, if required, is to include this type of schematic as a graphical abstract.

      (3) The authors need to clarify what they mean when they use the following expressions "PS fate" and "fate of the posterior PS".

      I do not think that we have used such expressions. Indeed, they did not come up when we ran a “find” in the Word document. However, they would mean the tissue that would derive from them at later developmental stages.

      (4) The assessment of Isl1 expression in Tgfbr1 mutant and transgenic mouse embryos would be better indicative of their molecular relationship than a comparative phenotypic analysis. 

      These data have been reported in Dias et al 2020 and Jurberg et al 2013, both cited in the manuscript.  

      (5) The authors should explain or discuss what the upregulation of Foxa2 in the posterior end of Tgfbr1 mutants means.

      While an upregulation is apparent in the figure, looking at other pictures we cannot be sure of this being a significantly quantifiable up-regulation. We therefore removed the statement from the text.

      (6) What happens to the intermediate mesoderm during the trunk-to-tail transition? Is Tgfbr1 involved in the regulation of its development?

      We have tested this using Pax2, added the relevant data in Supplementary Fig. 1, and described them in the results.

      (7) The term "potential" should not be used during the description of DiI labeling experiments as this technique only assesses cell fate.

      Corrected

      (8) Some figures lack AP/DV axis information (e.g. Figures 6, C, and D).

      Corrected

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors investigate ligand and protein-binding processes in GPCRs (including dimerization) by the multiple walker supervised molecular dynamics method. The paper is interesting and it is very well written.

      Strengths:

      The authors' method is a powerful tool to gain insight into the structural basis for the pharmacology of G protein-coupled receptors.

      Weaknesses:

      Cholesterol may play a fundamental role in GPCR dimerization (as cited by the authors, Prasanna et al, "Cholesterol-Dependent Conformational Plasticity in GPCR Dimers"). Yet they do not use cholesterol in their simulations of the dimerization.

      We thank Reviewer #1 for the positive comment on mwSuMD.

      In the revised version of the manuscript, the section about the A<sub>2A</sub>/D2 receptors dimerization has been removed because it was largely speculative. We agree that the lack of cholesterol in those simulations added uncertainty to the presented results.

      Reviewer #2 (Public Review):

      The study by Deganutti and co-workers is a methodological report on an adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), which represents an improved version of the previous SuMD.

      Case-studies concern complex conformational transitions in a number of G protein Coupled Receptors (GPCRs) involving long time-scale motions such as binding-unbinding and collective motions of domains or portions. GPCRs are specialized GEFs (guanine nucleotide exchange factors) of heterotrimeric Gα proteins of the Ras GTPase superfamily. They constitute the largest superfamily of membrane proteins and are of central biomedical relevance as privileged targets of currently marketed drugs.

      MwSuMD was exploited to address:

      (1) Binding and unbinding of the arginine-vasopressin (AVP) cyclic peptide agonist to the V2 vasopressin receptor (V2R);

      (2) Molecular recognition of the β2-adrenergic receptor (β2-AR) and heterotrimeric GDP-bound Gs protein;

      (3) Molecular recognition of the A1-adenosine receptor (A1R) and palmitoylated and geranylgeranylated membrane-anchored heterotrimeric GDP-bound Gi protein;

      (4) The whole process of GDP release from membrane-anchored heterotrimeric Gs following interaction with the glucagon-like peptide 1 receptor (GLP1R), converted to the active state following interaction with the orthosteric non-peptide agonist danuglipron;

      (5) The heterodimerization of D2 dopamine and A2A adenosine receptors (D2R and A2AR, respectively) and binding to a bi-valent ligand.

      The mwSuMD method is solid and valuable, has wide applicability, and is compatible with the most world-widely used MD engines. It may be of interest to the computational structural biology community.

      The huge amount of high-resolution data on GPCRs makes those systems suitable, although challenging, for method validation and development.

      While the approach is less energy-biased than other enhanced sampling methods, knowledge, at the atomic detail, of binding sites/interfaces and conformational states is needed to define the supervised metrics, the higher the resolution of such metrics is the more accurate the outcome is expected to be. The definition of the metrics is a user- and system-dependent process.

      The too many and ambitious case-studies undermine the accuracy of the output and reduce the important details needed for a methodological report. In some cases, the available CryoEM structures could have been exploited better.

      The most consistent example concerns AVP binding/unbinding to V2R. The consistency with CryoEM data decreases with an increase in the complexity of the simulated process and involved molecular systems (e.g. receptor recognition by membrane-anchored G protein and the process of nucleotide exchange starting from agonist recognition by an inactive-state receptor). The last example, GPCR hetero-dimerization, and binding to a bi-valent ligand, is the most speculative one as it does not rely on high-resolution structural data for metrics supervision.

      We thank Reviewer #2 for the detailed comment on the manuscript. In this revised version, the hetero-dimerization between A<sub>2A</sub>R and D<sub>2</sub>R has been removed. Also, results about GPCR case studies other than GLP-1R have been reduced and downgraded in importance to focus on the fundamental key points of the adaptive sampling method. We agree that the consistency with cryoEM data tends to decrease with an increase in the complexity of the simulated process and involved molecular systems. While it is possible to approximate cryoEM results, our unbiased adaptive sampling technique finds its most interesting application in mechanistically unknown out-of-equilibrium processes rather than in perfectly reproducing known experimental data. The simulated case studies we present showcase the versatility, speed and consistency of our adaptive method to explore energetically unbiased transitions.

      Reviewer #3 (Public Review):

      Summary:

      In the present work, Deganutti et al. report a structural study on GPCR functional dynamics using a computational approach called supervised molecular dynamics.

      Strengths:

      The study has the potential to provide novel insight into GPCR functionality. An example is the interaction between loops of GPCR and G proteins, which are not resolved experimentally, or the interaction between D344 and R385 identified during the Gs coupling by GLP-1R. However, validation of the findings, even computationally through for instance in silico mutagenesis study, is advisable.

      Weaknesses:

      In its current form, the manuscript seems immature and in particular, the described results grasp only the surface of the complex molecular mechanisms underlying GPCR activation. No significant advance of the existing structural data on GPCR and GPCR/G protein coupling is provided. Most of the results are a reproduction of the previously reported structures.

      We thank Reviewer #3 for the positive comment on the work. The revised manuscript focuses more on the GLP-1R and Gs case studies. We believe it addresses the weaknesses raised by showing the behaviour of key structural motifs and providing new hypotheses about GDP release.  

      Reviewer #2 (Recommendations For The Authors):

      In this methodological report, Deganutti and co-workers propose an improved version of supervised molecular dynamics (SuMD), named multiple walker SuMD (mwSuMD). Such an adaptive sampling method was challenged in simulations of complex transitions involving GPCRs, which are out of reach by classical MD.

      Although less energy-biased than other enhanced sampling methods, mwSuMD requires knowledge of the atomic detail of the ligand-protein or protein-protein binding site/interfaces and the structural hallmarks of the states whose conversion the method is going to address. Such knowledge is, indeed, necessary to define the supervised metrics (e.g. distances, RMSD, etc), which is a user- and system-dependent process.

      We classify mwSuMD as an adaptive, rather than enhanced, sampling method as it does not use any energy bias. We agree with the Reviewer that some knowledge of the system is required to productively set up the simulations, but this is the case for almost any advanced MD method.
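
      The distinction between adaptive selection and an energy bias can be illustrated with a toy model. In the sketch below, each "walker" is an unbiased one-dimensional random walk standing in for a short MD run, and supervision only chooses which walker to continue from, based on a user-defined metric (here, distance to a target value). All names, the step size, the stopping threshold, and the "keep the best walker" rule are assumptions made for illustration; this is a conceptual sketch and not the mwSuMD implementation.

```python
import random

def short_walk(start, n_steps, rng):
    """Unbiased random walk standing in for one short, unbiased simulation."""
    x = start
    for _ in range(n_steps):
        x += rng.gauss(0.0, 0.5)
    return x

def supervised_walkers(start, target, n_batches=50, n_walkers=5, n_steps=10, seed=0):
    """Run batches of walkers and continue from the one with the best metric."""
    rng = random.Random(seed)
    x = start
    trace = [abs(x - target)]  # supervised metric after each batch
    for _ in range(n_batches):
        finals = [short_walk(x, n_steps, rng) for _ in range(n_walkers)]
        # Selection step: no force or energy term is added to any walker.
        x = min(finals, key=lambda f: abs(f - target))
        trace.append(abs(x - target))
        if trace[-1] < 0.5:  # hypothetical convergence threshold
            break
    return x, trace

final_x, trace = supervised_walkers(start=20.0, target=0.0)
```

      The dynamics inside each walker are untouched; progress along the metric comes only from repeatedly restarting from the most productive short run, which is why the choice of metric matters so much for the outcome.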

      The text requires improvement in the essential methodological details and cleaning of those parts that are not properly instrumental in method validation.

      While attempting to prove the widest possible applicability of the method, the authors exaggerated the number of examples, which, in spite of the increasing complexity were only summarily described. Please, limit the case studies to AVP binding/unbinding to V2R and the whole process of GDP release from membrane-anchored Gs following activation of GLP1R by danuglipron. The latter case, indeed, involves small ligand binding (danuglipron), small ligand dissociation (GDP), receptor activation, and activated receptor binding to membrane-anchored G protein and G protein conformational transition instrumental to nucleotide depletion, which is already too much. In this framework, the cases of Gs-β2AR and Gi-A2R recognition are redundant. Most importantly, the case of D2R-A2AR heterodimerization and binding to a bi-valent ligand must be eliminated. The reason is that the case is not entirely based on the mwSuMD and the biased protein-protein interface does not rely on high-resolution data (i.e. no structural model of D2R-A2AR dimer has been determined so far). Last but not least, the high intrinsic flexibility of the bi-valent ligand adds further indetermination to the computational experiment. Being too speculative, the case-study does not serve to model validation.

      We thank the Reviewer for the suggestion. In the current revised form, the manuscript focuses on AVP binding/unbinding to V2R and the GLP-1R activation, Gs recognition and GDP release.

      While eliminating the three case studies mentioned above, the remaining ones should be described more extensively and clearly, highlighting the most productive setup for each system. Incidentally, listing the performance parameters (e.g. distribution mode and minimum RMSD) of each simulation setting in Table S1 is worth doing.

      More accuracy in the methodological description is needed.

      As for the supervised metrics, the rationale behind the choice of a particular index and whether it is the outcome of a number of trials must be declared and the selected indices must be better defined. Here there are a few examples.

      AVP-V2R case. It is not clear why the AVP centroids were computed on residues C1-Q4 (I suppose the Cα-atoms) and not on the Cα-atoms of the whole cyclic part (C1-C6). Along the same line, the choice of the Cα-atoms of four amino acid residues to compute the receptor binding-site centroids requires justification.

      We have amended the text to clarify that all the heavy atoms of AVP residues C1-Q4, which are anticipated to bind deep into V<sub>2</sub>R, were considered alongside the V<sub>2</sub>R residues that form part of the peptide binding site (Cα atoms only). In our experience, including or excluding side chains in the definition of centroids usually does not affect the supervision output. It should only affect the output of mwSuMD simulations based on the RMSD, which considers the specific relative distance from the reference. However, a benchmark of the differences produced by divergent selections is beyond the scope of the present work.
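
      As an illustration only, the inter-centroid distance described above can be sketched in a few lines. This is not the code used in the study: the function names and the coordinates are hypothetical, and the two selections simply stand in for "all heavy atoms of the ligand residues" and "Cα atoms of the binding-site residues".

```python
import math

def centroid(coords):
    """Geometric centre of a list of (x, y, z) coordinates."""
    n = len(coords)
    if n == 0:
        raise ValueError("empty atom selection")
    return tuple(sum(c[i] for c in coords) / n for i in range(3))

def centroid_distance(selection_a, selection_b):
    """Distance between the centroids of two atom selections."""
    return math.dist(centroid(selection_a), centroid(selection_b))

# Hypothetical coordinates in Å (not taken from any deposited structure).
ligand_heavy_atoms = [(0.0, 0.0, 10.0), (2.0, 0.0, 10.0), (1.0, 3.0, 10.0)]
site_ca_atoms = [(0.0, 0.0, 0.0), (2.0, 2.0, 0.0), (1.0, 1.0, 0.0)]

d = centroid_distance(ligand_heavy_atoms, site_ca_atoms)
print(f"inter-centroid distance: {d:.2f} Å")
```

      Because only the centroid enters the metric, adding or removing side-chain atoms shifts the supervised value only as far as it shifts the centre of the selection.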

      GLP1R case. The statement: "Since the opening of TM1-ECL1 was observed in two replicas out of four, we placed the ligand in a favorable position for crossing that region of GLP-1R" is rather weak as a strategy to manually (?) define the input position of the ligand.

      As stated in the manuscript, the placement of the agonist in that position was guided by 8 μs of preliminary classical MD simulations that pointed to a possible binding path. We agree with the Reviewer that there is still some degree of arbitrariness in this choice and, for this reason, we have not presented structural details of the PF06882961 binding path.

      As for the supervised metrics, what does it mean "the distance between the ligand and GLP-1R TM7 residues L379<sup>7.34</sup>-F381<sup>7.36</sup>"? Was the distance computed between ligand and L379-F381 centroids? Also: "In the supervised stages, the distance between residues M386-L394 Gαs of helix 5 (α5) and the GLP-1R intracellular residues R176<sup>2.46</sup>, R348<sup>6.37</sup>, S352<sup>6.41</sup>, and N405<sup>7.60</sup> was monitored" was it an inter-centroid distance? Furthermore, "supervising the distance between AHD residues G70-R199 Gαs and K300-L394 Gαs" was it the distance between the centroid of the AHD and the centroid of the C-terminal half of the Ras-like domain? In general, when more than two atoms are involved in distance calculation, please, specify if the distance is inter-centroid.

      Also: "During the third phase, the RMSD of PF06882961, as well as the RMSD of ECL3 (residues A368<sup>6.57</sup>-T378<sup>7.33</sup>, Cα atoms), were supervised" was the RMSD computed without superimposing the ligand to estimate its roto-translations?

      We have added details about the selections used for computing centroids throughout the Methods section. For example, all the heavy atoms of PF06882961 and the Cα atoms of L379-F381 were considered. RMSD values during GLP-1R activation were computed after superimposition on the Cα atoms of residues 170-240 (TM2, ECL1, and TM3). This has now been specified in the text.
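
      To make the distinction raised by the Reviewer concrete, the sketch below computes an RMSD without fitting the measured atoms themselves. It assumes that each frame has already been superimposed on a separate reference selection (here, hypothetically, a rigid part of the receptor), so that any displacement of the measured atoms is retained in the value. The function name and coordinates are illustrative and are not taken from the manuscript.

```python
import math

def rmsd_no_fit(coords, reference):
    """RMSD between two equally ordered coordinate sets, with no fitting step.

    The frame is assumed to be already aligned on another selection, so
    rigid-body motion of the measured atoms contributes to the result.
    """
    if len(coords) != len(reference) or not coords:
        raise ValueError("selections must be non-empty and of equal size")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords, reference))
    return math.sqrt(squared / len(coords))

# Hypothetical ligand atoms (Å): the frame is the reference pose translated
# by (3, 4, 0), so the internal geometry is unchanged.
reference_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame_pose = [(x + 3.0, y + 4.0, z) for x, y, z in reference_pose]

value = rmsd_no_fit(frame_pose, reference_pose)
print(f"ligand RMSD without ligand fitting: {value:.2f} Å")
```

      Had the ligand been superimposed on itself first, the same frame would give an RMSD of zero, which is why the choice of the fitting selection needs to be stated alongside each RMSD.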

      The authors considered the 7LCJ GLP1R-danuglipron complex as a fully active reference state instead of considering the receptor from a ternary complex with Gs. The ternary complex (7LCI) was indeed considered as a reference only in simulations of receptor-G protein recognition. 

      7LCJ and 7LCI are both fully active states. The main difference is that in 7LCJ, Gs coordinates were not deposited. Indeed, their RMSD computed on the TMD Cα atoms and PF06882961 is 0.63 Å and 0.54 Å, respectively.

      Most importantly, the ternary complex chosen by the authors is not adequate as a reference for simulating the "opening" of the AHD because it bears a miniGs, hence, missing the AHD. In that framework, such an opening is rather vague and was not properly supervised by mwSuMD. The authors must repeat metrics supervisions by using, as a reference, the 6X1A ternary complex, which bears a displaced AHD. This would likely lead to a different path of GDP release.

      To the best of our knowledge, there is no evidence that a specific open conformation of the AHD is linked to GDP release. In support, we note that in GPCR ternary complexes, the AHD is usually not modelled because of its high flexibility. The only body of evidence we are aware of is that the AHD must open up to allow GDP release. For this reason, we decided to supervise the distance between the AHD and the Ras domain without using a reference.

      In the statement: "The AHD opening was simulated starting from the best GLP-1R:Gs binding mwSuMD replica" the definition "best binding" requires clarification.

      This has been amended, specifying that Replica 2 was considered the “best replica” because it deviated least from the cryoEM structure.

      As for the case study on β2-AR-Gs recognition, I strongly suggest eliminating it. However, I'd like to make some comments. The sentence: "the adrenergic β2 receptor (β2 AR) in an intermediate active state was downloaded from GPCRdb (https://gpcrdb.org/)" is vague as it does not indicate what intermediate active state structure was used. Since the goal of the case study was to probe the method in simulating receptor-G protein binding, it would have been better to start with a fully active state of the receptor like the 4LDO structure, employed by the authors only to extract epinephrine.

      mwSuMD is designed to provide insights into structural transitions. We started from an intermediate active state of β2-AR in complex with adrenaline because it resembles the most populated state stabilised by a full agonist according to NMR studies (DOI:10.1016/j.cell.2015.08.045); the fully active β2-AR conformation is stabilised only after Gs binding. However, following the Reviewer’s suggestion, we have reduced the presented results for the β2-AR-Gs recognition.

      Also in this case, it is not clear if the supervised receptor-G protein distance is between the centroid of the whole 7-helix bundle and the centroid of Gs α5. It is not clear why the TM6 RMSD concerned only the cytosolic end of the helix and did not include the kink region. With that selection, to estimate the outward displacement, RMSD should have been computed without superimposing the considered portion (once all remaining Cα-atoms of the receptors are superimposed).

      As the Reviewer pointed out above, some knowledge of the system is required to set up mwSuMD. Using more generic metrics, as we did in this case (e.g. the distance between the whole TMD and Gs α5), represents a general approach applicable to other GPCRs and should allow orthogonal metrics to evolve independently of the supervision.

      As now specified in the text, the superimposition for RMSD calculation was performed on the Cα atoms of residues 40 to 140, hence not considering TM6.

      As for the A1R-Gi recognition, as already stated, I strongly suggest eliminating it. However, I'd like to add some comments. I would discourage the employment of an AlphaFold model for simulations deputed to model validation in general and, in particular, when high-resolution structures are available. In this case, the authors would have used the 1GP2 structure of heterotrimeric Gi no matter if from the rat species.

      Following the Reviewer’s suggestion, we have dramatically reduced the results presented for the A1R-Gi recognition. We considered 1GP2 for the simulations, but H5 lacks the six C-terminal residues and therefore some modelling was still necessary. However, we take the Reviewer’s comment on board and will consider it for future work.

      Also, the palmitoylation and geranylgeranylation process is quite tortuous and it is not clear why the NVT ensemble was employed in the second stage of equilibration. This is reflected also on the GLP1R case study.

      We have amended the text to clarify this passage. The second NVT stage is required to stabilise the G protein and its orientation in the simulation box. The figure below shows that the Cα RMSD reached a plateau during the NVT step after 700 ns for both Gi (black) and Gs (orange).

      Author response image 1.

      Here, it is not clear if the RMSD of α5 of Gi was computed with or without superposition.

      The RMSD of α5 was computed after superimposing on the Cα atoms of A<sub>1</sub>R residues 40-140 (the less flexible region of the receptor). We have now amended the text to report this information.

      Reviewer #3 (Recommendations For The Authors):  

      Points to address:

      (1) Root Mean Square Deviation (RMSD) data are often reported as minimum values. It would be useful to provide the average value along the stable part of the trajectories. From the plots in Figure 2ab, it seems that the minimum values reported in the paper are very far from the average ones and thus represent special cases that are seldom reached during simulation. The authors should clarify this point;

      For the revised manuscript, we moved Figure 2 to the supplementary material and added average RMSD values for the most notable replicas in Figures 4e and S8a,b. As a reference, in the text, we now report RMSDs from our previous classic MD simulations (https://doi.org/10.1038/s41467-021-27760-0) of the Gs:GLP-1R cryoEM structure (G<sub>α</sub> = 6.18 ± 2.40 Å; G<sub>β</sub> = 7.22 ± 3.12 Å; G<sub>γ</sub> = 9.30 ± 3.65 Å), which show how flexible G proteins bound to GPCRs are and give better context to the RMSD values we measured during mwSuMD simulations.
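
      The difference between a minimum RMSD and the average or most populated value can be shown with a small, purely illustrative calculation. The RMSD series below is invented for the example, and the function name and the 0.5 Å histogram bin width are assumptions rather than settings from the study.

```python
import statistics
from collections import Counter

def rmsd_summary(rmsd_values, bin_width=0.5):
    """Return minimum, mean, standard deviation and histogram mode of an RMSD series."""
    if len(rmsd_values) < 2:
        raise ValueError("need at least two RMSD values")
    # Histogram mode: centre of the most populated bin of width bin_width
    # (ties are resolved in favour of the lower bin).
    bins = Counter(int(v // bin_width) for v in rmsd_values)
    top_bin = max(bins, key=lambda b: (bins[b], -b))
    return {
        "min": min(rmsd_values),
        "mean": statistics.mean(rmsd_values),
        "sd": statistics.stdev(rmsd_values),
        "mode": (top_bin + 0.5) * bin_width,
    }

# Hypothetical RMSD series in Å with a single low-RMSD frame.
series = [6.1, 6.3, 6.2, 2.9, 6.4, 7.0, 6.2, 5.8]
summary = rmsd_summary(series)
print(summary)
```

      In this invented series a single frame sets the minimum, while the mean and the mode stay close to the values visited most of the time, which is the situation the Reviewer asks to be reported explicitly.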

      (2) The RMSD values reported in the paper always refer to single molecules or proteins. It would be useful to also report the RMSD computed over the whole complexes (ligand/GPCR or GPCR/G protein). It would provide a better metric for understanding the general distance between the results and the reference experimental structures;

      We have now removed the results sections for A<sub>1</sub>R and β<sub>2</sub> AR to focus on GLP-1R, whose RMSD is analyzed in detail in Figures 2, 3 and 4.

      (3) A number of computational works investigated the GPCR/G protein interaction and these studies should be cited and discussed. Examples are the works from Mafi et al. 2023 (doi: 10.1038/s41557-023-01238-6), Fleetwood et al. 2020 (doi: 10.1021/acs.biochem.9b00842), Calderon et al. 2023 and 2024 (doi: 10.1021/acs.jcim.3c00805 and doi: 10.1021/acs.jcim.3c01574), Maria-Solano and Choi 2023 (doi: 10.7554/eLife.90773.1), Mitrovic et al. 2023 (doi: 10.1021/acs.jpcb.3c04897), and D'Amore et al. 2023 (doi: 10.1101/2023.09.14.557711). Many of these works focused on the activation of B2AR and the interaction with its G protein. In addition, Maria-Solano and Choi 2023 and D'Amore et al. 2023 also characterized the rotation of TM6 during the A1R and A2AR activation. Therefore, the claim "To the best of our knowledge, this is the first time an MD simulation captures the TM6 rotation upon receptor activation as results reported so far are largely limited to the TM6 opening and kinking55." is untimely;

      We thank the Reviewer for the suggested references. We have added them to the introduction as examples of energy-biased (Calderon et al. 2023 and 2024, Maria-Solano and Choi, Mitrovic et al., D'Amore et al.) or adaptive sampling (Fleetwood et al.) approaches to GPCRs. Since the above articles focus on β<sub>2</sub> AR and A<sub>1</sub>R, we do not discuss them in detail because the results sections for A<sub>1</sub>R and β<sub>2</sub> AR have been drastically reduced in the manuscript.

      We note that among the suggested references, only Mafi et al. report a simulated G protein (in a pre-formed complex) and none of the works sampled TM6 rotation without input of energy. However, we have removed the claim from the text.

      (4) In the discussion section, the authors claim that a distance-based approach can be employed when the structural data of the endpoints is limited. However, the results obtained from the distance-based protocol during the validation of the approach, which was done using V2R as a reference, are unsatisfying, as acknowledged by the authors themselves. For instance, the RMSD mode value reported for the AVP C alpha atoms with respect to 7DW9 is high, 0.7 nm, whereas the minimum value is 0.38 nm. In addition, some side chains are not oriented in the experimental conformation and might have a different interaction pattern with the receptor if compared with the experimental structure. Considering that in this case the endpoint is known, it is plausible that the performance of the method would degrade even further when data about the target structure is limited. In a real case scenario, the ligand binding mode is unknown and in such a case no RMSD matrix can be used. This represents the major concern of this study that is no prediction is provided, but only - rather inaccurate - reproduction of the known structural data;

      The goal of the first part of the work was to compare mwSuMD to SuMD to justify its application to ligand binding using a challenging case study like vasopressin. The general validation of the parent method SuMD as a predictive tool for ligand binding mode has been extensively reported over the years (a few examples: https://doi.org/10.1021/ci400766b ; https://doi.org/10.1021/acs.jcim.5b00702 ; https://doi.org/10.1038/s41598-020-77700-z) and falls beyond the scope of this work.

      (5) In the discussion, the authors write "A complete characterization of the possible interfaces between GPCR monomers, which falls beyond the goal of the present work, should be achieved by preparing different initial unbound states characterized by divergent relative orientations between monomers to dynamically dock." It would be useful for the reader to refer to and cite here advanced computational approaches that allow a comprehensive sampling of GPCR dimerization independently from the starting conformation of the receptors. One example is coarse-grained metadynamics as shown in doi: 10.1038/s41467-023-42082-z;

      The A<sub>2A</sub>/D<sub>2</sub> receptor dimerization has been removed from the manuscript.

      (6) In many cases, it is not reported how residues missing from the experimental structures used to model the proteins were reconstructed. This information is important, considering that the authors comment on the results of their calculations on addressing these regions, such as in the case of B2AR. Furthermore, the authors did not report how their initial models were validated. The authors should also explain why they did not model the IC loops of A2AR and D2R;

      In the current version of the manuscript, for V2R ECL2 and GLP-1R, we specify that we produced 10 solutions with Modeller and considered the best one in terms of the DOPE score. 

      The only receptor model used, β<sub>2</sub> AR, is now presented as preliminary data focusing on Gs and avoiding any structural detail of the Gs recognition.

      As reported above the A2A-D2 dimerization has been removed from the manuscript.

      (7) In several cases, the authors state that residues never investigated before play an important role in the interaction between different proteins. An example is provided on page 6 for the B2AR/G protein association. Since this claim is quite significant, it would benefit from validation, at least for further calculations such as in silico mutagenesis studies. Another example is at the end of page 10 where the authors report a hidden interaction between D344 and R385 that is pivotal for Gs coupling by GLP-1R. Is there other evidence supporting this result (previously reported literature data, conservation rate of these residues, etc.)?;

      We have removed the supplementary table reporting B2AR/G protein interactions to reduce speculations and added a reference that reports GLP-1 EC50 reduction upon mutation of position 344 to Ala (https://doi.org/10.1021/acscentsci.3c00063).

      (8) The authors should provide a deeper discussion about the conformational rearrangement of GPCR and G protein during the coupling. In detail, the conformational changes of microswitch amino acids of GPCR (e.g., PIF, NPxxY, inactivating ionic lock) and alpha helix 5 of G proteins should be discussed in relation to the literature data and experimental structures;

      We have removed the A<sub>1</sub>R and β<sub>2</sub> AR results to focus on GLP-1R. Key structural motifs in the polar central network and the TM6 kink are analyzed in more detail in Figure 3.

      (9) The chronology of the conformational changes of GLP-1R is arbitrarily chosen. During the simulation, the RMSD values reported in Fig. 3 are high and do not demonstrate the full accomplishment of the simulation of the activation process of the receptor;

      We agree with the Reviewer that the GLP-1R inactive-to-active transition was not fully accomplished, compared to other work on class A GPCRs. Unlike class A, class B GPCRs are challenging systems to work with in silico because inactive starting conformations (e.g. 6LN2) are extremely distant from the active ones (e.g. 7LCJ, 7LCI or 6X18), as demonstrated in Figure S6 for GLP-1R. Here we report the first attempt to model a class B GPCR activation mechanism starting from the inactive state and, even if the transition was not fully achieved, we believe it represents state-of-the-art simulations for this class of receptors.

      (10) It would be helpful for the reader not familiar with the employed technique that the authors explain in one sentence in the main text the pros and cons of using multiple walkers instead of single walker SuMD;

      We thank the Reviewer for the excellent suggestion. In the Discussion, we have now commented that: “more extensive sampling obtainable by seeding multiple parallel short simulations instead of a single simulation for batch”, while in the Methods we explain that “mwSuMD is designed to increase the sampling from a specific configuration by seeding user-decided parallel replicas (walkers) rather than one short simulation as per SuMD. Since one replica for each batch of walkers is always considered productive, mwSuMD gives more control than SuMD on the total wall-clock time used for a simulation. On the flip side, mwSuMD requires multiple GPUs to be the most effective, although any multi-threaded GPU can run more walkers on the same hardware keeping the sampling variety.”.
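
      The multiple-walker scheme quoted above can be illustrated with a toy model. The sketch below is not the mwSuMD implementation: a one-dimensional random walk stands in for a short MD run, the supervised metric is the distance to a target value, and continuing from the walker with the smallest final metric is an assumed scoring rule. All names and parameters are hypothetical.

```python
import random

def short_simulation(start, n_steps, rng):
    """Unbiased 1D random walk standing in for one short MD run."""
    x = start
    for _ in range(n_steps):
        x += rng.gauss(0.0, 0.5)
    return x

def mw_supervised_run(start, target, n_batches, n_walkers, n_steps, seed=0):
    """Seed n_walkers per batch and continue from the walker closest to target."""
    rng = random.Random(seed)
    state = start
    history = [abs(state - target)]
    for _ in range(n_batches):
        walkers = [short_simulation(state, n_steps, rng) for _ in range(n_walkers)]
        # No bias is added to the dynamics; supervision only chooses which
        # unbiased walker seeds the next batch (assumed rule: lowest metric).
        state = min(walkers, key=lambda x: abs(x - target))
        history.append(abs(state - target))
    return state, history

final_state, history = mw_supervised_run(
    start=20.0, target=0.0, n_batches=30, n_walkers=8, n_steps=10
)
print(f"metric at start: {history[0]:.2f}, after 30 batches: {history[-1]:.2f}")
```

      In this toy model exactly one walker per batch is carried forward, so the number of batches fixes the total number of short runs in advance, which mirrors the point made above about control over wall-clock time. The walkers within a batch are independent and could run in parallel.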

      Minor points to address:

      (11) Page 3: the following sentence is duplicated (also found on page 2) "GPCRs preferentially couple to very few G proteins out of 23 possible counterparts";

      (12) Page 20: Figure S13 refers to the QM validation of PF06882961 torsional angle, not to the image of the receptor conformational changes, which is instead Figure S14 (please correct figure caption).

      We thank the Reviewer for the accurate reading of the manuscript. These typos have been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Overall authors’ response

      We would like to thank the three reviewers for their thorough critique of our manuscript and for acknowledging the novelty and importance of our studies, in particular the relevance to collagen-related pathologies such as idiopathic pulmonary fibrosis and chronic skin wounds. We appreciate that there are shortcomings in these studies, as highlighted by the reviewers; we have rewritten parts of our manuscript to clarify any misunderstandings, and conducted additional experiments to address the concerns raised (please see the red text within each response below), which have been incorporated into our revised manuscript (modified text highlighted in yellow in the revised manuscript). We believe that the revision has made our manuscript stronger in support of our original conclusions.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors describe that the endocytic pathway is crucial for ColI fibrillogenesis. ColI is endocytosed by fibroblasts, prior to exocytosis and formation of fibrils, which can include a mixture of endogenous/nascent ColI chains and exogenous ColI. ColI uptake and fibrillogenesis are regulated by circadian rhythm as described by the authors in 2020, thanks to the dependence of this pathway on circadian-clock-regulated protein VPS33B. Cells are capable of forming fibrils with recently endocytosed ColI when nascent chains are not available. Previously identified VPS33B is demonstrated not to have a role in endocytosis of ColI, but to play a role in fibril formation, which the authors demonstrate by showing the loss of fibril formation in VPS33B KO, and an excess of insoluble fibrils - alongside a decrease in soluble ColI secretion - in VPS33B overexpression conditions. A VPS33B binding protein VIPAS39 is also shown to be required for fibrillogenesis and to colocalise with ColI. The authors thus conclude that ColI is internalised into endosomal structures within the cell, and that ColI, VPS33B, and VIPAS39 are co-trafficked to the site of fibrillogenesis, where along with ITGA11, which by mass spectrometric analysis is shown to be regulated by VPS33B levels, ColI fibrils are formed. Interestingly, in involved human skin sections from idiopathic pulmonary fibrosis (IPF) patients, ITGA11 and VPS33B expression is increased compared to healthy tissue, while in patient-derived fibroblasts, uptake of fluorescently-labelled ColI is also increased. This suggests that there may be a significant contribution of endocytosis-dependent fibrillogenesis in the formation of fibrotic and chronic wound-healing diseases in humans.

      Strengths: 

      This is an interesting paper that contributes an exciting novel understanding of the formation of fibrotic disease, which despite its high occurrence, still has no robust therapeutic options. The precise mechanisms of fibrillogenesis are also not well understood, so a study devoted to this complex and key mechanism is well appreciated. The dependence of fibrillogenesis on VPS33B and VIPAS39 is convincing and robust, while the distinction between soluble ColI secretion and insoluble fibrillar ColI is interesting and informative.

      Weaknesses: 

      There are a number of limitations to this study in its current state. Inhibition of ColI uptake is performed using Dyngo4a, which although proposed as an inhibitor of Clathrin-dependent endocytosis is known to be quite non-specific. This may not be a problem however, as the endocytic mechanism for ColI also does not seem to be well defined in the literature, in fact, the principal mechanism described in the papers referred to by the authors is that of phagocytosis.

      We thank the reviewer for pointing this out. Macropinocytosis or phagocytosis can be modelled using high molecular weight dextran. To test whether these mechanisms are involved in addition to endocytosis, we examined the co-localisation of fluorescently-labelled dextran with exogenous collagen and found very little (revised Figure S2B, lines 123-126). Further, we performed a competition experiment in which unlabelled collagen was added in excess at the same time as labelled collagen, and showed that excess unlabelled collagen led to retention of labelled collagen at the cell periphery (revised Figure S2C, lines 126-129). This suggests that collagen-I uptake utilises a different pathway from dextran (i.e. fluid-phase endocytosis) and is a receptor-mediated process.

      It would be interesting to explore this important part of the mechanism further, especially in relation to the intracellular destination of ColI.

      We agree with the reviewer that the intracellular destination of ColI is very interesting, and this is what the Chang lab is currently investigating, although we believe those research findings fall outside the scope of the revised manuscript. However, we have included additional immunofluorescence data, using GFP-tagged Rab5 constructs, to support that collagen is indeed taken up into endosomal compartments (revised Figure 1D, Figure S6A).

      The circadian regulation does not appear as robust as the authors' last paper, however, there could be a larger lag between endocytosis of ColI and realisation of fibrils.

      The authors state that the endocytic pathway is the mechanism of trafficking and that they show ColI, VPS33B, and VIPAS39 are co-trafficked. However, the only link that is put forward to the endosomes is rather tenuously through VPS33B/VIPAS39.

      We would like to clarify that we meant the post-Golgi compartment. We did not mean VPS33B/VIPAS39 as an endosome marker; however, as we see collagen entering the cell in intracellular compartments and then being recycled, we take it that, by convention, the endosome would be involved. This is further supported by the partial colocalisation we see with the classic endosome marker Rab5.

      There is no direct demonstration of ColI localisation to endosomes (ie. immunofluorescence), and this is overstated throughout the text.

      We appreciate the comment and have modified overstatements in the revised manuscript as appropriate. As stated above, we have included additional immunofluorescence data to support that collagen is indeed taken up into endosomal compartments.

      Demonstrating the intracellular trafficking and localisation of ColI, and its actual relationship to VPS33B and VIPAS39, followed by ITGA11, would broaden the relevance of this paper significantly to incorporate the field of protein trafficking. Finally, the "self-formation" of ColI fibrils is discussed in relation to the literature and the concentration of fluorescently-tagged ColI, however as the key message of the paper is the fibrillogenesis from exocytosed ColI, I do not feel like it is demonstrated to leave no doubt. Specific inhibition of intracellular trafficking steps, or following the progressive formation of ColI fibrils over time by immunofluorescence would demonstrate without any further doubt that ColI must be endocytosed first, to form fibrils as a secondary step, rather than externally-added ColI being incorporated directly to fibrils, independent of cellular uptake.

      We appreciate the concern raised here. This is precisely why we trypsinised and replated cells as part of the workflow, so that we can make sure that no residual, non-endocytosed exogenous collagen is incorporated onto pre-existing fibrils. We have new data using flow imaging, which show that cells that do not endocytose exogenous collagen accumulate said collagen at the cell periphery, and that this accumulation is greatly reduced after trypsinisation. These new data are part of a more detailed methodology-based study under preparation, which will allow future studies to further dissect the collagen intracellular trafficking process, and are thus not included in the revised manuscript.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, the authors describe a mechanism, by which fluorescently-labelled Collagen type I is taken up by cells via endocytosis and then incorporated into newly synthesized fibers via an ITGA11 and VPS33B-dependent mechanism. The authors claim the existence of this collagen recycling mechanism and link it to fibrotic diseases such as IPF and chronic wounds.

      Strengths: 

      The manuscript is well-written, and experimentally contains a broad variation of assays to support their conclusions. Also, the authors added data of IPF patient-derived fibroblasts, patient-derived lung samples, and patient-derived samples of chronic wounds that highlight a potential in vivo disease correlation of their findings.

      The authors were also analyzing the membrane topology of VPS33B and could unravel a likely 'hairpin' like conformation in the ER membrane. 

      Weaknesses: 

      Experimental evidence is missing that supports the non-degradative endocytosis of the labeled collagen.

      We thank the reviewer for raising this. We would like to clarify that we do not think that all endocytosed collagen-I is recycled, but rather that it is sorted in the endosome, which determines its fate. Interestingly, results from Kadler’s group have shown that blocking lysosome function (through chloroquine and bafilomycin) significantly reduced endogenous collagen fibril formation (https://www.biorxiv.org/content/10.1101/2024.05.09.593302v1), suggesting a non-degradative role for the lysosome in fibrillogenesis.

      The authors show and mention in the text that the endocytosis inhibitor Dyngo®4a shows an effect on collagen secretion. It is not clear to me how specific this readout is if the inhibitor affects more than endocytosis. This issue was unfortunately not further discussed.

      We thank the reviewer for this comment and have included a discussion of the specificity of Dyngo4a (revised manuscript lines 383-392). The ponceau stain suggests that Dyngo4a treatment did not affect global secretion and thus the effects are specific to collagen-I (Fig 2B).

      The authors use commercial rat tail collagen, it is unclear to me which state the collagen is in when it's endocytosed. Is it fully assembled as collagen fiber or are those single heterotrimers or homotrimers?

      We apologise for the confusion and will clarify in our revision. These would be single helical trimers from acid-extracted rat tail collagen. We have performed additional light scattering and CD spectra to confirm the molecular weight and helicity, and confirm that adding fluorescent tags did not alter the readout. We have included this in the revised manuscript (revised Figure S1A-C, manuscript lines 82-86).    

      The Cy-labeled collagen is clearly incorporated into new fibers, but I'm not sure whether the collagen is needed to be endocytosed to be incorporated into the fibers or if that is happening in the extracellular space mediated by the cells.

      We appreciate the concern raised here, which was also raised by reviewer 1. As answered above, this is why we trypsinised and replated cells as part of the workflow, so that we can make sure that no residual exogenous collagen is incorporated onto pre-existing fibrils. We also have new data using flow imaging, which show that cells that do not endocytose exogenous collagen accumulate said collagen at the cell periphery, and that this accumulation is greatly reduced after trypsinisation. These new data are part of a methodology-based manuscript under preparation and thus will not be included in the revised manuscript.

      In general for the collagen blots, due to the lack of molecular weight markers, what chain/form of collagen type I are you showing here?

      Apologies for the lack of molecular weight markers; this was an oversight by the authors, and the markers have been included in the revised figures.

      Besides the VPS33B siRNA transfected cells the authors also use CRISPR/Cas9-generated KO. The KO cells do not seem to be a clean system, as there is still a lot of mRNA produced. Were the clones sequenced to verify the KO on a genomic level?

      Yes, the clones were verified and used in our previous paper on circadian control of collagen homeostasis. There are instances where despite knockout at the protein level, mRNA is still persistent; however these transcripts are likely then directed to degradation through nonsense-mediated mRNA decay. To fully understand this mechanism is beyond the scope of this paper. 

      For the siRNA transfection, a control blot for efficiency would be great to estimate the effect size. To me it is not clear where the endocytosed collagen and VPS33B eventually meet in the cells and whether they interact. Or is ITGA11 required to mediate this process, in case VPS33B is not reaching the lumen?

      This is an interesting question. In the revised paper we have conducted experiments in which Col1-GFP11-containing conditioned media was incubated with VPS33b-barrel, which showed that they interact within the cell and not at the cell periphery (revised Figure 6G, lines 293-296), again highlighting that VPS33b is not involved in the endocytosis step but interacts with endocytosed collagen-I intracellularly. We have attempted colocalisation studies using the split GFP approach with VPS33B and ITGA11 to investigate where they interact, but as the ITGA11 construct we used did not localise to the cell surface as expected, we are not confident that this system is appropriate for investigating how/if VPS33B interacts with ITGA11, and there is simply no good antibody for VPS33B staining.

      The authors show an upregulation of ITGA11 and VPS33B in IPF patients-derived fibroblasts, which can be correlated to an increased level of ColI uptake, however, it is not clear whether this increased uptake in those cells is due to the elevated levels of VPS33B and/or ITGA11.

      We would like to clarify here that we do not think collagen-I uptake is due to VPS33B and/or ITGA11, as siITGA11 and VPS33B in fibroblasts showed no consistent changes in uptake as determined by flow cytometry, which was included in the original manuscript (now revised Figure 6H, 7I). VPS33B and ITGA11 are involved in the ‘outward’ arm of recycled collagen-I, i.e. directing it to the fibrillogenesis route. We agree that the inclusion of additional functional studies using IPF patient-derived fibroblasts would add to the manuscript, and have performed siRNA against VPS33B and ITGA11 on IPF fibroblasts, and demonstrated a late of endocytic recycling events (revised Figure 8D, S6B, lines 351-353).

      Reviewer #3 (Public Review): 

      Summary: 

      Chang et al. investigated the mechanisms governing collagen fibrillogenesis, firstly demonstrating that cells within tail tendons are able to uptake exogenous collagen and use this to synthesize new collagen-1 fibrils. Using an endocytic inhibitor, the authors next showed that endocytosis was required for collagen fibrillogenesis and that this process occurs in a circadian rhythmic manner. Using knockdown and overexpression assays, it was then demonstrated that collagen fibril formation is controlled by vacuolar protein sorting 33b (VPS33b), and this VPS33b-dependent fibrillogenesis is mediated via Integrin alpha-11 (ITGA11). Finally, the authors demonstrated increased expression of VPS33b and ITGA11 at the gene level in fibroblasts from patients with idiopathic pulmonary fibrosis (IPF), and greater expression of these proteins in both lung samples from IPF patients and in chronic skin wounds, indicating that endocytic recycling is disrupted in fibrotic diseases. 

      Strengths: 

      The authors have performed a comprehensive functional analysis of the regulators of endocytic recycling of collagen, providing compelling evidence that VPS33b and ITGA11 are crucial regulators of this process. 

      Weaknesses: 

      Throughout the study, several different cell types have been used (immortalised tail tendon fibroblasts, NIH3T3 cells, and HEK293T cells). In general, it is not clear which cells have been used for a particular experiment, and the rationale for using these different cell types is not explained. In addition, some experimental details are missing from the methods.

      We thank the reviewer for pointing out the lack of clarity, and have filled in missing information in the methods. HEK293T cells were used for virus production for the VPSoe system, and we have clarified the cell types used in figure legends (predominantly iTTF). We have also provided justification when NIH3T3 cells were used (revised lines 290-291).    

      There is also a lack of functional studies in patient-derived IPF fibroblasts which means the link between endocytic recycling of collagen and the role of VPS33b and ITGA11 cannot be fully established.

      We thank the reviewer for this comment, which was also raised by reviewer 2 above. We agree that the inclusion of additional functional studies using IPF patient-derived fibroblasts would add to the manuscript and have performed siRNA against VPS33B and ITGA11 on IPF fibroblasts, and demonstrated a late of endocytic recycling events (revised Figure 8D, S6B, lines 351-353).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The authors inhibit Clathrin-dependent endocytosis with dyngo4a. It is well known that this inhibitor is not highly specific for this pathway. It is also not explained why the authors only inhibit the Clathrin uptake pathway, and not pinocytosis or Clathrin-independent endocytosis too. The authors refer to papers that describe pinocytosis for collagen endocytosis.

      We thank the reviewer for raising this question. Because inhibition of the clathrin-dependent pathway does not completely abrogate endocytosis of collagen-I, we anticipate that other pathways are involved in mediating collagen-I uptake. Additional data suggest that this uptake is unlikely to occur through fluid-phase endocytosis and is instead receptor mediated (revised Figure S2B, C).

      Where does the ColI go in the cell? Depending on the uptake pathway, it is likely to pass through endocytic carriers to endosomes, where it may be recycled to the PM or degraded. From the start, the authors describe the ColI as being in vesicular structures, however, the imaging data that this is based on is not co-labelled with anything to determine the potential structure/localisation. This is not done at any point in the paper, until IF is shown of ColI with VIPA39, however without the relevant controls, this IF is unconvincing, as the general pattern of ColI and VIPA39 as an endosomal marker are not classically recognisable. Additionally, VPS33B is described as a late endosome/lysosome marker, which would have different connotations on ColI trafficking or destination than other types of endosomes.

      We thank the reviewer for pointing out the weaknesses in our original IF. We have included new confocal images showing labelled collagen co-localisation with GFP-tagged Rab5 through transient transfection, which is a more traditional endosome marker (revised Figure 1D, Figure S6A).  

      We are currently characterising the compartments to which ColI is trafficked, and this work is being prepared as part of a methodology-based manuscript. We believe that this characterisation would be too detailed to be included in a revised version of this manuscript. The Kadler lab also have data suggesting that the lysosome is involved in collagen fibrillogenesis instead of its canonical degradation function, which is in another submitted manuscript (https://www.researchsquare.com/article/rs-1336021/v1). It was not included in this manuscript due to our focus (i.e. endocytic recycling).

      In Figure 5H, the pattern of Cy5-ColI staining looks like it could even be ER/Golgi in the VPSKO zoom panel, but in the absence of co-labelling, we cannot conclude anything. In order for the authors to conclude that ColI is within the endosomes, co-labelled IF should be performed to demonstrate ColI-endosomal colocalization. Likewise for the role of VPS33B in ColI fibrillogenesis: dependence of the process is demonstrated, but the relationship is not defined. This could be clarified using IF. This would also support the authors' statements of co-trafficking between ColI, VPS33B, and VIPA39, which as the paper stands, is not demonstrated.

      We would like to clarify that our hypothesis is that the endosome controls how collagen is deposited outside the cell, i.e. whether by protomeric secretion or fibrillogenesis, and that the decision of whether an endocytosed collagen is recycled or degraded lies in this compartment. The reviewer is correct that the endosome may not be the only compartment in which endocytosed collagen-I ends up, as we have new data suggesting the involvement of other intracellular compartments, although the detailed mechanism is beyond the scope of this manuscript. Nonetheless, we have included new data showing co-localisation of endocytosed collagen with Rab5 in this revised manuscript (revised Figure 1D, Figure S6A).

      The basis of this paper is that endocytosis of ColI must occur before re-exocytosis as fibrillar ColI. The authors show this through pulse-chase experiments, with a trypsinisation step to remove any externally bound ColI. The authors also show nice time progression by flow cytometry, but it would truly demonstrate this point if they showed 0 timepoint, or low timepoint of IF to show progressive lengthening of ColI fibrils. This is used early on in Figure 1D, although the presentation here is not very clear. This is especially important as the authors address the self-seeding capabilities of Collagen in cell-free conditions in Figure 1F.

      We would like to thank the reviewer for this suggestion. From previous data with endogenously tagged collagen, we know that the appearance of collagen fibrils is rather rapid, so there may not be a gradual lengthening as expected, but rather a depletion of endocytosed collagen in the initial seeding/growth step (please see https://www.researchsquare.com/article/rs-1336021/v1). We have included an image of replated fibroblasts after 18 hours showing no appearance of extracellular collagen, endogenous or otherwise (revised Figure S2A, line 110).

      Finally, although the involvement of ITGA11 is interesting, it is not well described, and its role is not well demonstrated. This could likely be clarified by an additional introduction to ITGA11 and its role in collagen exocytosis/fibrillogenesis.

      We would like to thank the reviewer for pointing this out and have included additional sentences to specifically introduce ITGA11 and its role in fibrillogenesis (see lines 320, 321; 446-450).  

      Specific points: 

      Line 73: You haven't compared reuse vs production, so you can't say that reuse is central rather than production. They may be both as important or production still may be the most crucial, maybe it depends on cell/collagen type. Using the ColI KD or CHX to block nascent synthesis, you could directly compare the impact of both.

      We would like to clarify that we are not referring to reuse/recycling here. We meant that production of collagen (i.e. single hetero/homotrimer molecules within the cell) is not as crucial as the utilisation of these building blocks by the cells (i.e. whether they are secreted as protomers or assembled into fibrils), which was supported by our finding that production (as suggested by mRNA levels) in IPF fibroblasts is similar to that in control fibroblasts (now revised Figure 8A). We conducted ColI siRNA to block nascent synthesis in the original manuscript and showed that fibroblasts can efficiently make new fibrils by recycling exogenous collagen (Figure 3B, C), although we appreciate that siRNA may not completely inhibit endogenous production. Thus, we have also included new data using collagen-I knockout cells to support our hypothesis that without endogenous production, fibroblasts can still effectively make collagen fibrils if they can reuse what is available in the extracellular space (revised Figure 4, Figure S3C, D; lines 178-199).

      Lines 83-87: The rationale for this experiment is not clear. Cy3-ColI is added, taken up into cells, and incorporated into fibrils coming from cells. 5FAM-ColI is added at a later stage, then at 2 days (when incorporation is demonstrated in Fig 1B), it is also incorporated into cells as expected. Why does this comment on ColI not being degraded any more than Cy3-ColI alone?

      We believe that the pulse-chase experiment using the differently tagged collagens demonstrated a dimension of dynamics that is not demonstrated with Cy3-ColI alone. In this case, Cy3-ColI was initially added and then removed after 3 days; 5FAM-ColI was then added and incubated for 2 more days. Thus, 5 days after the initial pulse, the Cy3-ColI persisted and was not degraded. We would like to apologise for causing this confusion, and have clarified this in the revised manuscript (lines 542-549; Figure S1D figure legend).

      Figure 1A: I would like to see a negative control: either dark colI or no Cy3-Col, or timescale. Is B quantified from these images?

      We thank the reviewer for this comment. We have added the no-collagen control image in our revision (revised Figure S1D). 1B is not quantified from the ex vivo tendon experiments, but rather the in vitro cell culture experiments (i.e. those from 1D-1F, although they are all from independent experiments).

      Figure 1B: in iTTF cells (immortalised tendon cells) Corrected to max: What does that mean?

      As there are variations between individual experiments (e.g. changes in the amount of collagen added due to pipetting), we have normalised to the maximum value obtained in each individual experiment so that we can display all biological repeats within the same graph.
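
      As an illustration of the "corrected to max" normalisation described above, the following is a minimal sketch, not the authors' analysis code. The function name, experiment labels, and readings are hypothetical values chosen for the example; each experiment is simply divided by its own maximum so that every biological repeat peaks at 1.0.

```python
def normalise_to_max(experiments):
    """Scale each experiment so that its largest reading becomes 1.0.

    `experiments` maps an experiment label to a list of raw readings.
    The labels and readings used below are hypothetical.
    """
    normalised = {}
    for label, values in experiments.items():
        peak = max(values)
        if peak <= 0:
            raise ValueError(f"experiment {label!r} has no positive reading")
        # Divide every reading by the maximum of the same experiment only.
        normalised[label] = [value / peak for value in values]
    return normalised


# Hypothetical raw fluorescence readings from two independent experiments.
raw = {
    "experiment_1": [10.0, 20.0, 40.0],
    "experiment_2": [5.0, 15.0, 20.0],
}
result = normalise_to_max(raw)
```

      Because each repeat is scaled independently, differences in absolute signal between experiments are removed while the shape of each time course is preserved.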

      Figure 1C: You can't say ColI is in vesicular structures from this, they are spots, yes, but that could also be in Golgi/ER (unlikely to be cytosolic but not impossible).

      We appreciate this comment and have changed the wording accordingly to call them intracellular/punctate structures.

      Figure 1D: Not the best presentation: The cell mask has structures: what are these? It's not clear if this is a single cell, would be better with a defined marker (endocytic marker, lysosome etc). Instead of a low-resolution 3D view, it would be clearer with normal confocal XY and zooms of "vesicular structures" using appropriate markers as 3D reconstructions I think it could be removed.

      This is a single cell, and the cell mask stains the plasma membrane. We did not use a defined marker as we wanted to visualise the whole intracellular compartment. We appreciate that further proof is needed to verify the location of the endocytosed collagen, and have included additional confocal imaging data to support the localisation of collagen to Rab5-positive intracellular compartments (revised Figure 1D, Figure S6B).

      Figure 1 E/F: Cy3 is only visible in extracellular structure, not also intracellular. Why? Would be useful to see the time points of incorporation at the end of the pulse, then at an early point into the chase, to demonstrate 1) Cy3-ColI uptake into cells and progressive incorporation rather than potential direct binding of ColI-Cy3 to ECM, or other non-specific factors. Showing the image at 0t would demonstrate an absence of external labelled colI and therefore its appearance later could be presumed that it had been internalised before.

      As the cells were trypsinized and replated after one hour of labelled collagen feeding to ensure that we are only tracking endocytosed collagen, t=0 in this case would be cells that are unattached. We have instead included images taken 18 hr post-replating to show the baseline level of collagen (revised Figure S2A, line 110).

      Figure S1A: yellow box: doesn't show only Cy3-ColI, there is red and yellow in the central cell, and large yellow blobs in the cell above. These images do not support this claim, including the Fiber Zoom box. They should also be shown in single channels to demonstrate the authors' points better.

      Apologies for the confusion: this is to show that newly added 5FAM-ColI also co-localises with previously endocytosed Cy3-ColI, i.e. the Cy3-ColI is persisting rather than being degraded.

      Line 92: endocytosed into distinct structures: These images are very vague, but I don't think you can call them distinct structures, all you can say from this is that they are spots.

      We have changed the wording to ‘distinct puncta’.  

      It is not clear why the authors use Cy3, Cy5, and 5FAM labelled colI. A brief explanation would be useful.

      Apologies for the confusion. We initially included our justification (to show that the fluorescence labels do not change the way collagen is internalised) but removed it from the final manuscript due to length. We have added the justification back (revised lines 101-102).

      Figure 1F: It would be useful to see a quantification of the Cy3 channel here: I agree with the conclusions, and find the 0.5 ug/ml condition more convincing than 0.1 actually, although there is some faint Cy3 in cell-free samples there seems to be quite a big increase in the presence of cells, and this would look more convincing if quantified.

      We thank the reviewer for this suggestion and have included quantification in the revised manuscript (revised Figure 1G-I).  

      Figure 2B: Dyng is not an abbreviation of Dyng. Standardise Dyng/Dyngo/Dyngo4a. WB is soluble colI and represents little (if any) insoluble col. IF is more or less the other way round. How do they compare this?

      Thank you for pointing out the inconsistencies; we have corrected them in the revised manuscript. We took the conditioned media from the same experiment in which cells were fixed for IF and carried out Western blot analyses. The IF showed some collagen still present, albeit significantly reduced. This is in agreement with the Western blot results (i.e. Dyngo4a inhibits deposition of both soluble and insoluble forms of collagen).

      Figure 2C: not an image series. Quant: no cells/independent exps and STATS?

      Apologies for the missing experimental details in the figure legends; it should say ‘representative of N=3 experiments’. We are not sure what the reviewer meant by Figure 2C not being an image series, as we meant it to be an image series of the individual fluorescence channels. We have changed this terminology to avoid confusion, and have included statistical analyses in the methods section. The statistical analyses of the fibril quantification are shown next to the fluorescence images.

      Figures 2D/E: The authors show that internalised ColI peaks at 20h and decreases to 60h, Fibers peak at 40h. How is this measured? ECM removed? Why would there be less in the cells, degradation? Whats the synchronisation?

      We apologise for omitting the synchronisation method from the methods section, and have included it in our revised manuscript (revised lines 542-544). Synchronisation is through dexamethasone addition (and removal after 1 hr incubation) as standard. The internalised Col-I is measured using Cy3-ColI, so the cells would contain both nascent and external collagen. Total intracellular collagen at the different time points would therefore likely be higher than represented, but here we are demonstrating that internalisation is a rhythmic event using the external labelled collagen. Fibers are measured using standard IF followed by fibril counting.

      Please note that we are only overlaying the two graphs to form our hypothesis that endocytosis may be used for accumulation of collagen protomers, which then allows for efficient fibrillogenesis. The graphs are not directly comparable as they quantify different things (internalised Cy3-ColI versus total collagen fibrils). We have clarified this in our discussion (revised lines 399-401).

      Discussion: Where does the ColI go? Solubilised? Degraded? Taken up by other cells? 

      The inverse correlation is not very tight. In fact, at 38h where fiber count peaks, Cy3-ColI also peaks (esp in normalised data, Figure S2D).

      We thank the reviewer for this comment and have reworded our main text to reflect this, and included additional discussion in our revised manuscript (revised lines 401-404).  

      Line 123: What is the turnover rate of Fibrils? Don't know for how long the transcription has been done, or when this would affect the fibril number. You have the quant for Fn1, where is the quant for ColI?

      We have included the quantification of collagen-I in original Figure 2A. We appreciate that the co-staining of ColI and Fn1 in the same experiment might cause confusion in Figure 2C, so we have removed the collagen-I panel from the revised Figure 2C. We know from previous results that the number of fibrils fluctuates over a 24-hour period, although the turnover of one specific fibril is unlikely to be 24 hours (https://www.biorxiv.org/content/10.1101/331496v2).

      Line 124: no accumulation of col in extracellular space, but you don't know how much endogenous colI (or other endogenous ECM proteins) they're taking up as it isn't measured here. If the author wants to comment on this, should use either exogenous col to monitor take up and resection or block transcription/translation to show fibril formation endo/exocytosis independent of endogenous synthesis.

      This experiment was done in the original manuscript: the siCol1a1 experiment used two rounds of siRNA, the first being a normal transfection, followed by reverse transfection onto fresh coverslips (this ensures that no prior ECM has been deposited, see Figure 3). However, we appreciate that there may still be low levels of endogenous collagen-I, and thus have included new data using collagen-I knock-out fibroblasts to strengthen our findings (revised Figure 4).

      Line 142: Why is fibronectin synthesis also decreased in Col KD? This is clear in the image but no explanation/reference is given.

      Due to the dynamic and complex nature of ECM, a knock-on effect would be unsurprising when knocking down one matrix protein. However, we have quantified the amount of fibronectin fibril deposited by scr and siCol1a1 fibroblasts, and showed that there was in fact no significant change between the two treatments (revised Figure 3A).

      Figure 3A: Need labels for which colour/protein is shown. Needs quantifying, especially as the Fn1 decrease is not so obvious here. Is it consistent between Figure 3A and 2C?

      We have provided quantification in the revision (revised Figure 3A). Figure 3A and 2C are two separate experiments (one is Dyngo treatment and one is siCol1a1), and neither showed significant changes in fibronectin fibril areas.   

      Figure 3B: Line 151: the text states that "The observation of fibrillar Cy3 signals in siCol1a1 cells showed that the cells can repurpose collagen into fibrils without the requirement for intrinsic collagen-I production (red arrow Figure 3B)"; however, there is clearly endogenous colI here too (along the fiber and also strongly at each end). Does the ColI antibody recognise the exogenous ColI?

      In our hands the ColI antibody does not recognise exogenous ColI, as the cell-free Cy3-ColI images were also stained with ColI antibody to ensure the two experimental conditions were treated exactly the same.

      This conclusion could only be made in the true absence of collagen: either in knock-out cells, or where collagen production/trafficking has been blocked (ie knockout of ColI chaperone or ERES block), or in a cell type that produces collagens but not ColI. Alternatively, if there are any fibrils seen that are completely negative, they should be shown in the figure and quantified (number of Cy3-ColI+-ColI+ vs Cy3-ColI+-ColI-).

      We thank the reviewer for this suggestion. We have included new data from collagen knock-out fibroblasts in this revision (revised Figure 4).  

      Figure S4A: the quality of this blot isn't very high, the result is not very clear and the high intensity (unspecific?) band below confounds the interpretation. In the author's previous paper (NCB 2020) the blots for VPS33B were much clearer, as is Fig S4D. It would be nice to include a clearer blot, maybe from the other repeats.

      This is the only blot that we used to select which knockout clones to use for our previous paper, which is why the quality is not as high. Knockout clones were all verified with additional western blots, and we do not think that endogenous VPS33b is expressed at high levels (also verified by MS analyses).  Fig S4D is overexpression of VPS33b, which is much easier to detect.  

      Figure S4D: This blot is much clearer, it would be useful to include a high gain to show the VPS33B band in CT to be able to understand the true increase.

      From the qPCR data one can see that the increase at the mRNA level is 20+ fold; we have always had problems detecting endogenous VPS33b using western blot or mass spectrometry analysis.

      Figure 4A: The fibrils here in the CT are not obvious, and the difference between CT and KOs is not appreciable. Would this be clearer shown at a lower magnification, with zooms where needed? Or immunogold labelling/CLEM to label the ColI?

      It is not trivial to carry out immunogold labelling/CLEM. These are cell-derived matrices in culture and thus lower magnification may not show as many collagen fibrils as one would expect. We are not confident that lower magnification will provide more information as the characteristic D-banded collagen pattern will be lost.  

      Line 167/Figure 4B: It looks like there is more internal ColI in KO, but the images are not good enough to tell. This could be better shown by flow cytometry.

      We have previously seen that VPSKO leads to accumulation of collagen-I in intracellular puncta (NCB2020), which is also seen here. Flow cytometry data for internalisation of external collagen are already included in original Figure 5G (revised Figure 6H).

      Again you mention intercellular vesicles, but based on these images, it is not possible to conclude this. These large spots could be aggregation elsewhere in the cell. Specific localisation should be shown by co-labelled IF/confocal, or it could be nicely shown by EM + fluorescent element (CLEM / Immunogold), or these statements removed from the text.

      We appreciate that the term ‘vesicles’ is very defined in the trafficking field, and have changed it to ‘intracellular compartments’.  

      Line 173-174 / Figure 4E: Why do you think the matrix mass is not increased in VPSoe by the approach shown in E when there is seemingly a huge increase by IF? E must also measure other ECM matrix proteins, which do you expect to be secreted by these cells? Could this confound the data if they too are affected by VPSoe?

      IF shows collagen-I specifically. Hydroxyproline detects multiple collagens, and shows a trend of increase (although not significant due to one outlier). Matrix mass is a very generic measurement of total ECM deposited, based on decellularized ECM weight. The reviewer is correct that VPSoe may also affect deposition of other ECM components; however, here we are focussing specifically on its effect on collagen-I. How VPSoe changes other types of ECM deposition could be addressed in future studies and is not within the scope of this manuscript.

      Are the results in E paired?

      Individual values between control and VPSoe in each separate experiment are paired.
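
      To illustrate what pairing means for the analysis, the sketch below computes a paired t statistic from per-experiment differences. This is an assumption for illustration only: the response does not state which statistical test was applied, and the function name and all values are hypothetical.

```python
import math
import statistics


def paired_t_statistic(control, treated):
    """Return the paired t statistic for matched measurements.

    Each index is one independent experiment, so the test operates on
    the within-experiment differences rather than on the pooled groups.
    """
    if len(control) != len(treated):
        raise ValueError("paired samples must have the same length")
    differences = [t - c for c, t in zip(control, treated)]
    mean_difference = statistics.mean(differences)
    # Standard error of the mean difference (sample standard deviation).
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return mean_difference / standard_error


# Hypothetical matrix readouts from four independent experiments.
control_values = [1.0, 2.0, 3.0, 4.0]
vpsoe_values = [1.5, 2.9, 3.6, 5.0]
t_value = paired_t_statistic(control_values, vpsoe_values)
```

      Working on within-experiment differences removes the between-experiment variation, which is why a paired layout can reveal a consistent shift even when the absolute values vary widely between repeats.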

      Figure 4F: Is quantification from IF shown in D? Specify which kind of microscopy it is based on.

      Quantification is based on fibril counting using standard fluorescence microscopy, as used in our previous paper. D is independent of F, as F is specifically looking at synchronised circadian effects, and D (and elsewhere) we are looking at global collagen deposition effects, irrespective of what time of day the cells are in.  

      Figure S5F: What do the yellow/red spots in the blots represent?

      We apologise for the initially unclear description of what the yellow/magenta circles depict in relation to the phosphoimages of the radiolabelled cell-free translation products displayed in Supplementary Figure 5, panels F, G and I. These circles indicate non-glycosylated (yellow) and N-glycosylated (magenta) species respectively, as is now clearly described in the revised manuscript.

      Figure 5 title: You can't conclude this from these images, need confocal and PM or cytosolic marker.

      We have changed the title to ‘VPS33B co-trafficks with collagen-I’. There is no good commercial VPS33b antibody for immunofluorescence staining, which is why we used the split GFP approach in this paper, and the images were acquired using confocal imaging (Olympus SpinSR system).

      Figure 5E: The authors describe that ColI is in endosomes throughout most of the paper, and this is based on the involvement of VPS33B in the colI pathway. VPS33B is thought to be at the late endosome/lysosome. However, these images do not look like classic endosomes or lysosomes, or other normal organelle IF phenotypes. The fluorescent intensity looks saturated, and it is difficult to conclude anything from these images. It is unclear where in the cell the largest blob in the zoom would be localised and in which cell. I would suggest that this image is replaced and proper controls included (IgG controls and single channels) as well as using different markers for other potential intracellular structures.

      We appreciate the reviewer's comment with regard to the classification of VPS33b localisation in the endosome compartment. We did not mean to use VPS33b as an endosome marker, as the focus of our studies is the function of VPS33b in directing endogenous or exogenous collagen to fibrillogenesis. With live imaging we could see endocytosed collagen moving in intracellular compartments, and we have conducted additional staining to show co-localisation with Rab5 (revised Figure 1), which we take to indicate, through convention, that it is occupying an endosome compartment. We have included single-channel images in the revised manuscript (revised Figure 6E).

      Line 255/ Figure 5G: no consistent change in uptake. Why are the results so varied in the KO and oe, here and in Fig 4C/E? N=4, what does that mean? 4 cells? 4 independent exps?

      In all cases, “N” represents independent biological experiments in this manuscript. Thus “N=4” in this case is 4 independent biological experiments, with at least 10,000 cells analysed per experiment. 

      We do not know why there is variation in the response; however, that is also why we concluded that VPS33B is unlikely to be directly involved in collagen uptake. We have changed 5G (now revised Figure 6H) to a paired line graph for better representation.

      Figure 5H shows the uptake of Cy5-ColI. At this resolution, VPSko looks like the col is ER, in one of the cells in the zoom, it looks like it is at Golgi. I think that the uptake route of ColI needs to be better defined, as there is no way to tell here where the colI goes. ColI being recycled/degraded would be most likely. But this figure looks like that might not be the case. It is also not clear where the zooms come from, they should be indicated with dashed boxes in the lower mag image.

      We thank the reviewer for this comment, and agree that we need to define the uptake route of ColI. This is currently being assembled as a methodology manuscript, and how ColI is being recycled/degraded is one major research area of the Chang lab. 

      We have added dashed boxes in the lower mag images to indicate where the zooms are derived from. We would also like to thank the reviewer for pointing this out, as we realised that we had accidentally cropped the image to a slightly different area for the VPSko image, and have now corrected this.

      Line 257: Based on this data, it could be trafficking through the cell as well as into the extracellular space.

      We think that VPS33B is involved in trafficking collagen through the cell to the plasma membrane but is not itself secreted: in our split-GFP experiments we never observed extracellular GFP signal, which suggests that VPS33b is not deposited extracellularly.

      Line 259: "highlighting the role in recycling col to fibril formation sites" is an overstatement based on the data shown here, there is no data on colI trafficking or its regulation

      We respectfully disagree that we have not shown data on col-I trafficking or regulation by VPS33b: split GFP highlighted co-trafficking to the plasma membrane, and we have shown a clear relationship between VPS33b and collagen-I fibril formation, with minimal changes to collagen-I mRNA levels. We acknowledge that we have not specifically shown the location of VPS33b at fibrillogenic sites and have modified this statement in the revised manuscript (revised line 302).

      Line 262: "Having identified VPS33B as specifically driving collagen-I fibril formation" is also an overstatement.

      We refer here to the data showing that VPS33b does not control collagen-I secretion (as demonstrated by the CM westerns) but specifically controls fibrillogenesis. We have clarified this in the revised text (revised line 304).

      Line 286: It would be useful to have a brief intro to PLOD3.

      We have included a brief intro to PLOD3 in the introduction, as well as in the results highlighted by the reviewer, in our revised manuscript (revised lines 54-58).

      Line 289/290: There could be other explanations for disruption to exo-endocytosis when disrupting col trafficking. Is VPS33B controlling exocytosis in general? Why should it be specific to col? Likewise with siITGA11 KD? Hypothesis for ITGA11 and fibrillogenesis?

      The relationship between ITGA11 and collagen fibrillogenesis is currently described in a manuscript by Donald Gullberg and Cedric Zeltz, under revision at Matrix Biology (see reference 63 in the revised manuscript). We do not think that VPS33b controls exocytosis in general, which is supported by the minimal change in ponceau stain of the western blots in the manuscript. It has previously been shown that VPS33B co-traffics with PLOD3, a collagen-I modifier.

      Figure 6I: Why only quant Scr + siITGA11, not in VPSoe? It looks like there is still an increase in intracellular or fibril formation in VPSoe + siITGA11, which would be a key result to discuss.

      We would like to clarify that 6I (now revised Figure 7I) is on the endocytosis of exogenous collagen-I, not quantification of Figure 6H.  

      Line 307: Discuss fibrillogenic sites, what are they?

      As we have not shown direct evidence of VPS33B delivering endocytosed collagen at the site of fibrillogenesis, we have decided to alter the text to avoid overstatement, as suggested from previous reviewers’ comments.  

      Figure 8: What does pentachrome label?

      Pentachrome staining allows for simultaneous staining of multiple species: collagen in red, sulphated mucopolysaccharides in violet, red blood cells in yellow, muscle in orange, nuclei in green.

      Line 326: "In this study we have identified the endosome as a major protagonist in..." This is an overstatement and cant be drawn from this data.

      We have modified this statement to “In this study we have identified an endocytic recycling mechanism for type I collagen fibrillogenesis that is under circadian regulation”

      Line 330/331: "Collagen-I co-traffics with VPS33B in a VIPAS-containing endosomal compartment that directs collagen-I to sites of fibril assembly," This is also an overstatement that cannot be drawn from this data.

      We have modified this statement to “Collagen-I co-traffics with VPS33B to the plasma membrane for fibrillogenesis”.  

      Line 340: again, the demonstration of the involvement of the endocytic pathway is very limited.

      We have provided new evidence in the revised manuscript that supports the involvement of classical endosomal compartments.

      Line 366: You cant conclude this, you have not manipulated these proteins to show a functional effect or modulation of fibrillogenesis, it could still be a secondary effect.

      We have provided new evidence in the revised manuscript that supports this conclusion. 

      Line 569: "Unless otherwise stated, incubation and washes were done at room temperature." Which incubations? Specify if this is just post-fixation during the EM prep or during cell culture.

      This is specific to the EM preparation and we have clarified in the revised manuscript (revised line 663).  

      Small text alterations:

      Overall we would like to thank the reviewer for highlighting these errors and mistakes in our manuscript, and have corrected them in our revised manuscript.  

      Figure 1E: Fluoro image series? This is only one image.

      We wrote this to mean single-channel images; we have corrected the terminology.

      Line 111: Ref for Dyngo4a?

      We have included this in the revised manuscript  

      Line 121: introduction/abbreviation definition for Fn1? Instead it is on Line 140.

      Thank you for highlighting this, we have corrected this in revised manuscript.  

      Figure S2C: Alignment of labels cleaves x-axis.

      We thank the reviewer for catching this and have corrected this with our revised manuscript.  

      Figure S4F and G should be inverted to mention sequentially in the text.

      We thank the reviewer for catching this and have corrected this in our revised manuscript.  

      Line 182: Figure 4J should be G.

      We thank the reviewer for catching this and have corrected this in our revised manuscript.

      Line 209: typo: N-glycosylated.

      We have corrected this typo in our revised manuscript.

      Fig 6E: Very big as a figure element compared to others.

      We have made this smaller in the revised manuscript to fit better with rest of the figure.  

      Line 313: Figure 7E not F.

      Thank you for spotting this, we have corrected it.  

      Line 555: Typo: Scraped.

      We have corrected this typo in our revised manuscript.

      Line 562: missing )

      We have corrected this typo in our revised manuscript.

      Standardise

      We thank the reviewer for spotting the mistakes below and have corrected in our revised manuscript.  

      Legends: Include numbers of repeats and STATs throughout. 

      Terminology: Dyng etc. 

      Scale bars: some included as editable lines, some with size on top, small/large etc.

      In certain cases we have positioned the scale bars in different regions of the figures to ensure no obscuring of the images.

      VPS33b v B. 

      Reviewer #2 (Recommendations For The Authors):  

      The authors can improve the experimental part of the manuscript the following: 

      -  For all the western blots please include molecular weight markers.

      We thank the reviewer for noticing this omission and have included molecular weight markers in the revised manuscript.  

      - Performing immunofluorescence and western blot analysis of endocytosed collagen -/+ inhibitors for lysosomal degradation (BafA1 or E64d+PepstatinA) in order to exclude endocytosis for degradation.

      We thank the reviewer for this comment, another paper from the lab has identified lysosome to be involved in collagen fibrillogenesis (https://www.biorxiv.org/content/10.1101/2024.05.09.593302v1), thus  

      - Figure out how Dyngo4a is affecting Col1 secretion in the first place? Does it interfere with the secretory pathway. Alternatively, use a different model to block endocytosis (e.g. siRNA Dynamin).

      We thank the reviewer for raising this. The Dyngo CM blot for total ponceau stain (revised Figure 2B) showed minimal changes, which suggests that global secretion is not affected.

      - Further characterization of the VPS33B / collagen vesicles by immunofluorescence containing markers for early, late, and recycling endosomes. Block endocytic recycling by depletion of either Rabs or e.g. EHD1.

      There is no good VPS33b antibody for staining. We have included images of GFP-tagged Rab5 co-localisation with labelled collagen-I (revised Figure 1D, Figure S6B).

      - Further clarify the status of the VPS33B knockouts e.g. by sequencing. also provide a readout of the siRNA KD, besides the mRNA levels, since there the difference is not striking.

      The knockout cell lines were characterised previously in our 2020 paper, which is referred to in our revised manuscript. We have always had issues detecting endogenous VPS33b due to reagent limitations, which is why we resorted to mRNA as the key readout.

      - Doing siRNA knockdowns and endocytosis inhibition in the IPF fibroblasts to further strengthen the link between elevated expression of VPS33B/ ITGA11 and increased collagen uptake.

      We thank the reviewer for suggesting these experiments. Due to limitations of the patient-derived fibroblasts (cell numbers and passage numbers) we had to prioritise experiments, and thus have performed siRNA against VPS33B and ITGA11 in the IPF fibroblasts. We showed that in both cases the amount of recycled labelled-collagen in collagen fibrils is significantly reduced (revised Figure 8D).  

      Reviewer #3 (Recommendations For The Authors): 

      Major points 

      (1) Choice of cells: Please provide a rationale for why each cell line was used, and make sure that it is clear throughout the manuscript which cell line was used for each particular experiment. The HEK293T cell line is also missing from the reagent table.

      We thank the reviewer for pointing out this omission, and have clarified in our revised manuscript which cell lines were used in each experiment. We used HEK293T to generate lentiviruses as described in the methods section.  

      (2) Missing information from methods. Experimental details are missing from the methods in several places, making it difficult for someone to replicate an experiment. For example, no details are given in the methods describing the explant culture of murine tail tendons (described in results lines 78-100), and there are no details on how the skin samples were obtained or stained. Further, no ethical approval details are provided for the use of human skin tissue.

      We apologise for leaving the ethical approval details and skin sample collection out; this was an oversight and will be included in the revised manuscript. We have also included the method for culturing murine tail tendons ex vivo (revised lines 527-531, 546-553).

      (3) Functional studies in patient-derived cells. To fully establish the role of VPS33b and ITGA11 in fibrotic diseases, functional studies including the knockdown/overexpression of these genes could be performed to establish if the same response is seen as in non-diseased cells.

      We agree that this will add much to the paper, and have performed siRNA against VPS33B and ITGA11 in the IPF fibroblasts. We showed that in both cases the amount of recycled labelled-collagen in collagen fibrils is significantly reduced (revised Figure 8D).

      Minor Points

      We thank the reviewer for pointing out these mistakes, and have corrected and included additional details in the revised manuscript.  

      (1) Lines 51-52. Wording of this sentence is unclear, please rephrase. 

      (2) Line 182. Should this be Fig 4G rather than J? 

      (3) Line 209. Correct spelling of glycosylated. 

      (4) Line 463. Incomplete brackets and details missing? 

      (5) Line 590. Correct tense - was rather than are. 

      (6) Line 593. Specify centrifugation speed. 

      (7) Line 619. Nuclei rather than nucleus. 

      (8) Ln 650. Statistical analysis - was normality tested? 

      (9) Figure 1e - Difficult to read labels for coll/DAPI.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public review:

      Summary:

      This work provides a new general tool for predicting post-ERCP pancreatitis before the procedure depending on pancreatic calcification, female sex, intraductal papillary mucinous neoplasm, a native papilla of Vater, or the use of pancreatic duct procedures. Even though it is difficult for the endoscopist to predict before the procedure which case might have post-ERCP pancreatitis, this new model score can help with the maneuver and, when the patient is at high risk of pancreatitis (which can sometimes be deadly), experienced endoscopists can do the procedure from the start. This paper provides a model for stratifying patients before the ERCP procedure into low, moderate, and high risk for pancreatitis. To be validated, this score should be tested in many countries and on large numbers of patients. Risk factors can also be identified and added to the score to increase rank.

      Thank you for reviewing our manuscript. We hope that this score will be validated in other countries from now on.

      Strengths

      (1) One of the severe complications of the endoscopic retrograde cholangiopancreatography procedure is pancreatitis, so investigators try all the time to find a score that can predict which patients will probably have pancreatitis after the procedure. Most scores depend on the intraprocedural maneuver. Some studies discuss a preprocedural score that can predict pancreatitis before the procedure. This study discusses a new preprocedural score for post-ERCP pancreatitis.

      Thank you for evaluating our manuscript and raising a strength of this manuscript.

      (2) Depending on this score that identifies low, moderate, and high-risk patients for post-pancreatitis, so from the start, experienced and well-trained endoscopists can do the procedure or can refer patients to tertiary hospitals or use interventional radiology or endoscopic retrograde cholangiopancreatography.

      Thank you for evaluating our manuscript and raising a strength of this manuscript.

      (3) The number of patients in this study is sufficient to analyze data correctly.

      Thank you for evaluating our manuscript and raising a strength of this manuscript.

      Weaknesses:

      (1) It is a single-country, retrospective study.

      Thank you for this comment. It’s exactly as you said. This is a limitation (Lines 326-327).

      (2) Many cases were excluded, so the score cannot be applied to those patients.

      Thank you for this valuable comment. The predictive PEP score is not necessary for the excluded patients. The reasons were as follows. Biliary duct cannulation was not attempted in patients for whom it was difficult to identify the Vater papilla. The biliary tract was separated from the pancreas in patients with a past history of choledochojejunostomy, pancreatojejunostomy, or pancreatogastrostomy. PEP risk was thought to be low in these patients and patients who underwent bile duct cannulation via the choledochoduodenal fistula. PEP diagnosis is difficult in patients with acute pancreatitis, whose diagnosis is currently in progress. We added these explanations (Lines 98-106).

      (3) Many other studies, e.g., https://link.springer.com/article/10.1007/s00464-021-08491-1, https://pubmed.ncbi.nlm.nih.gov/36344369/, have already been published on the same issue, so what is new about this score?

      Thank you for raising the new reference written by Archibugi et al. in 2023. The novelty of our score is that it is calculated using the factors that are investigated before ERCP procedures. The study written by Archibugi et al. involved procedure time and cannulation attempts for PEP prediction. These two factors are unknown before ERCP procedures. Therefore, a preprocedural predictive risk model for PEP was not created before our study was performed. We added the content of the past study written by Archibugi and included the report as a reference (Lines 65-67, 73-74).

      (4) The discussion section needs reformulation to express the study's aim and results.

      Thank you for this valuable comment. I have rewritten the first paragraph of the discussion. In the paragraph, we showed that the study achieved the aim on the basis of the results (Lines 245-255).

      (5) Why did the authors select these items in their scoring system and did not add more variables?

      Thank you for this valuable comment. We selected the items listed in the Japanese guidelines for acute pancreatitis and post-ERCP pancreatitis. We added this description (Lines 123-126). The original references of the guidelines were cited in the first draft version.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comment1. Please revise these documents: copyright, disclaimer, ethics approval, consent to participate, consent for publication, data and material availability, competing interests, funding, authors' contributions, and acknowledgments.

      First, thank you for reviewing our manuscript. We have already described the required information in the “author information” section. The sentences containing this information were proofread in English.

      Reviewer #2 (Recommendations for the authors):

      Comment 1. It would be best if you did this study in a Prospective way for more validation.

      First, thank you for reviewing our manuscript. We have revised our manuscript according to your comments. It’s exactly as you said. These points are limitations (Lines 312-318, lines 326-327). We hope that future validation studies over wider geographic regions will prove our opinions.

      Comment 2. The model name should be an acronym (the first letter of the five items in the risk model).

      Thank you for this valuable comment. Sorry, we could not create a memorable model name using the first letter of the five items.

      Comment 3. You say that you include the pre-procedure criteria that predict PEP. You state one of the items, pancreatic duct procedure. Do you mean it is a history?

      Thank you for this valuable comment. This means that the main purpose is the pancreatic duct. Therefore, the pancreatic duct procedure is listed as “planned pancreatic duct procedures” in Figure 2 (Lines 40-41, 231-234). When an unintended pancreatic duct procedure is performed, we can calculate the risk score by adding two points for “planned pancreatic duct procedures” (Lines 48-49, 247-250).

      Comment 4. Regarding calcification, do you mean chronic pancreatitis? It needs more clarification regarding its degree.

      Thank you for this valuable comment. We regard pancreatic calcification as a finding of chronic pancreatitis. Pancreatic calcification was defined as the degree that was confirmed by imaging, such as CT, MRI, and EUS. These definitions have been written in the first draft version (Lines 134-137).

      Comment 5. Why don't you include young age in the model? Your result found that age less than 50 is significantly associated with PEP.

      Thank you for this valuable comment. We selected the PEP risk factors listed in the Japanese guidelines for acute pancreatitis and post-ERCP pancreatitis. Age less than 50 years was listed as a PEP risk factor in the Japanese guidelines for acute pancreatitis. We added this description (Lines 123-126).

      Comment 6. There is an ancient reference, some of them in 1994,1996.

      Sorry for the old references. These references were written by Cotton et al. 1991, Freeman et al. 1996, and Loperfido et al. 1998. These are still important today. The diagnostic criteria for PEP were determined in the report written by Cotton et al., which is Cotton’s criteria. The other two references are representative reports that described risk factors for PEP, and these two reports were cited in the Japanese guidelines for pancreatitis written by Takada et al. 2022 (Lines 123-126).

      Comment 7. In the introduction, you say that the first score includes one of the items for PEP pain during the procedure. It is a little bit strange.

      Thank you for this comment. The first PEP risk score did not involve PEP pain but involved pain during the procedure (Line 68).

      Comment 8. We know that once ERCP is indicated, you justify the importance of the risk model, stating that if one or more risks are found, we can do EUS or PTD. It is not reasonable to abort the procedure in case of frequent pancreatic duct cannulation or cancel ERCP if pt has one or more risk factors.

      Thank you for this valuable comment. If ERCP is performed for high-risk patients, prophylaxes for PEP, such as procedures by experts, pancreatic stent placement, and NSAID suppository insertion, should be performed as much as possible (Lines 281-287, 308-311).

      Comment 9. Regarding ERCP pancreatitis criteria, does it include amylase 3t or lipase?

      Thank you for this comment. We used Cotton’s criteria for diagnosing PEP. Cotton’s criteria include hyperamylasemia (more than three times the normal upper limit) at least 24 hours after ERCP (Lines 114-116).
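
      The biochemical threshold described in this response can be sketched as a simple check. This is a minimal illustration only: the function and parameter names are hypothetical, and it covers only the amylase criterion as stated above, not the clinical components of a PEP diagnosis.

```python
def meets_amylase_criterion(amylase: float,
                            upper_limit_normal: float,
                            hours_after_ercp: float) -> bool:
    """Check the hyperamylasemia criterion as stated in the response.

    Returns True when serum amylase is more than three times the upper
    limit of normal, measured at least 24 hours after ERCP.
    All names here are illustrative assumptions, not the authors' code.
    """
    if upper_limit_normal <= 0:
        raise ValueError("upper_limit_normal must be positive")
    return hours_after_ercp >= 24 and amylase > 3 * upper_limit_normal
```

      Both values must be given in the same units, for example U/L; the check says nothing about symptoms or hospitalisation.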

      Comment 10. It is well known that pr with functional biliary disorder has a high incidence of PEP; it doesn't need a manometer for diagnosis. It needs to be included.

      Thank you for this comment. Moreover, functional biliary disorders are difficult to diagnose before ERCP procedures (Lines 259-262). The factor that is not apparent before ERCP could not be included in the predictive PEP scoring system.

      Comment 11: What is gabexare and nafamost.

      Thank you for this comment, and sorry for our insufficient explanation. These compounds are gabexate mesilate and nafamostat mesilate, which are protease inhibitors. In some institutions, protease inhibitors are used as prophylaxis for PEP. We added “protease inhibitors” (Lines 138-139, Tables 1 and 2).

      Reviewer #3 (Recommendations for the authors):

      Comment 1. The sample size needs clarification.

      First, thank you for reviewing our manuscript. The sample size has been included in the “Methods” section (Lines 157-165).

      Comment 2. They need to be mentioned cause they depend on old references in discussion and background.

      Thank you for this comment. The previous references were written by Cotton et al. 1991, Freeman et al. 1996, and Loperfido et al. 1998. These are still important today. The diagnostic criteria for PEP were determined in the report written by Cotton et al., which is Cotton’s criteria. The other two references are representative reports that described risk factors for PEP, and these two reports were cited in the Japanese guidelines for pancreatitis written by Takada et al. 2022 (Lines 122-126). In the background and discussion, we added new recent references and information related to the references (Lines 65-67, 285-287, 291-295, 308-311).

      Comment 3. Case definition should be added to the methodology.

      Thank you for this comment. We added patient information. Please refer to our response to the eLife assessment, weakness (2).

      Comment 4. Do you include all who met the inclusion criteria, or was there any random sampling technique?

      No, we did not use random sampling techniques.

      Comment 5. What is the value of comparing the development and validation groups? I do not think it adds anything new as if you want to exclude confounders. Has the comparison revealed that a confounder does exist? What was your point of view concerning that?

      Thank you for this valuable comment, and sorry for the insufficient explanation. The differences between the development cohort and the validation cohort are important because the goodness of fit for the score could be confirmed in significantly different groups. We added this explanation (Lines 197-199, 251-253).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The authors determine the phylogenetic relation of the roughly two dozen wtf elements of 21 S. pombe isolates and show that none of them in the original S. pombe are essential for robust mitotic growth. It would be interesting to test their meiotic function by simply crossing each deletion mutant with the parent and analyzing spores for non-Mendelian inheritance. If this has been reported already, that information should be added to the manuscript. If not, I suggest the authors do these simple experiments and add this information.

      Thanks for the great summary! Most of the wtf genes have been tested for meiotic drive phenotypes previously by Bravo Nunez et al. (2020; http://doi.org/10.1371/journal.pgen.1008350). The reference was cited in our original manuscript, and we added the details in the revised manuscript.

      Strengths:

      The most interesting data (Figure 4) show that one recombinant (wtfC4) between wtf18 and wtf23 produces in mitotic growth a poison counteracted by its own antidote but not by the parental antidotes. Again, it would be interesting to test this recombinant in a more natural setting - meiosis between it and each of the parents.

      We will test the meiotic driver phenotype of the wtfC4 we constructed in S. pombe as suggested.

      Weaknesses:

      In the opinion of this reviewer, some minor rewriting is needed.

      We did the rewriting as this reviewer suggested in the comments to authors.

      Reviewer #2 (Public review):

      Summary:

      This important study provides a mechanism that can explain the rapid diversification of poison-antidote pairs (wtf genes) in fission yeast: recombination between existing genes.

      Thanks!

      Strengths:

      The authors analyzed the diversity of wtf in S. pombe strains, and found pervasive copy number variations. They further detected signals of recurrent recombination in wtf genes. To address whether recombination can generate novel wtf genes, the authors performed artificial recombination between existing wtf genes, and showed that indeed a new wtf can be generated: the poison cannot be detoxified by the antidotes encoded by parental wtf genes but can be detoxified by its own antidote.

      Thanks for the great summary!

      Weaknesses:

      The study can benefit from demonstrating that the novel poison-antidote constructed by the authors can serve as a meiotic driver.

      We will test the meiotic driver phenotype of the wtfC4 we constructed in S. pombe as suggested.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Wang and colleagues explore factors contributing to the diversification of wtf meiotic drivers. wtf genes are autonomous, single-gene poison-antidote meiotic drivers that encode both a spore-killing poison (short isoform) and an antidote to the poison (long isoform) through alternative transcriptional initiation. There are dozens of wtf drivers present in the genomes of various yeast species, yet the evolutionary forces driving their diversification remain largely unknown. This manuscript is written in a straightforward and effective manner, and the analyses and experiments are easy to follow and interpret. While I find the research question interesting and the experiments persuasive, they do not provide any deeper mechanistic understanding of this gene family.

      Thanks! Please see the following for our point-by-point response.

      Strengths:

      (1) The authors present a comprehensive compendium and analysis of the evolutionary relationships among wtf genes across 21 strains of S. pombe.

      (2) The authors found that a synthetic chimeric wtf gene, combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves like a meiotic driver that could only be rescued by the chimeric antidote but neither of the parental antidotes. This is a very interesting observation that could account for their inception and diversification.

      Thanks for the great summary!

      Weaknesses:

      (1) Deletion strains

      The authors separately deleted all 25 wtf genes in the S. pombe reference strain. Next, the authors performed a spot assay to evaluate the effect of wtf gene knockout on yeast growth. They report no difference from the WT and conclude that the wtf genes might be largely neutral to the fitness of their carriers in the asexual life cycle, at least in normal growth conditions.

      The authors could have conducted additional quantitative growth assays in yeast, such as growth curves or competition assays, which would have allowed them to detect subtle fitness effects that cannot be quantified with a spot assay. Furthermore, the authors do not rule out simpler explanations, such as genetic redundancy. This could have been addressed by crossing mutants of closely related paralogs or editing multiple wtf genes in the same genetic background.

      Another concern is the lack of detailed information about the 25 knockout strains used in the study. There is no information provided on how these strains were generated or, more importantly, validated. Many of these wtf genes have close paralogs and are flanked by repetitive regions, which could complicate the generation of such deletion strains. As currently presented, these results would be difficult to replicate in other labs due to insufficient methodological details.

      We will generate growth curves for all 25 wtf deletion strains. We will also provide details of the wtf gene knockouts. However, for 25 wtf genes, there are too many combinations for editing two genes, and it is technically challenging to knock out multiple wtf genes together. Nevertheless, our results suggest that a single wtf gene has little effect on host fitness under normal conditions.

      (2) Lack of controls

      The authors found that a synthetic chimeric wtf gene, constructed by combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves as a meiotic driver that can be rescued only by its corresponding chimeric antidote, but not by either of the parental antidotes (Figure 4F). In contrast, three other chimeric wtf genes did not display this property (Figure 4C-E). No additional experiments were conducted to explain these differences, and basic control experiments, such as verifying the expression of the chimeric constructs, were not performed to rule out trivial explanations. This should be at the very least discussed. Also, it would have been better to test additional chimeras.

      We will verify the expression of the chimeric genes, and test the meiotic driver phenotype of wtfC4 in S. pombe.

      (3) Statistical analyses

      In line 130 the authors state that: "Given complex phylogenetic mixing observed among wtf genes (Figure 1E), we tested whether recombination occurred. We detected signals of recombination in the 25 wtf genes of the S. pombe reference genome (p = 0) and in the wtf genes of the 21 S. pombe strains (p = 0) using pairwise homoplasy index (HPI) test. ". Reporting a p-value of 0 is not appropriate. Exact P-values should be reported.

      We will report the exact p values in the revised manuscript.
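
      The reviewer's point about p = 0 can be illustrated with a small sketch. It assumes the p-values came from a permutation test, which the response does not state, and uses a common convention that counts the observed statistic among the permutations, so the smallest reportable value is 1/(m+1) for m permutations rather than 0. The function name and the "larger is more extreme" direction are illustrative choices, not taken from the manuscript.

```python
from typing import Sequence


def permutation_p_value(observed: float, permuted: Sequence[float]) -> float:
    """Permutation p-value that can never be exactly zero.

    Counts how many permuted statistics are at least as extreme as the
    observed one (here: greater than or equal), then adds one to both the
    numerator and the denominator so the observed value counts as one
    permutation. Hypothetical helper for illustration only.
    """
    at_least_as_extreme = sum(1 for s in permuted if s >= observed)
    return (at_least_as_extreme + 1) / (len(permuted) + 1)
```

      Under this convention, a result more extreme than all 999 permutations would be reported as p = 0.001, or equivalently as p < 0.001 with the number of permutations stated.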

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reply to reviewer comments:

      (1) Given the interpretations of this study hinge on the specificity of the antibodies used in immune fluorescence, the authors should provide full western-blot images of all their antibodies in supplementary information. 

      The commercial antibodies have been validated by the provider. 

      Additionally, we did our own tests. Of note is that proper validation of any antibody is only possible by using a knockout mouse for each protein analyzed (i.e. for pPKA wt vs. pka ko mice). This is not possible, because we do not have all these knock-out strains. However, specific proteins like pPKA, pCAMKII, and pCAMKIV are known to be increased by a light pulse. We show by western blot that pPKA (Fig. 2a, b) and pCamKII (Fig. S2a, b) are increased in wt animals, mirroring what we observed in the immunofluorescence. These results suggest that the signal is specific to these antibodies. We provide a full panel of western blots, including the other proteins studied by immunofluorescence such as pCamKIV, pCREB, CaV 3.1, and pDARPP32, and show that they detect a protein of the expected size. The full western blots mentioned in the manuscript are shown in Supplementary Figure 7. Below are additional validations of antibodies used in the immunofluorescence experiments.

      Author response image 1.

      Author response image 2.

      (2) The explanation in the results section surrounding Fig. 4 seems to be specific for the representative trace rather than the group. Specifically, does the following statement apply to all the replicates?  " A Ca2+ transient was observed right before the light was given at ZT14 (Fig. 4b), which showed the same magnitude as those observed during and after the light stimulus". 

      If not this should be corrected.  

      We have now replaced Fig. 4b with an average trace of all experiments. The individual traces can be seen in supplementary figure 4d.

      (3) Are lines 236 -244 and figure 5A/B demonstrating shCDK5 being similar to no-calcium or EGTA conditions at the level of CREB not contradicting Figure 3 which argues that the reason behind the increase in CAMK-phosphorylation and pCREB following shCDK5 is increased basal calcium? If this is the case then why does removing the external calcium phenocopy shCDK5 in these cells? The authors need to clarify this and give an explanation. 

      (4) The authors should explain why they see an equivalent level (or more) of CREB activation, 5 minutes following forskolin activation in Ca2+-free condition (apparent in the case of shCDK5 and EGTA) in the FRET assay. Does this not imply PKA is the most likely candidate mediating this reaction at this stage? Given this interaction has been demonstrated in multiple (other) experiments including in vitro isolated enzyme experiments involving CREB and PKA (E.G. fig 6A in PMID: 2900470) an absence of p-PKA pulldown is not sufficient to justify the non-involvement of PKA (PMID: 22583753). This statement needs support in the form of positive data or acknowledging the limitations in the text (conditions, single technique, etc). 

      (5) The authors should better explain the fret pairs used in the experiments involving ICAP for the reader's benefit - a reduction in fluorescence as a function of CREB activation is non-intuitive.

      We answer all three questions (3-5) together since they belong to the same concept.

      (1) How FRET works.

      The Förster resonance energy transfer (FRET) technique is widely used to investigate molecular interactions between proteins, such as CREB:CBP, in living cells. We used a sensor called ICAP (an Indicator of CREB Activation due to Phosphorylation) published by Friedrich and colleagues in 2010 (https://doi.org/10.1074/jbc.M110.124545). The sensor is composed of three different elements: 1) the KID domain of CREB containing Ser-133, which is phosphorylated upon forskolin induction in our experimental setup, 2) the KIX domain of CBP, which is responsible for the dimerization with phospho-CREB, and 3) a short linker that separates the KID from the KIX domain. KID is flanked by a cyan fluorescent protein (CFP), while KIX is flanked by a yellow fluorescent protein (YFP). When KID is not phosphorylated, the ICAP conformation allows CFP, stimulated by blue UV light, to transfer energy to YFP, producing FRET and resulting in yellow light emission. Therefore, the ratiometric analysis FRET/CFP shows FRET > CFP. After a stimulus (forskolin), the serine-133 in KID is phosphorylated and KID can bind to KIX. The dimerization separates CFP from YFP, resulting in decreased FRET and increased CFP-dependent blue light emission (see Author response image 3 below). Therefore, the ratiometric analysis FRET/CFP shows FRET < CFP over time (usually within 20 min after the forskolin stimulus).

      Author response image 3.

      FRET model. On the left is a schematic representation of how ICAP works. On the right, an example of the quantified FRET decrease associated with increased KID: KIX interaction.

      (2) The ‘apparent’ contradiction between Figure 5A and Fig 3.

      As mentioned before, the chosen FRET method is ratiometric, meaning that a relative FRET signal in fluorescence is measured compared to the baseline (absence of forskolin, assay buffer). The FRET experiment can only tell whether there is a change in the phosphorylation state of KID during the live imaging, comparing the baseline to the period after the forskolin treatment. The result produces a delta [(time after forskolin) - (baseline)]. The higher the delta, the more KID is phosphorylated after forskolin treatment. If KID phosphorylation is not increased compared to the baseline, the FRET signal tends to return to the baseline with a reduced delta [(time after forskolin) - (baseline)]. Therefore, the experiment does not tell at the quantitative level the amount of KID (CREB domain) phosphorylation before the stimulus. It only tells whether the phosphorylation increases after the stimulus, producing a delta or not. This means that the lack of a delta can be caused by: A) high KID phosphorylation in the baseline which does not further increase after the forskolin stimulus; B) very low KID phosphorylation in the baseline which does not increase after the forskolin stimulus. In Fig. 5A, wt cells (orange trace, lines, and double arrow) show a higher delta compared to the ko cells (blue trace, lines, and double arrow). The result indicates that the phosphorylation of CREB (KID domain) is increased after the forskolin stimulus only in the wt. To that extent, the results are in line with the experiment that we show in Figure 3. Indeed, the increased delta in CREB phosphorylation is observed only in the scramble animals, whereas it is lost in the ko (the blue double arrow indicates the delta in the scramble). 

      Author response image 4.
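
      The delta described in this section can be sketched as follows. This is only an illustration on synthetic traces, not the analysis pipeline used for Figure 5: the function name, the `skip_min` window (used here to exclude the early post-stimulus period) and all numbers are assumptions. Because FRET decreases when KID is phosphorylated, the signed delta is negative and its magnitude is what is compared between conditions.

```python
import numpy as np


def fret_delta(time_min, fret, cfp, stim_time=0.0, skip_min=5.0):
    """Signed delta of the FRET/CFP ratio: (time after forskolin) - (baseline).

    time_min  : acquisition times in minutes
    fret, cfp : fluorescence intensities in the FRET and CFP channels
    stim_time : time of forskolin addition (assumed value)
    skip_min  : early post-stimulus window left out of the average (assumed value)
    """
    time_min = np.asarray(time_min, dtype=float)
    ratio = np.asarray(fret, dtype=float) / np.asarray(cfp, dtype=float)
    baseline = ratio[time_min < stim_time].mean()
    after = ratio[time_min >= stim_time + skip_min].mean()
    return after - baseline


# Synthetic traces: one sample per minute, forskolin added at t = 0.
time_min = np.arange(-5.0, 21.0, 1.0)
cfp = np.ones_like(time_min)
fret_wt = np.where(time_min < 0, 1.2, 0.9)  # ratio drops after the stimulus
fret_ko = np.full_like(time_min, 1.2)       # ratio stays at baseline

delta_wt = fret_delta(time_min, fret_wt, cfp)
delta_ko = fret_delta(time_min, fret_ko, cfp)
print(abs(delta_wt), abs(delta_ko))
```

      A flat trace yields a delta of zero whether the sensor started fully phosphorylated or not phosphorylated at all, which is the ambiguity described in cases A and B above.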

      (3) The FRET signal within 3 minutes after forskolin stimulation

      The signal mentioned by the reviewers at 5’ is an artifact caused by the light diffraction that follows the addition of forskolin in DMSO, which propagates through the plate. The same effect is observed with the DMSO-only treatment (Fig. S5). Therefore, it need not be taken into account. The amplitude of this signal in this time window depends on many independent variables (buffer composition, cell shape, room temperature, pipetting), so it is not possible to draw any conclusions from it. We never consider this time window when describing our results.

      Author response image 5.

      (4) Role of PKA and considerations about experiments performed in Fig. 5a and b

      To answer the question about the role of PKA, we believe it is a pivotal player. Our results indicate that PKA might promote CaV3.1, the entrance of calcium, and therefore, CAM Kinase pathway activation leading to CREB phosphorylation (Fig. 5). However, if the calcium is depleted, even a channel activation mediated by PKA cannot propagate the signal. For that reason, when we deplete calcium in wt cells as we do in the experiment performed in Figure 5B the activation of PKA alone cannot promote the CREB phosphorylation associated with a reduction of the FRET signal. As mentioned before, the FRET method gives a binary answer. It means either a higher or lower delta comparing time after forskolin to baseline. It cannot give stoichiometric info about the level of calcium and/or phosphorylation in the baseline. To that extent, the FRET experiment in Figure 5A cannot be connected to the experiment in Figure 5B. The method is the same, but the scientific questions are different. In Figure 5A we demonstrate that CDK5 plays a role in the PKA activation pathway. In Figure 5B we demonstrate that the general pathway needs calcium.

      We modified the text accordingly.

      (6) The presentation of the data in Figure 6 seems to be divergent from the rest of the data presentations. Please make it more consistent and also provide more explanations. Specifically, the authors suggest increased P-CREB nuclear localization (and an increase in phosphorylated PKA/CAMK) following shCDK5. Won't this lead to an increase in Per1, Dec1, cFos, and Sik1 basally (pre-light pulse)?

      We followed the reviewer's suggestion and present data in Figure 6 as done before in the manuscript. The reviewers should also consider our papers published before (Brenna et al., 2019; Brenna et al., 2021). In these papers, we demonstrate two important concepts that are in line with this manuscript. First, the lack of CDK5 promotes PER2 degradation and lack of nuclear translocation (Brenna et al., 2019). Second, PER2 plays a scaffold role in promoting the formation of the CREB transcriptional complex involved in the regulation of the expression of light-dependent genes (Brenna et al., 2021). Therefore, the take-home message here is that even if a lack of Cdk5 promotes a higher basal level of CREB phosphorylation, it also promotes PER2 degradation. Therefore, without PER2, the CREB-dependent gene expression is reduced. For this reason, we say that CDK5 gates phase shift (via PKA-CAM Kinases-CREB axis) of the circadian clock (via PER2).

      (7) The authors should discuss why calcium-sensitive phosphatases such as PP2A (PMID: 23752926) or calcineurin (PMID: 10217279) are not considered candidates for dephosphorylation of DARPP32 as these are described previously (CDK5) and conditions of increased calcium as seen here would favour these enzymes. The phospho-T75 data are supportive, but such additional discussion could be important given the past demonstrations.

      We thank the reviewers for the great insight. The pathway that promotes the T75 phosphorylation/dephosphorylation indeed includes many players, such as calcineurin and PP2A. We now mention this in the discussion as follows:

      However, phosphatases such as PP2A and calcineurin, which de-phosphorylate DARPP32 including the Cdk5 phosphorylation site, may be involved in this process as well (Girault and Nairn, 2021). Upon light treatment and increase of Ca2+ these phosphatases would dephosphorylate DARPP32 and thereby inactivate it, leading to PKA activation. This process may occur in parallel to the Cdk5 regulation of DARPP32 contributing to a sustained activation of the light signaling pathway via PKA activation.

      (8) additional details on the knock-downs would be helpful: 

      - The relative amount of reduction in gene expression upon shRNA treatment should be provided.

      - How was the exact viral delivery and reduction in shRNA-induced knock-down confirmed for the individual animals?

      The Cdk5 knockdown was extensively validated in the previous paper (Brenna et al., 2019, Fig2-Fig supp1, and Fig3-Fig suppl2). We used the same mice. We also confirmed the efficiency of the silencing in supp figure 1A of the current paper.

      (9) The authors only focus on male mice. This is rather incomplete, as it leaves away an important half of biological reality. Testing relevant aspects of the work in female mice would close this significant gap and also increase the number of biological replicates, which can still be considered relatively low. 

      We thank the reviewers for the suggestion. We injected female mice, performed the Aschoff type II light pulse experiment at ZT14, and observed the same phenotype as for male mice. This is now stated in the paper and the data are shown in supplemental figure 1 e-f.

      (10) Given the roles of CdK5 in circadian clock period length regulation, but also light-induced phase delays, it would be interesting for a broader audience to discuss possible expectations of CdK5's roles, e.g. 

      (a) How will other circadian parameters, eg. activity bouts (numbers, length, activity onset/ offset) be affected? 

      (b) How does that relate to sleep, sleep phases? 

      (c) What is the expected impact on other physiological rhythms, eg food intake, cortisol levels? 

      (d) What are the expected effects on circadian oscillation of gene expression in other brain regions, organs? 

      We thank the reviewers for the observations. 

      a) The activity was discussed in the previous paper (Brenna et al. 2019). ShCdk5 mice show a reduced activity in both DD and LD 12:12 compared to wt, mirroring the Per2 brdm phenotype (Figure- Suppl3), with the difference mostly observed at night time (Figure 2-suppl4).

      We also demonstrate in Suppl Fig1 b, c of the current paper that the light pulse does not affect the period length in either scramble or shCdk5 mice.

      b) We performed preliminary experiments with SCN shCdk5 knock-down animals and compared them to scr control mice using the Piezo sleep system. Total sleep was not different, however during the dark phase shCdk5 animals tended to sleep a bit more, similar to the neuronal Per2 KO animals (Wendrich et al., 2023 https://doi.org/10.3390/clockssleep5020017 ). After sleep-deprivation no differences were observed between shCdk5 and scr animals. This was comparable to the neuronal Per2 KO animals that also showed no phenotype after sleep deprivation.

      c) and d) We did not investigate food intake, cortisol, or other parameters involving peripheral clocks. We did not investigate the gene expression in other brain regions because the SCN is the main brain region involved in the regulation of the circadian clock phase shift. However future studies will address these questions.

    1. Author response:

      We appreciate the reviewers' thoughtful and constructive comments. In this provisional response, we aim to address what we see as the key critiques, with a detailed, point-by-point reply to be provided alongside the revised manuscript. Below, we outline how we intend to address these critiques in the revised manuscript.

      (1) We will revise sections of the manuscript to ensure that all results, particularly those concerning the effects of lesions, are described more clearly and with sufficient context. This includes providing additional visualizations and rewording any ambiguous statements.

      (2) In this study, we examined a subset of 7,396 blocks where animals quickly adapted after block switches (achieving LCriterion in 20 or fewer trials), thereby focusing on expert-level performance and avoiding periods that might be affected by low motivation. It is valid to question whether the same observations would hold if the full dataset were analyzed. To address this, we expanded our analysis to include a supplementary figure (Supplementary Figure 1.1) that illustrates the same relationships based on block length (BL) instead of LRandom, both with and without the restriction on LCriterion (n = 9,156 blocks in which the block length is under 100 trials, without any LCriterion restrictions), and based on LRandom without any LCriterion restrictions and with a less stringent LCriterion restriction (≤ 50 trials for the criterion). This method allowed us to include all trials in our dataset. We observed similar effects of block length on choice behavior around switches (Figure 3), confirming the consistency of our findings across different analytical conditions.

      (3) We agree that robust validation of model selection is crucial. To address this, we will generate a confusion matrix to assess whether our model selection process accurately identifies the correct model class across a range of generative parameters. We will also include additional model selection metrics, such as cross-validation, to complement the BIC analysis and provide a more robust comparison of models.

      (4) We acknowledge the concern regarding our comparison of the "best" and the "4th best" models. The "4th best" model was chosen because it is the most widely recognized in the literature. Our intention was to demonstrate the performance of the most commonly used model, but we understand how this may have been misleading. To address this, we will revise our comparison to focus on the "best" and the "2nd best" models, ensuring greater clarity in the manuscript. Additionally, we will include supplementary simulation results and figures to provide a more comprehensive analysis of the models.
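
      The model-recovery check planned in point (3) can be sketched as below. This is a toy illustration under stated assumptions, not the models from the manuscript: two hypothetical choice models (a fixed-probability model with no free parameters and a biased model with one), synthetic data, and BIC as the only selection metric. Each row of the confusion matrix is the generating model, and each column is the model selected by BIC.

```python
import numpy as np


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return n_params * np.log(n_obs) - 2.0 * log_lik


def fit_fixed(choices):
    """Hypothetical model A: choice probability fixed at 0.5, no free parameters."""
    return choices.size * np.log(0.5), 0


def fit_biased(choices):
    """Hypothetical model B: one free choice probability, fitted by maximum likelihood."""
    n, k = choices.size, choices.sum()
    p = np.clip(k / n, 1e-6, 1 - 1e-6)
    return k * np.log(p) + (n - k) * np.log(1 - p), 1


def model_recovery(n_sims=200, n_trials=200, seed=0):
    """Simulate from each model, select by BIC, return a row-normalized confusion matrix."""
    rng = np.random.default_rng(seed)
    fitters = [fit_fixed, fit_biased]
    confusion = np.zeros((2, 2))
    for true_idx in range(2):
        for _ in range(n_sims):
            # Generative parameters are drawn from an assumed range for model B.
            p = 0.5 if true_idx == 0 else rng.uniform(0.65, 0.9)
            choices = rng.random(n_trials) < p
            scores = [bic(*fit(choices), n_trials) for fit in fitters]
            confusion[true_idx, int(np.argmin(scores))] += 1
    return confusion / n_sims


confusion = model_recovery()
print(confusion)
```

      A matrix close to the identity would suggest that the selection procedure can tell the model classes apart over the sampled parameter range; large off-diagonal entries would indicate that the models are confusable.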

    1. Author response:

      We appreciate the expression of enthusiasm for our paper by the editors and the three reviewers and the suggestions on how to improve the study. Here we outline how we will address the reviewers’ concerns and suggestions in a planned revision of our manuscript.

      Reviewer #1 listed two primary weaknesses:

      (1) the need for discussion of the extent to which the cell line we used resembles CRH neurons and

      (2) that we did not test for the effect of blockade of the glucocorticoid receptor.

      (1) As the reviewer acknowledges, our experiments called for the use of a cell line to dissect intracellular trafficking of the α1 adrenoreceptor. We selected the N42 cell line for this purpose because it is an immortalized hypothalamic cell line (developed by Belsham and colleagues, Belsham et al., 2004) that expresses CRH. We have used this cell line successfully in the past to study transcriptional and rapid non-genomic actions of glucocorticoids, which indicated that, in addition to expressing CRH, these cells also express both the nuclear glucocorticoid receptor and a membrane-associated receptor that binds glucocorticoids (Rainville et al., 2019; Weiss et al., 2019). We believe that this hypothalamic cell line is the most closely related to native PVN CRH neurons of any cell line available. As requested, we will add to the Discussion of the manuscript to further justify our choice of cells.

      (2) We agree that this experiment should be performed. We will test the classical GR (and progesterone) antagonist RU486 (mifepristone) for its effect on the cort regulation of α1 adrenoreceptor trafficking. Our ex vivo electrophysiology studies have indicated that the rapid glucocorticoid effect in native hypothalamic CRH neurons is not blocked by RU486 and is not, therefore, dependent on activation of the classical nuclear GR (Di et al., 2003; Di et al., 2016).

      Reviewer #2 also listed two main weaknesses of the study:

      (1) that we did not test whether the adrenoreceptor desensitization by restraint stress generalizes to other stress modalities and might be more robust with a pure somatic stressor, and

      (2) the lack of identification of a target protein as a mechanism for the role of nitrosylation.

      (1) We used restraint stress as a means to elicit corticosterone release, which desensitized the HPA response to a NE-dependent somatic stressor (lipopolysaccharide injection) but not to a NE-independent psychological stressor (predator odor) (Jiang et al., 2022). We got a near-complete loss of the sensitivity of CRH neurons to NE with restraint (i.e., near ceiling effect), such that a different stressor, including a more purely somatic stressor, should not increase the Cort-induced desensitization further. For that reason, we would argue that testing other stressors would not add value to the current study. That said, we plan and have received new funding to test in the future whether the Cort desensitization of the HPA response to LPS stress generalizes to other somatic stressors. We also have future plans to test for the Cort desensitization of other Gq-coupled receptors.

      (2) We agree that finding the molecular target of nitrosylation as the mechanism for Cort desensitization of α1 adrenoreceptors would significantly improve the study, but this is a potentially enormous undertaking as it will require the screening and validation of multiple proteins involved in protein trafficking to find the one(s) targeted for nitrosylation by Cort. We tested β-arrestin as a possible target in the paper, but did not find Cort to regulate β-arrestin nitrosylation. We plan to undertake a general nitrosylation screen of proteins to identify multiple possible targets, but prefer to defer this and the validation of possible targets to a future, more thorough analysis.

      Reviewer #3 also pointed out two main weaknesses of our study:

      (1) that the glucocorticoid-nitrosylation link was confusing, and

      (2) that it was unclear how blocking α1 adrenoreceptors reversed the Cort-induced cytosolic accumulation of the receptor.

      We appreciate the reviewer pointing out these deficiencies in our interpretation and explanation of our findings. We plan to address them directly in the revised version of the paper. 

      References

      Belsham DD, Cai F, Cui H, Smukler SR, Salapatek AMF, Shkreta L (2004) Generation of a phenotypic array of hypothalamic neuronal cell models to study complex neuroendocrine disorders. Endocrinology 145:393–400.

      Weiss GL, Rainville JR, Zhao Q, Tasker JG (2019) Purity and stability of the membrane-limited glucocorticoid receptor agonist dexamethasone-BSA. Steroids 142:2-5. 

      Rainville JR, Weiss GL, Evanson N, Herman JP, Vasudevan N, Tasker JG (2019) Membrane-initiated nuclear trafficking of the glucocorticoid receptor in hypothalamic neurons. Steroids 142:55-64.

      Di S, Malcher-Lopes R, Halmos KCs, Tasker JG (2003) Non-genomic glucocorticoid inhibition via endocannabinoid release in the hypothalamus: a fast feedback mechanism. Journal of Neuroscience 23:4850-4857.

      Di S, Itoga CA, Fisher MO, Solomonow J, Roltsch EA, Gilpin NW, Tasker JG (2016) Acute stress suppresses inhibition and increases anxiety via endocannabinoid release in the basolateral amygdala. Journal of Neuroscience 36:8461-8470.

      Jiang Z, Chen C, Weiss GL, Fu X, Stelly CE, Sweeten BLW, Tirrell PS, Pursell I, Stevens CR, Fisher MO, Begley JC, Harrison LM, Tasker JG (2022) Stress-induced glucocorticoid desensitizes adrenoreceptors to gate the neuroendocrine response to somatic stress in male mice. Cell Reports 41(3):111509.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      "Neural noise", here operationalized as an imbalance between excitatory and inhibitory neural activity, has been posited as a core cause of developmental dyslexia, a prevalent learning disability that impacts reading accuracy and fluency. This study is the first to systematically evaluate the neural noise hypothesis of dyslexia. Neural noise was measured using neurophysiological (electroencephalography [EEG]) and neurochemical (magnetic resonance spectroscopy [MRS]) in adolescents and young adults with and without dyslexia. The authors did not find evidence of elevated neural noise in the dyslexia group from EEG or MRS measures, and Bayes factors generally informed against including the grouping factor in the models. Although the comparisons between groups with and without dyslexia did not support the neural noise hypothesis, a mediation model that quantified phonological processing and reading abilities continuously revealed that EEG beta power in the left superior temporal sulcus was positively associated with reading ability via phonological awareness. This finding lends support for analysis of associations between neural excitatory/inhibitory factors and reading ability along a continuum, rather than as with a case/control approach, and indicates the relevance of phonological awareness as an intermediate trait that may provide a more proximal link between neurobiology and reading ability. Further research is needed across developmental stages and over a broader set of brain regions to more comprehensively assess the neural noise hypothesis of dyslexia, and alternative neurobiological mechanisms of this disorder should be explored.

      Strengths:

      The inclusion of multiple methods of assessing neural noise (neurophysiological and neurochemical) is a major advantage of this paper. MRS at 7T confers an advantage of more accurately distinguishing and quantifying glutamate, which is a primary target of this study. In addition, the subject-specific functional localization of the MRS acquisition is an innovative approach. MRS acquisition and processing details are noted in the supplementary materials according to the experts' consensus-recommended checklist (https://doi.org/10.1002/nbm.4484). Commenting on the rigor of the EEG methods is beyond my expertise as a reviewer.

      Participants recruited for this study included those with a clinical diagnosis of dyslexia, which strengthens confidence in the accuracy of the diagnosis. The assessment of reading and language abilities during the study further confirms the persistently poorer performance of the dyslexia group compared to the control group.

      The correlational analysis and mediation analysis provide complementary information to the main case-control analyses, and the examination of associations between EEG and MRS measures of neural noise is novel and interesting.

      The authors follow good practice for open science, including data and code sharing. They also apply statistical rigor, using Bayes Factors to support conclusions of null evidence rather than relying only on non-significant findings. In the discussion, they acknowledge the limitations and generalizability of the evidence and provide directions for future research on this topic.

      Weaknesses:

      Though the methods employed in the paper are generally strong, there are certain aspects that are not clearly described in the Materials & Methods section, such as a description of the statistical analyses used for hypothesis testing.

      Thank you for pointing this out. A description of the statistical models used in the analyses of EEG biomarkers has been added to the Materials and Methods:

      “First, exponent and offset values were averaged across all electrodes and analyzed using a 2x2 repeated measures ANOVA with group (dyslexic, control) as a between-subjects factor and condition (resting state, language task) as a within-subjects factor. Age was included in the analyses as a covariate due to the correlation between variables. Next, exponent and offset values were averaged across electrodes corresponding to the left (F7, FT7, FC5) and right inferior frontal gyrus (F8, FT8, FC6), and to the left (T7, TP7, TP9) and right superior temporal sulcus (T8, TP8, TP10). The electrodes were selected based on the analyses outlined by Giacometti and colleagues (2014) and Scrivener and Reader (2022). For these analyses, a 2x2x2x2 repeated measures ANOVA with age as a covariate was conducted with group (dyslexic, control) as a between-subjects factor and condition (resting state, language task), hemisphere (left, right), and region (frontal, temporal) as within-subjects factors. Results for the alpha and beta bands were calculated for the same clusters of frontal and temporal electrodes and analyzed with a similar 2x2x2x2 repeated measures ANOVA; however, for these analyses, age was not included as a covariate due to a lack of significant correlations.”

      We also expanded the description of the statistical models used in the analyses of MRS biomarkers:

      “To analyze the metabolite results, separate univariate ANCOVAs were conducted for Glu, GABA+, Glu/GABA+ ratio and Glu/GABA+ imbalance measures with group (control, dyslexic) as a between-subjects factor and voxel gray matter volume (GMV) as a covariate. Additionally, for the Glu analysis, age was included as a covariate due to a correlation between variables. Both frequentist and Bayesian statistics were calculated. Glu/GABA+ imbalance measure was calculated as the square root of the absolute residual value of a linear relationship between Glu and GABA+ (McKeon et al., 2024).”
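
      The Glu/GABA+ imbalance measure quoted above can be sketched as follows. This is a minimal illustration on made-up values, not the analysis code of the study: the function name is hypothetical, and the direction of the regression (Glu predicted from GABA+) and the use of an ordinary least-squares fit across participants are assumptions.

```python
import numpy as np


def glu_gaba_imbalance(glu, gaba):
    """Square root of the absolute residual of a linear fit between Glu and GABA+.

    glu, gaba : per-participant metabolite levels (e.g. ratios to tCr)
    Returns one imbalance value per participant.
    """
    glu = np.asarray(glu, dtype=float)
    gaba = np.asarray(gaba, dtype=float)
    # Least-squares line Glu = slope * GABA+ + intercept (assumed direction).
    slope, intercept = np.polyfit(gaba, glu, deg=1)
    residuals = glu - (slope * gaba + intercept)
    return np.sqrt(np.abs(residuals))


# Made-up metabolite ratios for five participants.
gaba_tcr = np.array([0.20, 0.25, 0.30, 0.35, 0.40])
glu_tcr = np.array([1.00, 1.12, 1.18, 1.33, 1.40])

imbalance = glu_gaba_imbalance(glu_tcr, gaba_tcr)
print(imbalance)
```

      Participants whose Glu level lies far from the value predicted by their GABA+ level, in either direction, receive a larger imbalance score.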

      With regard to metabolite quantification, it is unclear why the authors chose to analyze and report metabolite values in terms of creatine ratios rather than quantification based on a water reference given that the MRS acquisition appears to support using a water reference.

      We have decided to use the ratio of Glu and GABA to total creatine (tCr), as this is still a common practice in MRS studies at 7T (e.g., Nandi et al., 2022; Smith et al., 2021). This approach normalizes the signal, reducing the impact of intensity variations across different regions and tissue compositions. Additionally, total creatine concentration is considered relatively stable across different brain regions, which is particularly important in our study, where a functional localizer was used to establish the left STS region individually. Our decision was further influenced by previous studies on dyslexia (Del Tufo et al., 2018; Pugh et al., 2014) which have reported creatine ratios and included GM volume as a covariate in their models, thus providing comparability. It is now indicated in the Results:

      “For comparability with previous studies in dyslexia (Del Tufo et al., 2018; Pugh et al., 2014) we report Glu and GABA as a ratio to total creatine (tCr).”

      and in the Methods section:

      “Glu and GABA+ concentrations were expressed as a ratio to total-creatine (tCr; Creatine + Phosphocreatine) following previous MRS studies in dyslexia (Del Tufo et al., 2018; Pugh et al., 2014).

      We did not estimate absolute concentrations using water signals as a reference, as this would require accounting for water relaxation times, which may vary across our age range. Nevertheless, our dataset has been made publicly available for future researchers to calculate and compare absolute values.

      Del Tufo, S. N., Frost, S. J., Hoeft, F., Cutting, L. E., Molfese, P. J., Mason, G. F., Rothman, D. L., Fulbright, R. K., & Pugh, K. R. (2018). Neurochemistry Predicts Convergence of Written and Spoken Language: A Proton Magnetic Resonance Spectroscopy Study of Cross-Modal Language Integration. Frontiers in Psychology, 9, 1507. https://doi.org/10.3389/fpsyg.2018.01507

      Nandi, T., Puonti, O., Clarke, W. T., Nettekoven, C., Barron, H. C., Kolasinski, J., Hanayik, T., Hinson, E. L., Berrington, A., Bachtiar, V., Johnstone, A., Winkler, A. M., Thielscher, A., Johansen-Berg, H., & Stagg, C. J. (2022). tDCS induced GABA change is associated with the simulated electric field in M1, an effect mediated by grey matter volume in the MRS voxel. Brain Stimulation, 15(5), 1153–1162. https://doi.org/10.1016/j.brs.2022.07.049

      Pugh, K. R., Frost, S. J., Rothman, D. L., Hoeft, F., Del Tufo, S. N., Mason, G. F., Molfese, P. J., Mencl, W. E., Grigorenko, E. L., Landi, N., Preston, J. L., Jacobsen, L., Seidenberg, M. S., & Fulbright, R. K. (2014). Glutamate and choline levels predict individual differences in reading ability in emergent readers. Journal of Neuroscience, 34(11), 4082–4089. https://doi.org/10.1523/JNEUROSCI.3907-13.2014

      Smith, G. S., Oeltzschner, G., Gould, N. F., Leoutsakos, J. S., Nassery, N., Joo, J. H., Kraut, M. A., Edden, R. A. E., Barker, P. B., Wijtenburg, S. A., Rowland, L. M., & Workman, C. I. (2021). Neurotransmitters and Neurometabolites in Late-Life Depression: A Preliminary Magnetic Resonance Spectroscopy Study at 7T. Journal of Affective Disorders, 279, 417–425. https://doi.org/10.1016/j.jad.2020.10.011

      GABA is typically quantified using J-editing sequences at lower field strengths (~3T), and there is some evidence that the GABA signal can be reliably measured at 7T without editing. However, the authors should discuss potential limitations, such as the reliability of Glu and GABA measurements with short-TE semi-LASER at 7T.

      In addition, MRS measurements of GABA are known to be influenced by macromolecules, and GABA is often denoted as GABA+ to indicate that other compounds contribute to the measured signal, especially at a short TE and in the absence of symmetric spectral editing.

      A general discussion of the strengths and limitations of unedited Glu and GABA quantification at 7T is warranted given the interest of this work to researchers who may not be experts in MRS.

      While we agree with the Reviewer that at 3T, it is recommended to use J-edited MRS to measure GABA (Mullins et al., 2014), the better spectral resolution at 7T allows for more reliable results for both metabolites using moderate echo-time, non-edited MRS (Finkelman et al., 2022). In this study, we used a short echo time (TE), which is optimal for Glu but not ideal for GABA, as it interferes with other signals. We are grateful to the Reviewer for suggesting the addition of a short paragraph to the Discussion, describing the practicalities of 3T and 7T MRS and changing the abbreviation to GABA+ to inform readers of possible macromolecule contamination:

      “We chose ultra-high-field MRS to improve data quality (Özütemiz et al., 2023), as the increased sensitivity and spectral resolution at 7T allows for better separation of overlapping metabolites compared to lower field strengths. Additionally, 7T provides a higher signal-to-noise ratio (SNR), improving the reliability of metabolite measurements and enabling the detection of small changes in Glu and GABA concentrations. Despite these theoretical advantages, several practical obstacles should be considered, such as susceptibility artifacts and inhomogeneities at higher field strengths that can impact data quality. Interestingly, actual methodological comparisons (Pradhan et al., 2015; Terpstra et al., 2016) show only a slight practical advantage of 7T single-voxel MRS compared to optimized 3T acquisition. For example, fitting quality yielded reduced estimates of variance in concentration of Glu in 7T (CRLB) and slightly improved reproducibility levels for Glu and GABA (at both fields below 5%). Choosing the appropriate MRS sequence involves a trade-off between the accuracy of Glu and GABA measurements, as different sequences are recommended for each metabolite. J-edited MRS is recommended for measuring GABA, particularly with 3T scanners (Mullins et al., 2014). However, at 7T, more reliable results can be obtained using moderate echo-time, non-edited MRS (Finkelman et al., 2022). We have opted for a short-echo-time sequence, which is optimal for measuring Glu. However, this approach results in macromolecule contamination of the GABA signal (referred to as GABA+).”

      Finkelman, T., Furman-Haran, E., Paz, R., & Tal, A. (2022). Quantifying the excitatory-inhibitory balance: A comparison of SemiLASER and MEGA-SemiLASER for simultaneously measuring GABA and glutamate at 7T. NeuroImage, 247, 118810. https://doi.org/10.1016/j.neuroimage.2021.118810

      Mullins, P. G., McGonigle, D. J., O'Gorman, R. L., Puts, N. A., Vidyasagar, R., Evans, C. J., Cardiff Symposium on MRS of GABA, & Edden, R. A. (2014). Current practice in the use of MEGA-PRESS spectroscopy for the detection of GABA. NeuroImage, 86, 43–52. https://doi.org/10.1016/j.neuroimage.2012.12.004

      Özütemiz, C., White, M., Elvendahl, W., Eryaman, Y., Marjańska, M., Metzger, G. J., Patriat, R., Kulesa, J., Harel, N., Watanabe, Y., Grant, A., Genovese, G., & Cayci, Z. (2023). Use of a Commercial 7-T MRI Scanner for Clinical Brain Imaging: Indications, Protocols, Challenges, and Solutions-A Single-Center Experience. AJR. American Journal of Roentgenology, 221(6), 788–804. https://doi.org/10.2214/AJR.23.29342

      Pradhan, S., Bonekamp, S., Gillen, J. S., Rowland, L. M., Wijtenburg, S. A., Edden, R. A., & Barker, P. B. (2015). Comparison of single voxel brain MRS AT 3T and 7T using 32-channel head coils. Magnetic Resonance Imaging, 33(8), 1013–1018. https://doi.org/10.1016/j.mri.2015.06.003

      Terpstra, M., Cheong, I., Lyu, T., Deelchand, D. K., Emir, U. E., Bednařík, P., Eberly, L. E., & Öz, G. (2016). Test-retest reproducibility of neurochemical profiles with short-echo, single-voxel MR spectroscopy at 3T and 7T. Magnetic Resonance in Medicine, 76(4), 1083–1091. https://doi.org/10.1002/mrm.26022

      Further, the single MRS voxel location is a limitation of the study as neurochemistry can vary regionally within individuals, and the putative excitatory/inhibitory imbalance in dyslexia may appear in regions outside the left temporal cortex (e.g., network-wide or in frontal regions involved in top-down executive processes). While the functional localization of the MRS voxel is a novelty and a potential advantage, it is unclear whether voxel placement based on left-lateralized reading-related neural activity may bias the experiment to be more sensitive to small, activity-related fluctuations in neurotransmitters in the CON group vs. the DYS group who may have developed an altered, compensatory reading strategy.

      We agree that including only one region of interest for the MRS measurements is a potential limitation of our study, and we have now added this information to the Discussion:

      “Moreover, since the MRS data was collected only from the left STS, it is plausible that other areas might be associated with differences in Glu or GABA concentrations in dyslexia.”

      However, differences in Glu and GABA concentrations in this region were directly predicted by the neural noise hypothesis of dyslexia. We acknowledge that this information was missing in the previous version of the manuscript. It is now included in the Results:

      “Moreover, the neural noise hypothesis of dyslexia identifies perisylvian areas as being affected by increased glutamatergic signaling, and directly predicts associations between Glu and GABA levels in the superior temporal regions and phonological skills (Hancock et al., 2017).”

      as well as in the Discussion:

      “Nevertheless, the neural noise hypothesis predicted increased glutamatergic signaling in perisylvian regions, specifically in the left superior temporal cortex (Hancock et al., 2017).”

      Figure 1 contains a lot of information, and it may be helpful to split it into 2 figures (EEG vs. MRS) so that the plots could be made larger and the reader could more easily digest the information.

      (a) I would also recommend displaying separate metabolite fit plots for each group, since the current presentation in panel F makes it appear that the MRS data is examined by testing differences between groups across the full spectrum (where the lines diverge), which really isn't the case.

      (b) The GABA peak is not visible in the spectrum, and Glutamate and GABA both have multiple peaks that should be shown on the spectrum. This may be best achieved by displaying the individual metabolite sub-spectra below the full spectrum.

      Thank you for these suggestions. We have split the information into two Figures following the Reviewer’s recommendations.

      It is not clear why the 3T structural images were used for segmentation and calculation of tissue fraction if 7T structural images were also acquired (which would presumably have higher resolution).

      Generally, T1-weighted images from the 7T scanner exhibit more artifacts than those from the 3T scanner due to higher magnetic field inhomogeneity. These artifacts are especially pronounced in regions near air-tissue interfaces, such as the temporal lobes. Therefore, we chose the 3T structural images for segmentation and tissue fraction calculations and clarified this in the Method section:

      “Voxel segmentation was performed on structural images from a 3T scanner, coregistered to 7T structural images in SPM12, as the latter exhibited excessive artifacts and intensity bias in the temporal regions”.

      The basis set includes a large number of metabolites (27), including many low-concentration metabolites/compounds (e.g., bHG, bHB, Citrate, Threonine, ethanol) that are typically only included in studies targeting specific metabolites in disease/pathology. Please justify the inclusion of this maximal set of metabolites in the basis set, given that the inclusion of overlapping low-concentration metabolites may influence metabolite measurements of interest (https://doi.org/10.1002/mrm.10246).

      There is still no consensus in the MR community on which metabolites should be included in the model of human cerebral 1H-MR spectra. Typically, only major contributors such as NAA, Cr, Cho, Lac, mI, and possibly Glx are evaluated. Some studies also include additional metabolites like Ace, Ala, Asp, GABA, Glc, Gly, sI, NAAG, and Tau. In this study, as in a few others, further metabolites such as PCh, GPC, PCr, GSH, PE, and Thr were introduced and this approach seems suitable for high-field spectra (Hofmann et al., 2002).

      Hofmann, L., Slotboom, J., Jung, B., Maloca, P., Boesch, C., & Kreis, R. (2002). Quantitative 1H-magnetic resonance spectroscopy of human brain: Influence of composition and parameterization of the basis set in linear combination model-fitting. Magnetic Resonance in Medicine, 48(3), 440–453. https://doi.org/10.1002/mrm.10246

      Please provide a figure indicating the localization of the MRS voxel for a sample subject.

      A figure indicating the localization of the MRS voxel for a sample subject was added to the MRS checklist.

      It would be helpful to include Table S1 in the main article.

      Table S1 from the Supplementary Material has now been added to the main manuscript as Table 1 in the Results section.

      Please report descriptive statistics for EEG and MRS measures in Table S1.

      We have added a new Table S1 in the Supplementary Material, providing descriptive statistics for EEG and MRS E/I balance measures, presented separately for the dyslexic and control groups.

      I recommend avoiding using the terms "direct" and "indirect" to contrast MRS and EEG measures of E/I balance. Both of these measures are imperfect and it is misleading to say that MRS is a "direct" measure of neurotransmitters. There is also ambiguity in what is meant by "direct": in contrast to EEG, MRS does not measure neural activity and does not provide high-resolution temporal information, so in a sense, it is less direct.

      Thank you for this suggestion. We have replaced the terms 'direct' and 'indirect' biomarkers with 'MRS' and 'EEG' biomarkers throughout the text.

      There are many cases throughout the results in which Bayes and frequentist stats seem to contradict each other in terms of significance and what should be included in the models, especially with regard to the interaction effects (the Bayes factors appear to favor non-significant interactions). I think this is worth considering and describing to offer more clarity for the readers.

      We agree that a discussion of the divergent results between Bayesian and frequentist models was missing in the previous version of the manuscript. To provide greater clarity for the readers, we have conducted follow-up Bayesian t-tests in every case where the results indicated the inclusion of non-significant interactions with the effect of group in the model. These additional analyses have been performed for the exponent, offset, as well as for beta bandwidth in the Supplementary Material. We have also added a paragraph addressing these discrepancies in the Discussion:

      “Remarkably, in some models, results from Bayesian and frequentist statistics yielded divergent conclusions regarding the inclusion of non-significant effects. This was observed in more complex ANOVA models, whereas no such discrepancies appeared in t-tests or correlations. Given reports of high variability in Bayesian ANOVA estimates across repeated runs of the same analysis (Pfister, 2021), these results should be interpreted with caution. Therefore, following the recommendation to simplify complex models into Bayesian t-tests for more reliable estimates (Pfister, 2021), we conducted follow-up Bayesian t-tests in every case that favored the inclusion of non-significant interactions with the group factor. These analyses provided further evidence for the lack of differences between the dyslexic and control groups. Another source of discrepancy between the two methods may stem from the inclusion of interactions between covariates and within-subject effects in frequentist ANOVA, which were not included in Bayesian ANOVA to adhere to the recommendation for simpler Bayesian models (Pfister, 2021).”

      Pfister, R. (2021). Variability of Bayes factor estimates in Bayesian analysis of variance. The Quantitative Methods for Psychology, 17(1), 40–45. https://doi.org/10.20982/tqmp.17.1.p040

      It would be helpful to indicate whether participants in the DYS group had a history of reading intervention/remediation. In addition to showing that the DYS group performed lower than the CON group on reading assessments as a whole and given their age, was the performance on the reading assessments at an individual level considered for inclusion in the study? (i.e., were participants' persistent poor reading abilities confirmed with the research assessments?)

      We were unable to assess individual reading skills due to the lack of standardized diagnostic norms for adult dyslexia in Poland. Therefore, participants in the dyslexic group were recruited based on a previous clinical diagnosis of dyslexia, and reading and reading-related tasks were used for group-level comparisons only. This information has been added to the Methods section:

      “Since there are no standardized diagnostic norms for dyslexia in adults in Poland, individuals were assigned to the dyslexic group based on a past diagnosis of dyslexia.”

      Unfortunately, we did not collect information about participants' history of reading intervention or remediation. In this context, we acknowledge that including a sample of adult participants is a potential limitation of our study; however, this was already mentioned in the Discussion.

      Regarding the fMRI task, please indicate whether the participants whose threshold and/or contrast was changed for localization were from the DYS or CON group.

      This information is now added to the Method section:

      “For 6 participants (DYS n = 2, CON n = 4), the threshold was lowered to p < .05 uncorrected, while for another 6 participants (DYS n = 3, CON n = 3) the contrast from the auditory run was changed to auditory words versus fixation cross due to a lack of activation for other contrasts.”

      Reviewer #2 (Public Review):

      Summary:

      This study utilized two complementary techniques (EEG and 7T MRI/MRS) to directly test a theory of dyslexia: the neural noise hypothesis. The authors report finding no evidence to support an excitatory/inhibitory balance, as quantified by beta in EEG and Glutamate/GABA ratio in MRS. This is important work and speaks to one potential mechanism by which increased neural noise may occur in dyslexia.

      Strengths:

      This is a well-conceived study with in-depth analyses and publicly available data for independent review. The authors provide transparency with their statistics and display the raw data points along with the averages in figures for review and interpretation. The data suggest that an E/I balance issue may not underlie deficits in dyslexia and is a meaningful and needed test of a possible mechanism for increased neural noise.

      Weaknesses:

      The researchers did not include a visual print task in the EEG task, which limits analysis of reading-specific regions such as the visual word form area, which is a commonly hypoactivated region in dyslexia. This region is a common one of interest in dyslexia, yet the researchers measured the I/E balance in only one region of interest, specific to the language network.

      We agree with the Reviewer that including different tasks for the EEG biomarkers assessment would be valuable. However, this limitation was already addressed in the Discussion:

      “Importantly, our study focused on adolescents and young adults, and the EEG recordings were conducted during rest and a spoken language task. These factors may limit the generalizability of our results. Future research should include younger populations and incorporate a broader array of tasks, such as reading and phonological processing, to provide a more comprehensive evaluation of the E/I balance hypothesis.”

      Further, this work does not consider prior studies reporting neural inconsistency; a potential consequence of increased neural noise, which has been reported in several studies and linked with candidate-dyslexia gene variants (e.g., Centanni et al., 2018, 2022; Hornickel & Kraus, 2013; Neef et al., 2017). While E/I imbalance may not be a cause of increased neural noise, other potential mechanisms remain and should be discussed.

      Thank you for referring us to other works reporting neural variability in dyslexia. We agree that a broader context regarding sources of reduced neural synchronization, beyond E/I imbalance, was missing in the previous version of the manuscript. We have now included these references in the Discussion:

      “Furthermore, although our results do not support the idea of E/I balance alterations as a source of neural noise in dyslexia, they do not preclude other mechanisms leading to less synchronous neural firing posited by the hypothesis. In this context, there is evidence showing increased trial-to-trial inconsistency of neural responses in individuals with dyslexia (Centanni et al., 2022) or poor readers (Hornickel and Kraus, 2013) and its associations with specific dyslexia risk genes (Centanni et al., 2018; Neef et al., 2017). At the same time, the observed trial-to-trial inconsistency was either present only in a subset of participants (Centanni et al., 2018), limited to some experimental conditions (Centanni et al., 2022), or specific brain regions – e.g., brainstem in Hornickel and Kraus (2013), left auditory cortex in Centanni et al. (2018), or left supramarginal gyrus in Centanni et al. (2022).”

      A better description of the exponent and offset components is needed at the beginning of the results, given that the methods are presented in detail at the end. I also do not see a clear description of these components in the methods.

      A description of the aperiodic components is now included in the Results:

      “In the initial step of the analysis, we analyzed the aperiodic (exponent and offset) components of the EEG spectrum. The exponent reflects the steepness of the EEG power spectrum, with a higher exponent indicating a steeper signal; while the offset represents a uniform shift in power across frequencies, with a higher offset indicating greater power across the entire EEG spectrum (Donoghue et al., 2020).”

      as well as in the Materials and Methods:

      “Two broadband aperiodic parameters were extracted: the exponent, which quantifies the steepness of the EEG power spectrum, and the offset, which indicates the signal’s power across the entire frequency spectrum.”
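      To make the two aperiodic parameters concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a spectrum with no oscillatory peaks and no knee, so that log10(power) = offset - exponent * log10(frequency), and it recovers both parameters with an ordinary least-squares line fit in log-log space. The function name `fit_aperiodic`, the 1–43 Hz range, and the synthetic values are illustrative assumptions; the method of Donoghue et al. (2020) additionally models periodic peaks, which this sketch omits.

```python
import math

def fit_aperiodic(freqs, powers):
    """Fit log10(P) = offset - exponent * log10(f) by ordinary least squares."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in powers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A steeper (more negative) slope means a larger exponent;
    # the intercept is the broadband offset.
    return -slope, intercept

# Synthetic, peak-free spectrum with known parameters (hypothetical values).
freqs = [float(f) for f in range(1, 44)]
true_exponent, true_offset = 1.5, 2.0
powers = [10 ** true_offset / f ** true_exponent for f in freqs]

exponent, offset = fit_aperiodic(freqs, powers)
print(f"exponent = {exponent:.3f}, offset = {offset:.3f}")
```

      In this simplified picture, raising the exponent tilts the whole spectrum more steeply, whereas raising the offset shifts it upward at every frequency without changing its slope.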

      Reviewer #3 (Public Review):

      Summary:

      This study by Glica and colleagues utilized EEG (i.e., Beta power, Gamma power, and aperiodic activity) and 7T MRS (i.e., MRS IE ratio, IE balance) to reevaluate the neural noise hypothesis in Dyslexia. Supported by Bayesian statistics, their results show solid 'no evidence' of EI balance differences between groups, challenging the neural noise hypothesis. The work will be of broad interest to neuroscientists, and educational and clinical psychologists.

      Strengths:

      Combining EEG and 7T MRS, this study utilized both the indirect (i.e., Beta power, Gamma power, and aperiodic activity) and direct (i.e., MRS IE ratio, IE balance) measures to reevaluate the neural noise hypothesis in Dyslexia.

      Weaknesses:

      The authors may need to provide more data to assess the quality of the MRS data.

      We have addressed the Reviewer's specific recommendations below and provided more data on the quality of the MRS data.

      The authors may need to explain how the number of subjects is determined in the MRS section.

      We have clarified the MRS sample description in the Results section:

      “Due to financial and logistical constraints, 59 out of the 120 recruited subjects, selected progressively as the study unfolded, were examined with MRS. Subjects were matched by age and sex between the dyslexic and control groups. Due to technical issues and to prevent delays and discomfort for the participants, we collected 54 complete sessions. Additionally, four datasets were excluded based on our quality control criteria, and three GABA+ estimates exceeded the selected CRLB threshold. Ultimately, we report 50 estimates for Glu (21 participants with dyslexia) and 47 for GABA+ and Glu/GABA+ ratios (20 participants with dyslexia).”

      Is there a reason why theta and gamma peaks were not observed in the majority of participants? What are the possible reasons that likely caused the discrepancy between this study and previously reported relevant studies?

      We have now added a discussion about the absence of oscillatory peaks in the theta and gamma bands to the Discussion section:

      “We could not perform analyses for the gamma oscillations since in the majority of participants the gamma peak was not detected above the aperiodic component. Due to the 1/f properties of the EEG spectrum, both aperiodic and periodic components should be disentangled to analyze ‘true’ gamma oscillations; however, this approach is not typically recognized in electrophysiology research (Hudson and Jones, 2022). Indeed, previous studies that analyzed gamma activity in dyslexia (Babiloni et al., 2012; Lasnick et al., 2023; Rufener and Zaehle, 2021) did not separate the background aperiodic activity. For the same reason, we could not analyze results for the theta band, which often does not meet the criteria for an oscillatory component manifested as a peak in the power spectrum (Klimesch, 1999). Moreover, results from a study investigating developmental changes in both periodic and aperiodic components suggest that theta oscillations in older participants are mostly observed in frontal midline electrodes (Cellier et al., 2021), which were not analyzed in the current study.”

      Hudson, M. R., & Jones, N. C. (2022). Deciphering the code: Identifying true gamma neural oscillations. Experimental Neurology, 357, 114205. https://doi.org/10.1016/j.expneurol.2022.114205

      Klimesch, W. (1999). EEG alpha and theta oscillations reflect cognitive and memory performance: A review and analysis. Brain Research Reviews, 29(2–3), 169–195. https://doi.org/10.1016/S0165-0173(98)00056-3

      Based on Figure 1F, the quality of the MRS data may be contaminated by the lipid signal, especially for the DYS group. To better evaluate the MRS data, especially the GABA measurements, the authors need to show:

      (a) the placement of the MRS voxel on the anatomical images;

      Averaged MRS voxel placement was already presented in Figure 1 (now Figure 2) in the manuscript. Now, we have also added exemplary single-subject images to the MRS checklist in the Supplement.

      (b) Glu and GABA model functions

      We have now provided more meaningful Glu and GABA indications in Figure 2.

      (c) CRLB for GABA

      We have added respective estimates to the Supplement:

      %CRLB of Glu: mean = 2.96, SD = 0.79

      %CRLB of GABA: mean = 10.59, SD = 2.76

      %CRLB of NAA: mean = 1.76, SD = 0.46

      Further, the authors added voxel's gray matter volume as a covariate when performing separate ANCOVAs. The authors may need to use alpha correction or 1-fCSF correction to corroborate these results.

      We chose to use the ratio of Glu and GABA to total creatine (tCr), as this remains a common practice in MRS studies at 7T (e.g., Nandi et al., 2022; Smith et al., 2021). This decision was also influenced by previous dyslexia studies (Del Tufo et al., 2018; Pugh et al., 2014) and is now clarified in the Results and Methods sections.

      Regarding alpha correction, a recent paper (García-Pérez, 2023) recommends: 'In general, avoid corrections for multiple testing if statistical claims are to be made for each individual test, in the absence of an omnibus null hypothesis.' Since we report null findings, further alpha correction would not significantly impact the results.

      García-Pérez, M. A. (2023). Use and misuse of corrections for multiple testing. Methods in Psychology, 8, 100120. https://doi.org/10.1016/j.metip.2023.100120

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      This publication applies 3D super-resolution STORM imaging to understanding the role of developmental neural activity in the clustering of retinal inputs to the mouse dorsal lateral geniculate nucleus (dLGN). The authors argue that retinal ganglion cell (RGC) synaptic boutons start forming clusters early in postnatal development (P2). They then argue that these clusters contribute to eye-specific segregation of retinal inputs by activity-dependent stabilization of nearby boutons from the same eye. The data provided is N=3 animals for each condition of P2, P4, and P8 animals in wild-type mice and in mice where early patterns of structured retinal activity are blocked.

      Strengths:

      The 3D STORM imaging of pre- and postsynaptic elements provides convincing high-resolution localization of synapses.

      The experimental design of comparing ipsilateral and contralateral RGC axon boutons in a region of the dLGN that is known to become contralateral is elegant. The design makes it possible to relate fixed time point structural data to a known outcome of activity-dependent remodeling.

      Weaknesses:

      Based on previous literature, it is known that synapse density, synapse clustering, and synaptic specificity increase during postnatal development. Previous work has also shown that both the changes in synaptic clustering and synaptic specificity are affected by retinal activity. The data and analysis provided by the authors add little unambiguous evidence that advances this understanding.

      We agree with the reviewer that previous literature shows that synapse density, synapse clustering, and synaptic specificity increase during postnatal development and that these processes are affected by retinal activity. The majority of studies on synaptic refinement have been performed after eye-opening, when eye-specific segregation is already complete. In contrast, most studies of eye-specific segregation focus on axonal refinement phenotypes. To our knowledge, only a small number of experiments have examined retinogeniculate synaptic properties at the nanoscale during eye-specific segregation (1-4). Our broad goal is to understand the mechanisms of synaptogenesis and competition at the earliest stages of eye-specific refinement, when spontaneous retinal activity is a major driver of activity-dependent remodeling. We hope that readers will appreciate that there is still much to discover in this fascinating model system of synaptic competition.

      General problem 1: Most of the statistical analysis is limited to ANOVA comparison of axons from the contralateral and ipsilateral retina in the contralateral dLGN. The hypothesis that ipsilateral and contralateral axons would be statistically identical in the contralateral dLGN is not a plausible hypothesis so rejecting the hypothesis with P < X does not advance the authors' arguments beyond what was already known.

      General problem 2: Most of the interpretation of data is qualitative. While error bars are provided, these error bars are not used to draw conclusions. Given the small sample size (N=3), there is a large degree of uncertainty regarding the magnitude of changes (synapse size, number, specificity). The authors base their conclusions on the averages of these values when the likely degree of uncertainty could allow for the opposite interpretation.

      We appreciate the reviewer’s concerns regarding the use of ANOVA for statistical testing in the original submission. We have generated new figures that show confidence intervals for each analysis in the manuscript and these are included in the response to reviewers document below. To address the underlying concern that our N=3 sample size limits the interpretation of our results, we have revised the manuscript to be cautious in our interpretations and to discuss additional possibilities that are consistent with the anatomical data.

      General problem 3: Two of the four results sections depend on using the frequency of single active zone vGlut2 clusters near multiple active zone vGlut2 as a proxy for synaptic stabilization of the single active zone vGlut2 clusters by the multiple active zone vGlut2 clusters. The authors argue that the increased frequency of same-eye single active zone clusters relative to opposite-eye single active zone clusters means that multiple active zone vGlut2 clusters are selectively stabilizing single active zone clusters. There are other plausible explanations for this observation that are not eliminated. An increased frequency of nearby single active zone clusters would also occur if RGC axons form more than one synapse in the dLGN. Eye-specific segregation is, by definition, a relative increase in the frequency of nearby boutons from the same eye. The authors were, therefore, guaranteed to observe a non-random relationship between boutons from the same eye. The authors do compare their measures to a random model, but I could not find a description of the model. I would expect that the model would need to account for RGC arbor size, arbor structure, bouton number, and segregation independent of multi-active-zone vGlut2 clusters. The most common randomization for the type of analysis described here, a shift in the positions of single-active zone boutons, would not be adequate.

      In discussing the claimed cluster-induced stabilization of nearby boutons, the authors state that the specificity increases with age due to activity-dependent refinement. Their quantification does not support an increase in specificity with age. In fact, the high degree of clustering "specificity" they observe at P2 argues for the trivial same axon explanation.

      We agree with the reviewer that individual RGC axons form multiple synapses and that, over time, eye-specific segregation must increase the frequency of like-eye synapses relative to opposite-eye synapses. Indeed, our previous study of eye-specific refinement showed that at P8, the density of eye-specific inputs had increased for the dominant-eye and decreased for the non-dominant-eye (1). However, at postnatal day 4, contralateral and ipsilateral input densities were the same in the future contralateral-eye territory. One of our goals in this study was to determine if the process of synaptic clustering begins at these earliest stages of synaptic competition and, if so, whether it is influenced by retinal wave activity. It is plausible that the RGC axons from the same eye could initially form synapses randomly and, at some later stage, synapses may be selectively added to produce mature glomeruli. Consistent with this possibility, previous analysis of JAM-B RGC axon refinement showed the progressive clustering of axonal boutons at later stages of development after eye-specific segregation (5).

      Regarding the randomization that we employed, we performed a repositioning of synapse centroids within the volume of the neuropil after accounting for neuronal soma volumes and edge effects. We agree that this type of randomization cannot account for the fine scale structure of axons and dendrites, which we did not have access to in this four-color volumetric super-resolution data set. To address this, we have performed additional clustering analyses surrounding both single-active zone and multi-active zone synapses. This new analysis showed that there is a modest clustering effect around single-active zone synapses compared to complete randomization described above. We now present this information using a normalized clustering index for direct comparison of clustering between multi-active zone and single-active zone synapses. We have measured effect sizes and confidence intervals, which we present in point-by-point responses below. We have restructured the manuscript figures and discussion to provide a balanced interpretation of our results and the limitations of our study.
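      The randomization and the normalized clustering index described above can be illustrated with a minimal sketch. It is an assumption-laden toy, not the analysis code used in the study: the neuropil is a rectangular box, somas are modeled as spheres, edge effects are handled with a simple margin, and the index is defined here as the observed mean number of neighbors within a fixed radius divided by the mean obtained after random repositioning. All names (`random_position`, `mean_neighbors`, `clustering_index`), the radius, and the coordinates are hypothetical.

```python
import math
import random

def random_position(rng, box, somas, margin):
    """Draw a uniform point in the box, rejecting soma interiors and an edge margin."""
    while True:
        p = tuple(rng.uniform(margin, side - margin) for side in box)
        if all(math.dist(p, center) > radius for center, radius in somas):
            return p

def mean_neighbors(centers, points, radius):
    """Mean number of points lying within `radius` of each center."""
    counts = [sum(math.dist(c, p) <= radius for p in points) for c in centers]
    return sum(counts) / len(counts)

def clustering_index(centers, points, box, somas, radius,
                     margin=0.0, n_shuffles=200, seed=0):
    """Observed neighbor count divided by the mean count after random repositioning."""
    rng = random.Random(seed)
    observed = mean_neighbors(centers, points, radius)
    null_counts = []
    for _ in range(n_shuffles):
        shuffled = [random_position(rng, box, somas, margin) for _ in points]
        null_counts.append(mean_neighbors(centers, shuffled, radius))
    expected = sum(null_counts) / len(null_counts)
    if expected == 0:
        raise ValueError("no neighbors under randomization; increase radius or n_shuffles")
    return observed / expected

# Toy data: two multi-active-zone centers, each surrounded by 20 nearby single-AZ points.
rng = random.Random(1)
box = (10.0, 10.0, 10.0)
somas = [((5.0, 5.0, 5.0), 2.0)]  # (center, radius) of one hypothetical soma
centers = [(2.0, 2.0, 2.0), (8.0, 8.0, 2.0)]
points = [tuple(c + rng.uniform(-0.5, 0.5) for c in center)
          for center in centers for _ in range(20)]

index = clustering_index(centers, points, box, somas, radius=1.5)
print(f"normalized clustering index = {index:.1f}")
```

      An index near 1 would indicate no clustering beyond chance under this null model, and values above 1 indicate enrichment. As noted in the response, a null model of this kind ignores axon and dendrite structure, so same-axon effects can still inflate the index.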

      Analysis of specific claims:

      Result Section 1

      Most of the figures show mean, error bars, and asterisks, but not the three data points from which these statistics are derived. Large changes in variance from condition to condition suggest that displaying the data points would provide more useful information.

      We thank the reviewer for their suggestion. We have updated all figures to display the means of all biological replicates as individual data points.

      Claim 1: Contralateral density increases more than ipsilateral in the contralateral region over the course of development. This claim is supported by the qualitative comparison of means and error bars in Figure 2D. The argument could be made quantitative by providing a confidence interval for synapse density increase for dominant and non-dominant synapse density. A confidence interval could then be generated for the difference in this change between the two groups. Currently, the most striking effect is a big difference in variance between P4 and P8 for dominant eye complex synapses. Given that N=3, I assume there is one extreme outlier here.

      We appreciate the comment and believe the reviewer was referring to the data presented in the original Figure 1D, rather than Figure 2D.

      We agree with the reviewer that our comment on the change in synapse density across ages was not quantitatively supported by the figure as we did not perform a proper age-wise statistical comparison. We have removed this claim in the revised manuscript.

      We also appreciate the suggestions to clarify the presentation of our statistical analyses and to utilize confidence interval measurements wherever possible. We present Author response image 1 below, showing the density of multi-AZ synapses in the contralateral-eye territory over time (P2-P8), for both CTB(+) contralateral (black) and CTB(-) ipsilateral inputs (red) featuring 5/95% confidence intervals:

      Author response image 1.

      More broadly, the reviewer has raised the concern that the low number of biological replicates (N=3) presents challenges in the use of ANOVA for statistical testing. We agree with the concern and have revised the manuscript to be cautious in our statistical tests and resulting claims. We have chosen to use paired T-tests to compare measurements of eye-specific synapse properties because these measurements were always made within each individual biological replicate (paired measurements). Below, we discuss our logic for this change and the effects on the results we present in the revised manuscript.

      Considering the above image:

      (1) ANOVA: In our initial submission, we used an ANOVA test which showed P<0.05 for the CTB(+) P4 vs. P8 comparison above, leading to our statement about an age-dependent increase in multi-AZ density. However, the figure above shows that P8 data has higher variance. Thus, the homogeneity of variance assumption of ANOVA may lead to false positives in this comparison.

      (2) Confidence interval for N=3: We calculated confidence intervals for P4 and P8 data (5/95% CI shown above). Overlap between the two groups indicates the true mean values of the two groups could be identical. However, the P8 confidence intervals (as well as other confidence intervals across other comparisons in the manuscript) also include the value of 0. Taken at face value, this would suggest that there might be no multi-active zone synapses in the mouse dLGN. This failure arises because the low number of biological replicates (N=3 data points) precludes a reliable confidence interval measurement. CI measurements require sufficient sample sizes to estimate the true population variance.

      (3) Difficulty in achieving sufficient sample sizes for CI analysis in ultrastructural studies of the brain: volumetric STORM experiments are technically complex and make use of sample preparation and analysis methods that are similar to volumetric electron microscopy (physical ultrathin sectioning and computational 3D stack alignment). For these technical reasons, it is difficult to collect imaging data from >10 mice for each group of data (e.g. age and tissue location) in a single project. Because of the technical challenges, most ultrastructural studies published to date present results from single biological replicates. In our STORM dataset, we collected imaging data of N=3 biological replicates for each age and genotype. We agree that in the future the collection of additional replicates will be important for improving the reliability of statistical comparisons in super-resolution and electron-microscopy studies. Continued advances in the throughput of imaging/analysis should help to make this easier over time.

      (4) The use of paired T-tests: In this study, we have eye-specific CTB(+) and CTB(-) synapse imaging data from the same STORM fields within single biological replicates. When there is only one measurement from each replicate (e.g. synapse density, ratio of total synapses), using paired tests to compare these groups increases statistical power and does not assume similar variance. However, this limits our analysis to comparisons within each age, and not between ages. Accordingly, we have revised our discussion of the results and interpretations throughout the manuscript. When there are thousands of measurements of synapses from each replicate (e.g. Figure 2A-B on synapse volumes), we use a mixed linear model to analyze the variance. In the revised figures we present the results using standard error of the mean and link measurements from within the same individual replicates to show the paired data structure. In cases where specific comparisons are made across ages, we present 5/95% confidence interval measurements.
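
      The reasoning in points (2) and (4) can be illustrated with a minimal sketch. The density values below are hypothetical numbers chosen for illustration only and are not data from the study; the function names are likewise our own. The sketch assumes N=3 replicates, so the t critical value for 2 degrees of freedom is hard-coded and the two-sided p-value uses the closed form of the t distribution for 2 degrees of freedom. It shows how a 5/95% interval on three highly variable replicates can include 0, while a paired comparison within the same replicates can still detect a consistent difference.

```python
import math
import statistics

# 0.95 quantile of Student's t with 2 degrees of freedom,
# which gives a two-sided 5/95% interval for N=3 replicates.
T_CRIT_DF2 = 2.920


def ci_5_95(values):
    """Return the 5/95% confidence interval for the mean of exactly three replicates."""
    if len(values) != 3:
        raise ValueError("the critical value is hard-coded for N=3")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(3)
    return mean - T_CRIT_DF2 * sem, mean + T_CRIT_DF2 * sem


def paired_t_df2(a, b):
    """Return the paired t statistic and two-sided p-value for three matched pairs."""
    diffs = [x - y for x, y in zip(a, b)]
    if len(diffs) != 3:
        raise ValueError("the p-value formula is specific to N=3 (df=2)")
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(3))
    # For df=2 the t CDF is 1/2 + t / (2 * sqrt(t^2 + 2)),
    # so the two-sided p-value simplifies to the expression below.
    p = 1.0 - abs(t) / math.sqrt(t * t + 2.0)
    return t, p


# Hypothetical synapse densities (arbitrary units), one value per replicate.
ctb_pos = [0.010, 0.030, 0.050]
ctb_neg = [0.006, 0.024, 0.045]

low, high = ci_5_95(ctb_pos)
t_stat, p_value = paired_t_df2(ctb_pos, ctb_neg)
print(f"5/95% CI for CTB(+): ({low:.4f}, {high:.4f})")
print(f"paired t = {t_stat:.2f}, two-sided p = {p_value:.4f}")
```

      In this toy example the between-replicate variance is large, so the interval on the CTB(+) mean spans 0, whereas the within-replicate differences are consistent and the paired test is sensitive to them.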

      Claim 2: The fraction of multiple-active zone vGlut2 clusters increases with age. This claim is weakly supported by a qualitative reading of panel 1E. The error bars overlap so it is difficult to know what the range of possible increases could be. In the text, the authors report mean differences without confidence intervals (or any other statistics). The reported results should, therefore, be interpreted as a description of their three mice and not as evidence about mice in general.

      We appreciate the reviewer’s concern that statistical accuracy of our synapse density comparisons over age is limited by the small sample size as discussed above. We have removed all strong claims about age-dependent changes in the density of multi-active zone and single-active zone synapses. Instead, we focus our analyses on comparisons between CTB(+) and CTB(-) synapse measurements, which are paired within each biological replicate. To specifically address the reviewer’s concern about figure panel 1E, we present Author response image 2 with confidence intervals below.

      Author response image 2.

      Figure S1. Panel A makes the point that the study could not be done without STORM by comparing the STORM images to "Conventional" images. The images are over-saturated low-resolution images. A reasonable comparison would be to a high-quality confocal image acquired with a high NA objective (~1.4) and low laser power (PSF ~ 0.2 x 0.2 x 0.6 um) that was acquired over the same amount of time it takes to acquire a STORM volume.

      We agree with the reviewer that the presentation of low-resolution conventional images is not necessary. We have deleted the panel and modified the text accordingly.

      Result section 2.

      Claim 1: The ipsi/contra (in contra LGN) difference in VGluT2 cluster volume increases with development. While there are many p-values listed, the main point is not directly quantified. A reasonable way to quantify the relative increase in volume could be in the form: the non-dominant volumes were 75%-95%(?) of the dominant volume at P2 and 60%-80% (?) at P8. The difference in change was -5 to 15%(?).

      We thank the reviewer for their helpful suggestion to improve the clarity of the results presented in this analysis of eye-specific synapse volumes. In our original report, we found differences in eye-specific VGluT2 volume at each time point (P2/P4/P8) in control mice (1). The original measurements used the entire synapse population. Here, we aimed to determine whether eye-specific differences in VGluT2 volumes were present for both multi-AZ synapses and single-AZ synapses, and whether one population may have a greater contribution to the previous population measurement that we reported. We found that at P4 (a time when the overall eye-specific synapse density is equivalent for both eyes in the dLGN), WT multi-AZ synapses showed a greater difference (372%) in eye-specific VGluT2 volume compared with single-AZ synapses (135%). In β2KO mice multi-AZ synapses showed a greater difference (110%) in eye-specific VGluT2 volume compared with single-AZ synapses (41%). In our initial manuscript submission, we included statistical comparisons of eye-specific volume differences across ages, but we did not highlight these differences in our discussion of the results. For clarity, we have removed all statistical comparisons across ages in the revised manuscript. We have modified the text to focus on eye-specific VGluT2 volume differences at P4 described above. To specifically address the reviewer’s question, we provide the percentage differences between multi- and single-AZ eye-specific synapses for each age/genotype below:

      Author response table 1.

      Claim 2: Complex synapses (vGlut2 clusters with multiple active zones) represent clusters of simple synapses and not single large boutons with multiple active zones. The authors argue that because vGlut2 cluster volume scales roughly linearly with active zone number, the vGlut2 clusters are composed of multiple boutons each containing a single active zone. Their analysis does not rule out the (known to be true) possibility that RGC bouton sizes are much larger in boutons with multiple active zones. The correlation of volume and active zone number, by itself, does not resolve the issue. A good argument for multiple boutons might be that the variance is smallest in clusters with 4 active zones (looks like it in the plot) since they would be the average of four active zones to vesicle pool ratios. It is very likely that the multi-active zone vGlut2 clusters represent some clustering and some multi-synaptic boutons. The reference cited by the authors as evidence for the presence of single active zone boutons in young tissue does not rule out the existence of multiple active zone boutons.

      We agree with the reviewer’s comments on the challenges of classifying multi-active zone synapses in STORM images as single terminals versus aggregates of terminals. To help address this, we have performed electron microscopy imaging of genetically labeled RGC axons and identified the existence of single retinogeniculate terminals with multiple active zones. Our EM imaging was limited to 2D sections and does not rule out the clustering of small, single-active zone synapses within 3D volumes. Future volumetric EM reconstructions will be informative for this question. We have significantly updated the figures and text to discuss the new results and provide a careful interpretation of the nature of multi-AZ synapses in STORM imaging data.

      Several arguments are made that depend on the interpretation of "not statistically significant" (n.s.) meaning that "two groups are the same" instead of "we don't know if they are different". This interpretation is incorrect and materially impacts the conclusions.

      Several arguments are made that interpret statistical significance for one group and a lack of statistical significance for another group meaning that the effect was bigger in the first group. This interpretation is incorrect and materially impacts the conclusions.

      We thank the reviewer for raising these concerns. We have extensively revised the manuscript text to report the data in a more precise way without overinterpreting the results. All references to “N.S.” and associated conclusions have been either removed or substantiated with 5/95% confidence interval testing.

      Result Section 3.

      Claim 1: Complex synapses stabilize simple synapses. There are alternative explanations (mentioned above) for the observed clustering that negate the conclusions. 1) Boutons from the same axon tend to be found near one another. 2) Any form of eye-specific segregation would produce non-random associations in the analysis as performed. The authors compare each observation to a random model, but I cannot determine from the text if the model adequately accounts for alternative explanations.

      We thank the reviewer for their suggestion to consider alternative explanations for our results. We agree that our study does not provide direct molecular mechanistic data demonstrating synaptic stabilization effects. We have significantly revised the manuscript to be more cautious in our interpretations and specifically address alternative biological mechanisms that are consistent with the non-random arrangement of retinogeniculate synapses in our data.

      We agree with the reviewer that individual RGC axons form multiple synapses, however, nascent synapses might not always form close together. If synapses are initially added randomly within RGC axons, eye-specific segregation may conclude with a still-random pattern of dominant-eye inputs. At some later stage, synapses may be selectively refined to produce mature glomeruli. Consistent with this, individual RGCs undergo progressive clustering of axonal boutons at later stages of development after eye-specific segregation (5). One of our goals in this work was to determine if the process of synaptic clustering begins at the earliest stages of synapse formation and, if so, whether it is influenced by retinal wave activity.

      To measure synaptic clustering in our STORM data, we used a randomization of single-AZ synapse centroids within the volume of the neuropil after accounting for neuronal soma volumes and edge effects. Multi-AZ centroid positions were held fixed. Comparing the randomized result to the original distribution, we found a higher fraction of single-AZ synapses associated with multi-AZ synapses, arguing for a non-random clustering effect. However, we agree with the reviewer’s concern that this type of randomization cannot account for the fine scale structure of axons, which we did not have access to in this four-color volumetric super-resolution data set. Thus, there could still be errors in a purely volumetric randomization (e.g. the assignment of synapses to regions in the volume that would not be synaptic locations in the original neuropil), which would effectively decrease the measured degree of clustering after the randomization. To address this, we have revised our analysis to measure the degree of synapse clustering near both multi-AZ and single-AZ synapses after an equivalent randomization of single-AZ synapse positions in the volume.

      We now present the revised results as a “clustering index” for both multi-AZ and single-AZ synapses. This measurement was performed in several steps: 1) randomization of single-AZ positions within the imaging volume while holding multi-AZ centroid positions fixed; 2) independent measurements of the fraction of single-AZ synapses within the local shell (1.5 μm search radius) around multi-AZ and single-AZ synapses within the random distribution; and 3) comparison of the result from (2) with the actual fractional measurements in the raw STORM data to compute a “clustering index” value. Because the randomization is equivalent for both multi-AZ and single-AZ synapse measurements, any measured differences in the degree of clustering reflect the synapse type.
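
      The steps above can be sketched in a few lines of code. This is an illustrative simplification under stated assumptions, not the analysis pipeline used in the manuscript: the index is assumed here to be the ratio of the observed fraction to the mean randomized fraction, the randomization is uniform within a rectangular box (the actual randomization additionally accounts for soma volumes and edge effects), and all function names and coordinates are hypothetical.

```python
import math
import random


def fraction_near(points, anchors, radius=1.5):
    """Fraction of points within `radius` of at least one anchor other than themselves."""
    if not points:
        return 0.0
    hits = sum(
        1
        for p in points
        if any(a is not p and math.dist(p, a) <= radius for a in anchors)
    )
    return hits / len(points)


def randomize(n, box, rng):
    """Draw n positions uniformly inside a box with side lengths `box` (assumed geometry)."""
    return [tuple(rng.uniform(0.0, side) for side in box) for _ in range(n)]


def clustering_index(single, box, multi=None, radius=1.5, n_shuffles=200, seed=0):
    """Observed fraction divided by the mean fraction after randomizing single-AZ positions.

    If `multi` is given, its positions are held fixed and used as anchors.
    If `multi` is None, single-AZ synapses act as their own anchors.
    The ratio form of the index is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    observed = fraction_near(single, multi if multi is not None else single, radius)
    total = 0.0
    for _ in range(n_shuffles):
        shuffled = randomize(len(single), box, rng)
        anchors = multi if multi is not None else shuffled
        total += fraction_near(shuffled, anchors, radius)
    expected = total / n_shuffles
    return observed / expected if expected > 0 else float("nan")


# Hypothetical toy volume (10 x 10 x 10 um) with two multi-AZ synapses.
box = (10.0, 10.0, 10.0)
multi = [(3.0, 3.0, 3.0), (7.0, 7.0, 7.0)]
# Eight single-AZ synapses placed within 1 um of a multi-AZ synapse, four far away.
near = [(m[0] + dx, m[1], m[2]) for m in multi for dx in (-1.0, -0.5, 0.5, 1.0)]
far = [(1.0, 9.0, 1.0), (9.0, 1.0, 9.0), (1.0, 1.0, 9.0), (9.0, 9.0, 1.0)]
single = near + far

print("index around multi-AZ:", clustering_index(single, box, multi=multi))
print("index around single-AZ:", clustering_index(single, box))
```

      Because the same randomization is applied in both calls, a difference between the two index values in this sketch reflects the choice of anchor population rather than the randomization procedure.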

      We have updated Figure 3 in the revised manuscript to present the relative clustering index described above. We have updated the results, discussion, and methods sections accordingly.

      The authors claim that specificity increases over time. Figure 3b (middle) shows that the number of synapses near complex synapses might increase with time (needs confidence interval for effect size), but does not show that specificity (original relative to randomized) increases with time. The fact that nearby simple synapse density is always (P2) very different from random suggests a primarily non-activity-dependent explanation. The simplest explanation is that same-side boutons could be from the same axon whereas different-side axons could not be.

      We have significantly revised the analysis and presentation of results in Figure 3 to include a comparative measurement of synaptic clustering between multi-AZ and single-AZ synapses (discussed above). The data presented in the original Figure 3B have been moved to Supplemental Figure 4. Statistical comparisons in Figure S4 between the original and randomized synapse distributions are limited to within-age measurements. Cross-age comparisons were not performed or presented. To address the reviewer’s question concerning CI analysis in the original Figure 3B, we provide Author response image 3 below showing 5/95% confidence intervals for WT mice:

      Author response image 3.

      Claim 2: vGlut2 clusters more than 1.5 um away from multi-active zone vGlut2 clusters are not statistically significantly different in size than vGlut2 clusters within 1.5 um of multi-active zone vGlut2 clusters. Therefore "activity-dependent synapse stabilization mechanisms do not impact simple synapse vesicle pool size". The specific measure of 1.5 um from multi-active zone vGlut2 clusters does not represent all possible synapse stabilization mechanisms.

      We agree with the reviewer that this specific measure does not capture all possible synapse stabilization mechanisms. We have modified the text in the revised manuscript throughout to be more cautious in our data interpretation and have included additional discussion of alternative mechanisms consistent with our results.

      Result Section 4.

      Claim: The proximity of complex synapses with nearby simple synapses to other complex synapses with nearby simple synapses from the same eye is used to argue that activity is responsible for all this clustering.

      It is difficult to derive anything from the quantification besides 'not-random'. That is a problem because we already know that axons from the left and right eye segregate during the period being studied. All the measures in Section 4 are influenced by eye-specific segregation. Given this known bias, demonstrating a non-random relationship (P<X) doesn't mean anything. The test will reveal any non-random spatial relationship between same-eye and opposite-eye synapses.

      The results can be stated as: If you are a contralateral complex synapse, contralateral complex synapses that are also close to contralateral simple synapses will, on average, be slightly closer to you than contralateral complex synapses that are not close to contralateral ipsilateral synapses. That would be true if there is any eye-specific segregation (which there is).

      We appreciate the reviewer’s comments that our anatomical data are consistent with several possible mechanisms, suggesting the need for alternative interpretations of the results. In the original writing, we interpreted our results in the context of activity-dependent mechanisms of like-eye stabilization and opposite-eye competition. However, our results are also consistent with other mechanisms, including non-random molecular specification of eye-specific inputs onto subregions of postsynaptic target cells (e.g. distinct relay neuron dendrites). We have rewritten the manuscript to be more cautious in our interpretations and to provide a balanced discussion of alternative possibilities.

      Regarding the concern that the data in section four are influenced by eye-specific segregation, we previously found synapse density from both eyes is equivalent in the contralateral region at the P4 time point presented (1), which is consistent with binocular axonal overlap at this age. Within our imaging volumes, ipsilateral and contralateral inputs were broadly intermingled throughout the volume, and we did not find evidence for regional segregation within the imaging fields. By these metrics, retraction of ipsilateral inputs from the contralateral territory has not yet occurred.

      It is an overinterpretation of the data to claim that the lack of a clear correlation between vGlut2 cluster volume and distance to vGlut2 clusters with multiple active zones provides support for the claim that "presynaptic protein organization is not influenced by mechanisms governing synaptic clustering".

      We agree with the reviewer that our original language was imprecise in referring to presynaptic protein organization broadly. We have revised this text to present a more accurate description of the results.

      Reviewer #2 (Public Review):

      In this manuscript, Zhang and Speer examine changes in the spatial organization of synaptic proteins during eye-specific segregation, a developmental period when axons from the two eyes initially mingle and gradually segregate into eye-specific regions of the dorsal lateral geniculate. The authors use STORM microscopy and immunostain presynaptic (VGluT2, Bassoon) and postsynaptic (Homer) proteins to identify synaptic release sites. Activity-dependent changes in this spatial organization are identified by comparing the β2KO mice to WT mice. They describe two types of presynaptic organization based on Bassoon clustering, the complex and the simple synapse. By analyzing the relative densities and distances between these proteins over age, the authors conclude that the complex synapses promote the clustering of simple synapses nearby to form the future mature glomerular synaptic structure.

      Strengths:

      The data presented is of good quality and provides an unprecedented view at high resolution of the presynaptic components of the retinogeniculate synapse during active developmental remodeling. This approach offers an advance over the previous mouse EM studies of this synapse because the CTB label allows identification of the eye from which the presynaptic terminal arises. Using this approach, the authors find that simple synapses cluster close to complex synapses over age, and that complex synapse density increases with age.

      Weaknesses:

      From these data, the authors conclude that the complex synapse serves to "promote clustering of like-eye synapses and prohibit synapse clustering from the opposite eye". However, the authors show no causal data to support these ideas. There are a number of issues that the authors should consider:

      (1) Clustering of retinal synapses is in part due to the fact that retinal inputs synapse on the proximal dendrites. With increased synaptogenesis, there will be increased density of retinal terminals that are closely localized. And with development, perhaps simple synapses mature into complex synapses. Simple synapses may also represent ones that are in the process of being eliminated as previously described by Campbell and Shatz, JNeurosci 1992 (consider citing). Can the authors distinguish these scenarios from the ones that they conclude?

      We thank the reviewer for their thoughtful commentary and suggestions to improve our manuscript. We agree with the reviewer that our original interpretation of synaptic clustering by activity-dependent stabilization and punishment mechanisms is not directly supported by causal data. We have extensively revised the manuscript to take a more cautious view of the results and to discuss alternative mechanisms that are consistent with our data.

      During eye-specific circuit development, there is indeed increased synaptogenesis and, ultimately, RGC terminals are closely clustered within synaptic glomeruli. This process involves the selective addition and elimination of synapses. Bouton clustering has been shown to occur within individual RGC axons after eye-opening in the mouse (5). The convergence of other RGC types into clustered boutons has been shown at eye-opening by light and electron microscopy (3). There is also qualitative evidence that synaptic clusters may form earlier during eye-specific segregation in the cat (4). Our data provide additional evidence that synaptic clustering begins prior to eye-opening in the mouse (P2-P8). Although synapse numbers also increase during this period, the distribution of synapse addition is non-random. 

      Single-active zone synapses (we previously called these “simple”) may indeed mature into multi-active zone synapses (we previously called these “complex”). At the same time, single-active zone synapses may be eliminated. We believe that each of these events occurs as part of the synaptic refinement process. Our STORM images are static snapshots of eye-specific refinement, and we cannot infer the dynamic developmental trajectory of an individual synapse in our data. Future live imaging experiments in vivo/in situ will be needed to track the maturation and pruning of individual connections. We have expanded our discussion of these limitations and future directions in the manuscript.

      (2) The argument that "complex" synapses are the aggregate of "simple" synapses (Fig 2, S2) is not convincing.

      We agree with the reviewer’s concern about the ambiguous identity of complex synapses. To clarify the nature of multi-active zone synapses, we have performed RGC-specific dAPEX2 labeling to visualize retinogeniculate terminals by electron microscopy (EM). These experiments revealed the presence of synaptic terminals with multiple active zones. We have added images and text to the results section describing these findings. Our 2D EM images do not rule out the possibility that some multi-active zone synapses observed in STORM images are in fact clusters of individual RGC terminals. We have revised the text to provide a more accurate discussion of the nature of multi-active zone synapses.  

      (3) The authors use the β2KO mice to assess changes in the organization of synaptic proteins in retinal terminals that have disrupted retinal waves. However, β2-nAChRs are also expressed in the dLGN and other areas of the brain and glutamatergic synapse development has been reported in the CNS independent of the disruption in retinal waves. This issue should be considered when interpreting the total reduced retinal synapse density in the dLGN of the mutant.

      We thank the reviewer for their suggestion to consider non-retinal effects of the germline deletion of the beta 2 subunit of the nicotinic acetylcholine receptor. Previously, Xu and colleagues reported the development of a conditional transgenic mouse model lacking β2-nAChR expression specifically in the retina (6). These retina-specific β2-nAChR mutant mice (Rx-β2cKO) have disrupted retinal wave properties and defects in eye-specific axonal segregation in binocular anterograde tracing experiments. This work suggests that the defects seen in germline β2-nAChR KO mice arise from defects in retinal wave activity rather than the loss of nicotinic receptors elsewhere in the brain. Additionally, the development of brainstem cholinergic inputs to the dLGN is delayed until the closure of the eye-specific segregation period (7), further suggesting a limited role for cholinergic transmission in the retinogeniculate refinement process.

      (4) Outside of a total synapse density difference between WT and β2KO mice, the changes in the spatial organization of synaptic proteins over development do not seem that different. In fact % simple synapses near complex synapses from the non-dominant eye in the mutant is not that different from WT at P8 (Fig 3C), an age when eye-specific segregation is very different between the genotypes. Can the authors explain this discrepancy?

      We thank the reviewer for their question concerning differences between synapse organization in WT versus β2KO mice. In the original presentation of Figure 3C, the percentage of non-dominant eye single-AZ synapses near multi-AZ synapses increased at P4 in WT mice, but this did not occur in β2KO mice. This is consistent with our previous results showing that there is an increase in non-dominant eye synaptic density at this age, which does not occur in β2KO mice (1). At P8, this clustering effect is lost in WT as eye-specific segregation has taken place and non-dominant eye inputs have been eliminated. However, in β2KO mice, the overall synapse density is still low at this age. We interpret this result as a failure of synaptogenesis in the β2KO line, which leads to increased growth of individual RGC axons (8) and eye-specific overlap at P8 (9, 10). Evidence in support of this interpretation comes from live dynamic imaging studies of RGC axon branching in Xenopus and Zebrafish, showing that synapse formation stabilizes local axon branching and that disruptions of synapse formation or neurotransmission lead to enlarged axons (11-13).

      Our anatomical results do not provide a specific biological mechanism for the remaining clustering observed in the β2KO mice. We have revised our discussion of the fact that individual RGC axons may form multiple synaptic connections leading to clustering, which may be independent of changes in retinal wave properties in the β2KO mouse. We have also extensively revised the analysis and presentation of results in Figure 3 to directly compare synaptic clustering around both multi-AZ synapses and single-AZ synapses within the same imaging volumes.

      (5) The authors use nomenclature that has been previously used and associated with other aspects of retinogeniculate properties. For example, the phrases "simple" and "complex" synapses have been used to describe single boutons or aggregates of boutons from numerous retinal axons, whereas in this manuscript the phrases are used to describe vesicle clusters/release sites with no knowledge of whether they are from single or multiple boutons. Likewise, the word "glomerulus" has been used in the context of the retinogeniculate synapse to refer to a specific pattern of bouton aggregates that involves inhibitory and neuromodulatory inputs. It is not clear how the release sites described by the authors fit in this picture. Finally, the use of the word "punishment" is associated with a body of literature regarding the immune system and retinogeniculate refinement, which is not addressed in this study. This double use of the phrases can lead to confusion in the field and should be clarified by clear definitions of how they are used in the current study.

      We appreciate the reviewer’s concern that the terminology we used in the initial submission may cause confusion. We have revised the text throughout for clarity. “Simple” synapses are now referred to as “single-active zone synapses”. “Complex” synapses are now referred to as “multi-active zone synapses”. We have removed all text that previously referred to synaptic clusters in STORM images as glomeruli. We agree that we have not provided causal evidence for synaptic stabilization and punishment mechanisms, which would require additional molecular genetic studies. We have restructured the manuscript to remove these references and discuss our anatomical results impartially.  

      Reviewer #3 (Public Review):

      This manuscript is a follow-up to a recent study of synaptic development based on a powerful data set that combines anterograde labeling, immunofluorescence labeling of synaptic proteins, and STORM imaging (Cell Reports 2023). Specifically, they use anti-Vglut2 label to determine the size of the presynaptic structure (which they describe as the vesicle pool size), anti-Bassoon to label a number of active zones, and anti-Homer to identify postsynaptic densities. In their previous study, they compared the detailed synaptic structure across the development of synapses made with contra-projecting vs ipsi-projecting RGCs and compared this developmental profile with a mouse model with reduced retinal waves. In this study, they produce a new analysis on the same data set in which they classify synapses into "complex" vs. "simple" and assess the number and spacing of these synapses. From these measurements, they make conclusions regarding the processes that lead to synapse competition/stabilization.

      Strengths:

      This is a fantastic data set for describing the structural details of synapse development in a part of the brain undergoing activity-dependent synaptic rearrangements. The fact that they can differentiate eye of origin is also a plus.

      Weaknesses:

      The lack of details provided for the classification scheme as well as the interpretation of small effect sizes limit the interpretations that can be made based on these findings.

      We thank the reviewer for their reading of the manuscript and helpful comments to improve the work. We provide details on how single-active zone and multi-active zone synapses are classified in the methods section. We agree with the suggestion to be more careful in interpreting the results. We have extensively revised the manuscript to 1) include additional electron microscopy data demonstrating the presence of multi-active zone retinogeniculate synapses, 2) extend the synaptic clustering analysis to both single-active zone and multi-active zone synapses for comparison, and 3) improve the clarity and accuracy of the discussion throughout the manuscript.

      (1) The criteria to classify synapses as simple vs. complex are critical for all of the analysis in this study. Therefore these criteria for classification should be much more explicit and tested for robustness. As stated in the methods, the classification is based on the number of active zones, which are designated by the number of Bassoon clusters associated with a Vglut2 cluster (line 697). A second part of the criteria is the size of the presynaptic terminal as assayed by "greater Vglut2 signal" (line 116). So how are these thresholds determined? For Bassoon clusters, is one voxel sufficient? Two? If it's one, how often do they see a Bassoon positive voxel with no Vglut2 cluster and therefore may represent "noise"? There is no distribution of Bassoon volumes that is provided that might be the basis for selecting this number of sites. Unfortunately, the images are not helpful. For example, does P8 WT in Figure 1B have 7 or 2? According to Figure 2C, it appears the numbers are closer to 2-4.

      The Vglut volume measurements also do not seem to provide a clear criterion. Figure 2 shows that the distributions of Vglut2 cluster volumes for complex and for simple synapses are significantly overlapping.

      The authors need to clarify the quantitative approach used for this classification strategy and test how sensitive the results of the study are to how robust this strategy is.

      We thank the reviewer for their question concerning the STORM data analysis. Here we provide a brief overview of the complete analysis details, which are provided in the methods section.

      Our raw STORM data sets consisted of spectrally separate volumetric imaging channels of VGluT2, Bassoon, and Homer1 signals. Each channel was processed in five steps: 1) the corresponding low-resolution conventional image of each physical section was applied to the STORM data to filter out artifacts that appear in the STORM image but not in the conventional image; 2) STORM images were thresholded using a 2-factor Otsu threshold that removes low-intensity background noise while preserving all single-molecule localizations that correspond to genuine antibody labeling as well as non-specific antibody labeling in the tissue; 3) the MATLAB function “conncomp” was applied to identify connected-component voxels in 3D across the image stack, and clusters were kept for further analysis only if they were connected across at least 2 continuous physical sections (140 nm Z depth); 4) the volume and signal density (intensity/volume) were measured for every connected component in the dataset (clusters corresponding to genuine antibody labeling and to background labeling); 5) a threshold was applied to retain clusters that have a higher volume and lower signal density. We exclude signals that have low volume and high density, which correspond to single antibody labels. This analysis retains larger clusters that correspond to synaptic objects and excludes non-specific antibody background.
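
      Steps 3 to 5 above can be sketched on a toy volume. This is a minimal illustration and not the code used in the analysis: the function names, the 6-connectivity rule, the fixed intensity threshold (standing in for the Otsu step), and the volume and density cutoffs are all assumptions chosen for the example.

```python
from collections import deque

# Face-sharing neighbors (6-connectivity); an assumption for this sketch.
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def label_clusters(volume, threshold):
    """Group above-threshold voxels of volume[z][y][x] into connected clusters."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    seen, clusters = set(), []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if volume[z][y][x] <= threshold or (z, y, x) in seen:
                    continue
                seen.add((z, y, x))
                queue, voxels = deque([(z, y, x)]), []
                while queue:  # breadth-first flood fill of one cluster
                    cz, cy, cx = queue.popleft()
                    voxels.append((cz, cy, cx))
                    for dz, dy, dx in NEIGHBORS:
                        n = (cz + dz, cy + dy, cx + dx)
                        inside = 0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                        if inside and n not in seen and volume[n[0]][n[1]][n[2]] > threshold:
                            seen.add(n)
                            queue.append(n)
                clusters.append(voxels)
    return clusters

def summarize(volume, cluster):
    """Voxel count, number of sections spanned, and signal density of a cluster."""
    intensity = sum(volume[z][y][x] for z, y, x in cluster)
    return {
        "voxels": len(cluster),
        "sections": len({z for z, _, _ in cluster}),
        "density": intensity / len(cluster),
    }

def keep_synaptic(volume, clusters, min_sections=2, min_voxels=3, max_density=50.0):
    """Keep clusters spanning >= min_sections that are large and not overly dense.
    The cutoffs are hypothetical placeholders."""
    summaries = [summarize(volume, c) for c in clusters]
    return [s for s in summaries
            if s["sections"] >= min_sections
            and s["voxels"] >= min_voxels
            and s["density"] <= max_density]

# Toy data: two sections, one 4-voxel cluster spanning both, one bright single voxel.
toy = [
    [[0, 0, 0, 0], [0, 10, 10, 0], [0, 0, 0, 90]],
    [[0, 0, 0, 0], [0, 10, 10, 0], [0, 0, 0, 0]],
]
clusters = label_clusters(toy, threshold=5)
kept = keep_synaptic(toy, clusters)
```

      In this toy case the bright single voxel is dropped because it is confined to one section and has a high density, while the larger, dimmer cluster spanning both sections is retained.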

      The size of WT synaptic Bassoon clusters ranges from 55 to 3532 voxels (0.00092 to 0.059 μm<sup>3</sup>), with a median size of 460 voxels (0.0077 μm<sup>3</sup>).

      The size of WT synaptic VGluT2 clusters ranges from 50 to 73752 voxels (0.00084 to 1.2 μm<sup>3</sup>), with a median size of 980 voxels (0.016 μm<sup>3</sup>).

      The size of WT synaptic Homer1 clusters ranges from 63 to 7118 voxels (0.0010 to 0.12 μm<sup>3</sup>), with a median size of 654 voxels (0.011 μm<sup>3</sup>).

      In practice, any Bassoon/VGluT2/Homer1 clusters with <10 voxels are immediately filtered at the Otsu thresholding step (2) above.
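
      As a consistency check on the voxel and volume figures quoted above, the implied volume of one voxel can be back-calculated from the reported medians. The lateral pixel size derived at the end assumes that one voxel is one physical section deep (70 nm, from the stated 140 nm for two sections); that assumption is not stated in this response.

```python
import math

# Median cluster sizes as reported above: (voxels, volume in cubic micrometers).
reported = {
    "Bassoon": (460, 0.0077),
    "VGluT2": (980, 0.016),
    "Homer1": (654, 0.011),
}

# Implied volume of a single voxel for each channel.
voxel_volumes = {name: vol / n for name, (n, vol) in reported.items()}
mean_voxel_volume = sum(voxel_volumes.values()) / len(voxel_volumes)

# Assumption: voxel depth equals one section thickness (140 nm / 2 sections).
SECTION_THICKNESS_UM = 0.070
implied_xy_nm = 1000 * math.sqrt(mean_voxel_volume / SECTION_THICKNESS_UM)

for name, v in voxel_volumes.items():
    print(f"{name}: {v:.2e} um^3 per voxel")
print(f"implied lateral pixel size: {implied_xy_nm:.1f} nm")
```

      The three channels give nearly the same per-voxel volume, which is what one would expect if all channels share the same voxel grid and the reported values are rounded.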

      The reviewer is correct that we often see Bassoon(+) clusters that are not associated with VGluT2, and these may reflect synapses of non-retinal origin or retinogeniculate synapses that lack VGluT2 expression. To identify retinogeniculate synapses containing VGluT2, we performed a synapse pairing analysis that measured the association between VGluT2 and Bassoon clusters after the synapse cluster filtering described above. We first measured the centroid-centroid distance from each VGluT2 cluster to the closest cluster in the Bassoon channel. We next quantified the signal intensity of the Bassoon channel within a 140 nm shell surrounding each VGluT2 cluster. A 2D histogram was plotted based on the measured centroid-centroid distances and opposing channel signal densities of each cluster. Paired clusters with closely positioned centroids and high intensities of apposed channel signal were identified using the OPTICS algorithm (14).
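
      The first pairing measurement, the centroid-centroid distance from each VGluT2 cluster to its closest Bassoon cluster, can be sketched as follows. The function names and coordinates are hypothetical, and the shell-intensity measurement and the OPTICS step are not shown.

```python
import math

def centroid(voxels):
    """Mean position of a cluster given as a list of (x, y, z) coordinates."""
    n = len(voxels)
    return tuple(sum(v[i] for v in voxels) / n for i in range(3))

def nearest_partner(source_centroids, target_centroids):
    """For each source centroid, return (index of nearest target, distance)."""
    pairs = []
    for s in source_centroids:
        dists = [math.dist(s, t) for t in target_centroids]
        j = min(range(len(dists)), key=dists.__getitem__)
        pairs.append((j, dists[j]))
    return pairs

# Hypothetical centroids (arbitrary units).
vglut2 = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
bassoon = [(0.3, 0.4, 0.0), (10.0, 0.0, 5.0), (4.0, 0.0, 0.0)]
pairs = nearest_partner(vglut2, bassoon)
```

      Each distance would then be combined with the opposing-channel signal density to place the cluster in the 2D histogram described above.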

      In the original Figure 1B, the multi-active zone synapse in WT at P8 had two Bassoon clusters. To clarify this, we have revised the images in Figure 1 to include arrowheads that point to individual active zones. We have also revised Supplemental Figure 1 to show volumetric renderings of individual example synapses that help illustrate the 3D structure of these multi-active zone inputs. All details about synapse analysis and synapse pairing are provided in the methods section.

      (2) Effect sizes are quite small and all comparisons are made on medians of distributions. This leads to an n=3 biological replicates for all comparisons. Hence this small n may lead to significant results based on ANOVAS/t-tests, but the statistical power of these effects is quite weak. To accurately represent the variance in their data, the authors should show all three data points for each category (with a SD error bar when possible). They should also include the number of synapses in each category (e.g. the numerators in Figure 1D and the denominators for Figure 1E). For other figures, there are additional statistical questions described below.

      We thank the reviewer for their suggestion to improve the presentation of our results. We have added all three data points (individual biological replicates) to each figure plot when applicable. We have also included a supplemental table (Table S1) listing total eye-specific synapse numbers of each type (mAZ and sAZ) and AZ number for each biological replicate in both genotypes.

      (3) The authors need to add a caveat regarding their classification of synapses as "complex" vs. "simple" since this is a terminology that already exists in the field and it is not clear that these STORM images are measuring the same thing. For example, in EM studies, "complex" refers to multiple RGCs converging on the same single postsynaptic site. The authors here acknowledge that they cannot assign different AZs to different RGCs so this comparison is an assumption. In Figure 2 they argue this is a good assumption based on the finding that the Vglut column/active zone is constant and therefore each represents a single RGC. However, the authors should acknowledge that they are actually seeing quite different percentages than those in EM studies. For example, in Monavarfeshani et al, eLife 2018, there were no complex synapses found at P8. (Note this study also found many more complex vs. simple synapses in the adult - 70% vs. the 20% found in the current study - but this difference could be a developmental effect). In the future, the authors may want to take another data set in the adult dLGN to make a direct comparison based on numbers and see if their classification method for complex/simple maps onto the one that currently exists in the literature.

      We appreciate the reviewer’s comment that the use of the terms “complex” and “simple” may cause confusion. We have significantly revised the manuscript for clarity: 1) We now refer to “complex” synapses as “multi-active zone synapses” and “simple” synapses as “single-active zone synapses”. 2) We have performed electron microscopy analysis of dAPEX2-labeled retinogeniculate projections to confirm the existence of large synaptic terminals with multiple active zones. 3) We have expanded our discussion of previous electron microscopy results describing a lack of axonal convergence at P8 (3). 4) We have added a discussion on how individual RGCs may form multiple synapses in close proximity within their axonal arbor, which would create a clustering effect.

      We agree that it will be informative to collect a STORM data set in the adult mouse dLGN and we look forward to working on this project to compare with EM results in the future.  

      (4) Figure 3 assays the relative distribution of simple vs. complex synapses. They found that a larger percentage of simple synapses were within 1.5 microns of complex synapses than you would expect by chance for both ipsi and contra projecting RGCs, and hence conclude that complex synapses are sites of synaptic clustering. In contrast, there was no clustering of ipsi-simple to contra-complex synapses and vice versa. The authors also argue that this clustering decreases between P4 and P8 for ipsi projecting RGCs.

      This analysis needs much more rigor before any conclusions can be drawn. First, the authors need to justify the 1.5-micron criteria for clustering and how robust their results are to variations in this distance. Second, these age effects need to be tested for statistical significance with an ANOVA (all the stats presented are pairwise comparisons to means expected by random distributions at each age). Finally, the authors should consider what n's to use here - is it still grouped by biological replicate? Why not use individual synapses across mice? If they do biological replicates, then they should again show error bars for each data point in their biological replicates. And they should include the number of synapses that went into these measurements in the caption.

      We appreciate the suggestion to improve the rigor of our analysis of synaptic clustering presented in Figure 3. We have revised our analysis to measure the degree of synapse clustering nearby both multi-AZ and single-AZ synapses after an equivalent randomization of single-AZ synapse positions in the volume. 

      We now present the revised results as a “clustering index” for both multi-AZ synapses and single-AZ synapses. This measurement was performed in several steps: 1) randomization of single-AZ positions within the imaging volume while holding multi-AZ centroid positions fixed, 2) independent measurements of the fraction of single-AZ synapses within the local shell (1.5 μm search radius) around multi-AZ and single-AZ synapses within the random distribution, 3) comparison of the result from (2) with the actual fractional measurements in the raw STORM data to compute a “clustering index” value. 4) Because the randomization is equivalent for both multi-AZ and single-AZ synapse measurements, the measured differences in the degree of clustering reflect a synapse type-specific effect.
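
      The randomization steps above can be sketched as follows for the multi-AZ case. This is an illustration under stated assumptions, not the analysis code: the index is taken here to be the ratio of the observed to the randomized fraction, positions are shuffled uniformly within a rectangular volume, and all names and coordinates are hypothetical.

```python
import math
import random

def fraction_near(points, anchors, radius):
    """Fraction of points lying within `radius` of at least one anchor."""
    if not points:
        return 0.0
    hits = sum(
        1 for p in points
        if any(a is not p and math.dist(p, a) <= radius for a in anchors)
    )
    return hits / len(points)

def clustering_index(single_az, multi_az, bounds, radius=1.5, n_shuffles=200, seed=0):
    """Observed fraction of single-AZ synapses near multi-AZ synapses, divided by
    the mean fraction after uniformly randomizing single-AZ positions.
    Multi-AZ positions are held fixed. The ratio form is an assumption."""
    rng = random.Random(seed)
    observed = fraction_near(single_az, multi_az, radius)
    shuffled = []
    for _ in range(n_shuffles):
        randomized = [tuple(rng.uniform(0, b) for b in bounds) for _ in single_az]
        shuffled.append(fraction_near(randomized, multi_az, radius))
    expected = sum(shuffled) / n_shuffles
    return observed / expected if expected > 0 else float("inf")

# Hypothetical data: ten single-AZ synapses packed around one multi-AZ synapse.
multi_az = [(5.0, 5.0, 3.5)]
single_az = [(5.0 + 0.1 * i, 5.0, 3.5) for i in range(10)]
index = clustering_index(single_az, multi_az, bounds=(10.0, 10.0, 7.0))
```

      An index above 1 indicates more single-AZ synapses near multi-AZ synapses than expected from random placement. The single-AZ comparison would use the same function with single-AZ synapses as the anchors.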

      We have also updated Supplemental Figure 3 showing the results of varying the search radius from 1-4 μm for both contralateral- and ipsilateral-eye synapses. The results showed that a search radius of 1.5 μm resulted in the largest difference between the original synapse distribution and a randomized synapse distribution (shuffling of single-active zone synapse position while holding multi-active zone synapse position fixed).

      Finally, we have removed all statistical comparisons of single measurements (means or ratios) across ages from the manuscript. We focus our statistical analysis on paired data comparisons within individual biological replicates.

      For the analysis of synapse clustering, we grouped the data by biological replicates (N=3) to look for a global effect on synapse clustering. In the revised manuscript, we added data points for each replicate in the figure and included the number of synapses in Supplementary Table 1.

      (5) Line 211-212 - the authors conclude that the absence of clustered ipsi-simple synapses indicates a failure to stabilize (Figure 3). Yet, the link between this measurement and synapse stabilization is not clear. In particular, the conclusion that "isolated" synapses are the ones that will be eliminated seems to be countered by their finding in Figure 3D/E which shows that there is no difference in vesicle pool volume between near and far synapses. If isolated synapses are indeed the ones that fail to stabilize by P8, wouldn't you expect them to be weaker/have fewer vesicles? Also, it's hard to tell if there is an age-dependent effect since the data presented in Figures 3D/E are merged across ages.

      We thank the reviewer for their suggestion to clarify the results in Figure 3. Based on the measured eye-specific differences in vesicle pool size and organization, we also expected that synapses outside of clusters would show a reduced vesicle population. However, across all ages, we found no differences in the vesicle pool size of single-active zone synapses based on their proximity to multi-active zone synapses. Below, we show cumulative distributions of these results across all ages (P2/P4/P8) for WT mice CTB(+) data. Statistical tests (Kolmogorov-Smirnov tests) show no significant differences. P = 0.880, 0.767, 0.494 respectively. Separate 5/95% confidence interval calculations showed overlap between far and near populations at each age.

      Author response image 4.
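
      For readers unfamiliar with the comparison of cumulative distributions, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the two empirical cumulative distribution functions. A minimal sketch with made-up values follows; in practice a library routine such as `scipy.stats.ks_2samp` would also return the P value.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, v) / len(a)  # fraction of a that is <= v
        cdf_b = bisect_right(b, v) / len(b)  # fraction of b that is <= v
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical vesicle pool volumes for "near" and "far" synapses.
near = [1, 2, 3, 4]
far = [3, 4, 5, 6]
d_near_far = ks_statistic(near, far)
```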

      To clarify the presentation of the results, we have changed the text to state that the “vesicle pool size of sAZ synapses is independent of their distance to mAZ synapses”. We have removed references to stabilization and punishment from the results section of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Because none of the phenomena being measured can be expected to behave randomly (given what is already known about the system) and the sample size is small, I believe quantification of the data requires confidence intervals for effect sizes. Resolving the multi-bouton vs multi-active zone bouton with EM would also help.

      We thank the reviewer for their thorough reading of the manuscript and many helpful suggestions. We provide analysis with confidence intervals in a point-by-point response below. In the manuscript we revised our results and focused our statistical analyses on comparisons within the same biological replicate (paired effects). In addition, we have performed electron microscopy of RGC inputs to the dLGN at postnatal day 8 to demonstrate the presence of retinogeniculate synapses with multiple active zones.

      Figure 1:

      Please show data points in scatter bar plots and not just error bars.

      We have updated all plots to show data points for independent biological replicates.

      Please describe the image processing in more detail and provide an image in which the degree of off-target labeling can be evaluated.

      We have updated the description of the image processing in the methods sections. We have made all the code used in this analysis freely available on GitHub (https://github.com/SpeerLab). We have uploaded the raw STORM images of the full data set to the open-access Brain Imaging Library (16). These images can be accessed here: https://api.brainimagelibrary.org/web/view?bildid=ace-dud-lid (WTP2A data for example). All 18 datasets are currently searchable on the BIL by keyword “dLGN” or PI last name “Speer” and a DOI for the grouped dataset is pending.

      How does panel 1D get very small error bars with N = 3? Please provide scatter plots.

      We have updated panel 1D to show the means for each independent biological replicate.

      Line 129: over what volume is density measured? What are the n's? What is the magnitude (with confidence intervals) of increase?

      The volume we collected from each replicate was ~80 μm × 80 μm × 7 μm (total volume ~44,800 μm<sup>3</sup>). N=3 biological replicates for each age, genotype, and tissue location. Because of concerns with the use of ANOVA for low sample numbers, we have removed a majority of the age-wise comparisons from the manuscript and instead focus on within-replicate paired data comparisons. Author response image 5 shows 5/95% confidence intervals for WT data (left panel) and β2KO data (right panel):

      Author response image 5.

      The 5/95% CI for the increase in synapse density from P2 to P8 for CTB(+) synapses is approximately -0.001 to 0.037 synapses/μm<sup>3</sup>.
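
      To make the arithmetic concrete, the sketch below converts synapse counts to densities using the stated imaging volume and computes a t-based interval for three replicates. It assumes that a "5/95% CI" is a two-sided 90% interval bounded by the 5th and 95th percentiles; the synapse counts are hypothetical, and the example gives an interval for a mean density rather than for the P2 to P8 increase.

```python
import math
import statistics

# Imaged volume per replicate from the dimensions given above (micrometers).
VOLUME_UM3 = 80 * 80 * 7  # 44800 cubic micrometers

# 95th percentile of Student's t with 2 degrees of freedom (N = 3 replicates).
T_95_DF2 = 2.920

def density(n_synapses, volume_um3=VOLUME_UM3):
    """Synapses per cubic micrometer."""
    return n_synapses / volume_um3

def ci_5_95(values):
    """Two-sided 90% t interval for the mean of exactly three replicates."""
    if len(values) != 3:
        raise ValueError("the hard-coded critical value assumes N = 3")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean - T_95_DF2 * sem, mean + T_95_DF2 * sem

# Hypothetical synapse counts for three replicates.
counts = [448, 896, 1344]
densities = [density(c) for c in counts]
low, high = ci_5_95(densities)
```

      With only three replicates the critical value is large, so intervals are wide even when the replicate means look similar.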

      Line 131: You say that non-dominant increases and then decreases. It appears that the error bars argue that you do not have enough information to reliably determine how much or little density changes.

      Line 140: No confidence intervals. It appears the error bars allow both for the claimed effect of increased fraction and the opposite effect of decreased density.

      Because of concerns with the use of ANOVA for low sample numbers, we have removed age-wise comparisons of single-measurements (means and ratios) from the manuscript and instead focus on within-replicate paired data comparisons.

      Line 144: Confidence intervals would be a reasonable way to argue that fraction is not changed in KO: normal fraction XX%-XX%. KO fraction XX%-XX%.

      Author response image 6 shows panels for WT (left) and β2KO mice (right) with 5/95% CIs.

      Author response image 6.

      In the revised manuscript, we have updated the text to report the measurements, but we do not draw conclusions about changes over development.

      I find it hard to estimate magnitudes on a log scale.

      We appreciate the reviewer’s concern with the presentation of results on a log scale. Because the measured synapse properties are distributed logarithmically, we have elected to present the data on a log scale so that the distribution(s) can be seen clearly. Lognormal distributions enable us to use a mixed linear model for statistical analysis.
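
      One way to read magnitudes from log-scale summaries is to back-transform them: the mean of log10 values corresponds to a geometric mean on the linear scale, and a difference of log10 means corresponds to a fold change. A short sketch with made-up volumes:

```python
import math
import statistics

def log10_summary(volumes_um3):
    """Return (mean of log10 values, geometric mean on the linear scale)."""
    mean_log = statistics.mean(math.log10(v) for v in volumes_um3)
    return mean_log, 10 ** mean_log

def fold_change(mean_log_a, mean_log_b):
    """Ratio of geometric means implied by a difference of log10 means."""
    return 10 ** (mean_log_a - mean_log_b)

# Hypothetical cluster volumes in cubic micrometers.
mean_log, geometric_mean = log10_summary([0.01, 0.1, 1.0])
ratio = fold_change(-1.0, -1.3)
```

      For example, a difference of 0.3 in log10 units corresponds to roughly a twofold difference in volume.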

      Line 156: Needs confidence interval for difference.

      Line 158: Needs confidence interval for difference of differences.

      Line 160: Needs confidence interval for difference of differences.

      Why only compare at P4 where there is the biggest difference? The activity hypothesis would predict an even bigger effect at P8.

      Below is a table listing the mean volume (log10 μm<sup>3</sup>) and [5/95%] confidence intervals for comparisons of VGluT2 signal between CTB(+) and CTB(-) synapses from Figure 2A and 2B:

      Author response table 2.

      Based on the values given above, the mean difference of differences and [5/95%] confidence intervals are listed below:

      Author response table 3.

      We added these values to the manuscript. We have also reported the difference in median values on a linear scale (as below) so that the readers can have a straightforward understanding of the magnitude.

      Author response table 4.

      We elected to highlight the results at P4 based on our previous finding that the synapse density from each eye-of-origin is similar at this time point (1).

      At P8, there is a decrease in the magnitude of the difference between CTB(+)/CTB(-) synapses compared to P4. This may be due to an increase in VGluT2 volume within non-dominant eye synapses that survive competition between P4-P8.

      At P8 in the mutant, there is an increase in the magnitude of the difference between CTB(+)/CTB(-) synapses compared to P4. This may be due to delayed synaptic maturation in β2KO mice.

      Line 171: The correct statistical comparison was not performed for the claim. Lack of * at P2 does not mean they are the same. Why do you get the same result for KO?

      We have revised the statistical analysis, figure presentation, and text to remove discussion of changes in the number of active zones per synapse over development based on ANOVA. We now report eye-specific differences at each time point using paired t-test analysis, which is mathematically equivalent to comparing the 5/95% confidence interval of the difference.
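
      The link between a paired t-test and a confidence interval on the paired difference can be illustrated as follows. The data are hypothetical. The critical value must match the coverage of the interval: 4.303 gives a two-sided 95% interval for three replicates (2 degrees of freedom), and 2.920 gives a two-sided 90% interval.

```python
import math
import statistics

def paired_test(x, y, t_crit):
    """Paired t statistic and the matching interval for the mean difference.
    Assumes the paired differences are not all identical."""
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
    t_stat = mean_diff / sem
    ci = (mean_diff - t_crit * sem, mean_diff + t_crit * sem)
    return t_stat, ci

# Hypothetical per-replicate values for CTB(+) and CTB(-) synapses.
ctb_pos = [3.0, 4.0, 5.0]
ctb_neg = [1.0, 2.5, 2.0]
T_CRIT = 4.303  # 97.5th percentile of Student's t, 2 degrees of freedom

t_stat, ci = paired_test(ctb_pos, ctb_neg, T_CRIT)
significant = abs(t_stat) > T_CRIT
ci_excludes_zero = ci[0] > 0 or ci[1] < 0
```

      The two booleans always agree, because the test rejects exactly when zero falls outside the interval built with the same critical value.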

      Line 175: Qualitative claim. Correlation coefficients and magnitudes of correlation coefficients are not reported.

      Linear fit slopes and R-squared values are given below:

      Author response table 5.

      The values are added to the manuscript to support the conclusions.
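
      For reference, the slope and R-squared of an ordinary least-squares line can be computed as in the sketch below. The x and y values are hypothetical and are not taken from the manuscript.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x, with R-squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical paired measurements.
slope, intercept, r_squared = linear_fit([1, 2, 3, 4], [1, 3, 2, 4])
```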

      Line 177: n.s. does not mean that you have demonstrated the values are the same. An argument for similarity could be made by calculating a confidence interval for a potential range of differences. Example: Complex were 60%-170% of Simple.

      Author response image 7 shows 5/95% CIs for WT and β2KO mice:

      Author response image 7.

      The difference in average VGluT2 cluster volume per AZ between multi-AZ and single-AZ synapses is:

      Author response table 6.

      The values are added to the manuscript for discussion.

      Line 178: There is no reason to think that the vesicle pool for a single bouton does not scale with active zone number within the range of uncertainty presented here.

      We have collected EM images of multi-AZ synapses and modified our discussion and conclusions in the revised text.

      Line 196: "non-random clustering increased progressively" is misleading. The density of the boutons increases for both the Original and Randomized. Given the increase in variance at P8, it is unlikely that the data supports the claim that the non-randomness increased. Would be easy to quantify with confidence intervals for a measure of specificity (O/R).

      We have revised the manuscript to remove analysis and discussion of changes in clustering over development. We have modified this section of the manuscript and figures to present a normalized clustering index that describes the non-random clustering effect present at each time point.

      Line 209: Evidence is for correlation, not causation and there is a trivial potential explanation for correlation.

      We appreciate the reviewer’s concern with overinterpretation of the results. We have changed the text to more accurately reflect the data.

      Line 238-239: Authors failed to show effect is activity-dependent. Near/Far distinction is not necessarily a criterion for the effect of activity. The claim is likely false in other systems.

      We agree with the reviewer that the original text overinterpreted the results. We have changed the text to more accurately reflect the data. 

      Line 265-266: Assumes previous result is correct and measure of vGlut2 provides information about all presynaptic protein organization.

      We thank the reviewer for pointing out the incorrect reference to all presynaptic protein organization. We have corrected the text to reference only the VGluT2 and Bassoon signals that were measured.

      Line 276: There are many other interpretations that include trivial causes. It is unclear what the measure indicates about the biology and there is no interpretable magnitude of effect.

      We agree with the reviewer that the original text overinterpreted the results. We have changed the text to remove references to mechanisms of synaptic stabilization.

      Line 289: Differences cannot be demonstrated by comparing P-values. Try comparing confidence intervals for effect size or generate a confidence interval for the difference between the two groups.

      5/95% confidence intervals are given below for Figure 4C/D:

      Author response table 7.

      We have added these values to the manuscript to support our conclusion.

      Line 305: "This suggests that complex synapses from the non-dominant-eye do not exert a punishment effect on synapses from the dominant-eye" Even if all the other assumptions in this claim were true, "n.s." just means you don't know something. It cannot be compared with an asterisk to claim a lack of effect.

      We thank the reviewer for raising this concern. We have modified the text to remove references to synaptic punishment mechanisms in the results section.

      Below are the 5/95% confidence intervals for the results in Figure 4F:

      Author response table 8.

      We have added these values to the manuscript to support our conclusion.

      Line 308: "mechanisms that act locally". 6 microns is introduced based on differences in curves above(?). I don't see any analysis that would argue that longer-distance effects were not present.

      The original reference referred to the differences in the cumulative distribution measurements between multi-active zone synapses versus single-active zone synapses in their distance to the nearest neighboring multi-active zone synapse. For clarity, we have deleted the reference to the 6 micron distance in the revised text.

      Reviewer #2 (Recommendations For The Authors):

      (1) This data set would be valuable to the community. However, unless the authors can show experiments that manipulate the presence of complex synapses to test their concluding claims, the manuscript should be rewritten with a reassessment of the conclusions that is more grounded in the data.

      We thank the reviewer for their careful reading of the manuscript and we agree the original interpretations were not causally supported by the experimental results. We have made substantial changes to the text throughout the introduction, results, and discussion sections so that the conclusions accurately reflect the data.

      (2) To convincingly address the claim that "complex synapses" are aggregates of simple synapses, the authors should perform experiments at the EM level showing what the bouton correlates are to these synapses.

      We thank the reviewer for their suggestion to perform EM to gain a better understanding of retinogeniculate terminal structure. We generated an RGC-specific transgenic line expressing the EM reporter dAPEX2 localized to mitochondria. We have collected EM images of retinogeniculate terminals that demonstrate the presence of multiple active zones within individual synapses. These results are now presented in Figure 1. The text has been updated to reflect the new results.

      (3) Experiments using the conditional β2KO mice would help address questions of the contribution of β2-nAChRs in dLGN to the synaptic phenotype.

      We appreciate the reviewer’s concern that the germline β2KO model may show effects that are not retina-specific. To address this, Xu and colleagues generated a retina-specific conditional β2KO transgenic and characterized wave properties and defective eye-specific segregation at the level of bulk axonal tracing (6). The results from the conditional mutant study suggest that the main effects on eye-specific axon refinement in the germline β2KO model are likely of retinal origin through impacts on retinal wave activity. Additionally, anatomical data shows that brainstem cholinergic axons innervate the dLGN toward the second half of eye-specific segregation and are not fully mature at P8 when eye-specific refinement is largely complete (7). We agree with the reviewer that future synaptic studies of previously published wave mutants, including the conditional reporter line, would be needed to conclusively assess a contribution of non-retinal nAChRs. These experiments will take significant time and resources and we respectfully suggest this is beyond the scope of the current manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors need to be more transparent that they are using the same data set from the previous publication (right now it does not appear until line 471) and clarify what was found in that study vs what is being tested here.

      We thank the reviewer for their thoughtful reading of the manuscript and helpful recommendations to improve the clarity of the work. We have edited the text to make it clear that this study is a reanalysis of an existing data set. We have revised the text to discuss the results from our previous study and more clearly define how the current analysis builds upon that initial work. 

      (2) The authors restricted their competition argument in Figure 4 to complex synapses, but why not include the simple ones? This seems like a straightforward analysis to do.

      We appreciate the reviewer’s suggestion to measure spatial relationships between “clustered” and “isolated” single-AZ synapses as we have done for multi-AZ synapses in Figure 4. However, we are not able to perform a direct and interpretable comparison with the results shown for multi-AZ synapses. First, we would need to classify “clustered” and “isolated” single-AZ synapses. This classification convolves two effects: 1) a distance threshold to define clustering and 2) subsequent distance measurements between clustered synapses.

      If we apply an equivalent 1.5 μm distance threshold (or any other threshold) to define clustered synapses, the distance from each “clustered” single-AZ synapse to the nearest other single-AZ synapse will always be smaller than the defined threshold (1.5 μm). Alternatively, if all of the single-AZ synapses within each local 1.5 μm shell are excluded from the subsequent intersynaptic distance measurements, this will set a hard lower boundary on the distance between synaptic clusters (1.5 μm minimum). The two effects discussed above were separated in our original analysis of multi-AZ synapses defined as “clustered” and “isolated” based on their relationship to single-AZ synapses, but these effects cannot be separated when analyzing single-AZ distributions alone.
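
      The first confound can be seen in a short simulation. This is a hypothetical illustration with uniformly random points, not an analysis of the STORM data: once "clustered" is defined by having a neighbor within the threshold, the nearest-neighbor distance of every clustered point is bounded by that threshold by construction.

```python
import math
import random

def nearest_neighbor_distances(points):
    """Distance from each point to its closest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def split_by_threshold(points, threshold=1.5):
    """Nearest-neighbor distances of 'clustered' and 'isolated' points."""
    nn = nearest_neighbor_distances(points)
    clustered = [d for d in nn if d <= threshold]
    isolated = [d for d in nn if d > threshold]
    return clustered, isolated

# Hypothetical single-AZ positions in a 20 x 20 x 7 micrometer volume.
rng = random.Random(1)
points = [(rng.uniform(0, 20), rng.uniform(0, 20), rng.uniform(0, 7)) for _ in range(50)]
clustered, isolated = split_by_threshold(points)
```

      Any difference in nearest-neighbor distance between the two groups therefore reflects the threshold itself rather than a property of the synapses.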

      (3) The Discussion seems much too long and speculative from the current data that is represented - particularly without verification of complex synapses actually being inputs from different RGCs. Along the same lines, figure captions are misleading. For example, for Figure 4 - the title indicates that the complex synapses are driving the rearrangements. But of course, these are static images. The authors should use titles that are more reflective of their findings rather than this interpretation.

      We thank the reviewer for these helpful suggestions. We have changed each of the figure captions to more accurately reflect the results. We have deleted all of the speculative discussion and revised the remaining text to improve the accuracy of the presentation.

      (4) In the future, the authors may want to consider an analysis as to whether ipsi and contra projection contribute to the same synapses.

      We agree with the reviewer that it is of interest to investigate the contribution of binocular inputs to retinogeniculate synaptic clusters during development. At maturity, some weak binocular input remains in the dominant-eye territory (15). To look for evidence of binocular synaptic interactions, we measured the percentage of the total small single-active zone synapses that were within 1.5 micrometers of larger multi-active zone synapses of the opposite eye. On average, ~10% or less of the single-active zone synapses were near multi-active zone synapses of the opposite eye. This analysis is presented in Supplemental Figure S3C/D.

      It is possible that some large mAZ synapses might reflect the convergence of two or more smaller inputs from the two eyes. Our current analyses do not rule this out. However, previous EM studies have found limited evidence for convergence of multiple RGCs (3) at P8 and our own EM images show that larger terminals with multiple active zones are formed by a single RGC bouton. Future volumetric EM reconstructions with eye-specific labels will be informative to address this question.

      References

      (1) Zhang C, Yadav S, Speer CM. The synaptic basis of activity-dependent eye-specific competition. Cell Rep. 2023;42(2):112085.

      (2) Bickford ME, Slusarczyk A, Dilger EK, Krahe TE, Kucuk C, Guido W. Synaptic development of the mouse dorsal lateral geniculate nucleus. J Comp Neurol. 2010;518(5):622-35.

      (3) Monavarfeshani A, Stanton G, Van Name J, Su K, Mills WA, 3rd, Swilling K, et al. LRRTM1 underlies synaptic convergence in visual thalamus. Elife. 2018;7.

      (4) Campbell G, Shatz CJ. Synapses formed by identified retinogeniculate axons during the segregation of eye input. J Neurosci. 1992;12(5):1847-58.

      (5) Hong YK, Park S, Litvina EY, Morales J, Sanes JR, Chen C. Refinement of the retinogeniculate synapse by bouton clustering. Neuron. 2014;84(2):332-9.

      (6) Xu HP, Burbridge TJ, Chen MG, Ge X, Zhang Y, Zhou ZJ, et al. Spatial pattern of spontaneous retinal waves instructs retinotopic map refinement more than activity frequency. Dev Neurobiol. 2015;75(6):621-40.

      (7) Sokhadze G, Seabrook TA, Guido W. The absence of retinal input disrupts the development of cholinergic brainstem projections in the mouse dorsal lateral geniculate nucleus. Neural Dev. 2018;13(1):27.

      (8) Dhande OS, Hua EW, Guh E, Yeh J, Bhatt S, Zhang Y, et al. Development of single retinofugal axon arbors in normal and beta2 knock-out mice. J Neurosci. 2011;31(9):3384-99.

      (9) Rossi FM, Pizzorusso T, Porciatti V, Marubio LM, Maffei L, Changeux JP. Requirement of the nicotinic acetylcholine receptor beta 2 subunit for the anatomical and functional development of the visual system. Proc Natl Acad Sci U S A. 2001;98(11):6453-8.

      (10) Muir-Robinson G, Hwang BJ, Feller MB. Retinogeniculate axons undergo eye-specific segregation in the absence of eye-specific layers. J Neurosci. 2002;22(13):5259-64.

      (11) Fredj NB, Hammond S, Otsuna H, Chien C-B, Burrone J, Meyer MP. Synaptic Activity and Activity-Dependent Competition Regulates Axon Arbor Maturation, Growth Arrest, and Territory in the Retinotectal Projection. J Neurosci. 2010;30(32):10939.

      (12) Hua JY, Smear MC, Baier H, Smith SJ. Regulation of axon growth in vivo by activity-based competition. Nature. 2005;434(7036):1022-6.

      (13) Rahman TN, Munz M, Kutsarova E, Bilash OM, Ruthazer ES. Stentian structural plasticity in the developing visual system. Proc Natl Acad Sci U S A. 2020;117(20):10636-8.

      (14) Ankerst M, Breunig MM, Kriegel H-P, Sander J. OPTICS: ordering points to identify the clustering structure. SIGMOD Rec. 1999;28(2):49–60.

      (15) Bauer J, Weiler S, Fernholz MHP, Laubender D, Scheuss V, Hübener M, et al. Limited functional convergence of eye-specific inputs in the retinogeniculate pathway of the mouse. Neuron. 2021;109(15):2457-68.e12.

      (16) Benninger K, Hood G, Simmel D, Tuite L, Wetzel A, Ropelewski A, et al. Cyberinfrastructure of a Multi-Petabyte Microscopy Resource for Neuroscience Research. Practice and Experience in Advanced Research Computing; Portland, OR, USA: Association for Computing Machinery; 2020. p. 1–7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Though the Norrin protein is structurally unrelated to the Wnt ligands, it can activate the Wnt/β-catenin pathway by binding to the canonical Wnt receptors Fzd4 and Lrp5/6, as well as the tetraspanin Tspan12 co-receptor. Understanding the biochemical mechanisms by which Norrin engages Tspan12 to initiate signaling is important, as this pathway plays an important role in regulating retinal angiogenesis and maintaining the blood-retina-barrier. Numerous mutations in this signaling pathway have also been found in human patients with ocular diseases. The overarching goal of the study is to define the biochemical mechanisms by which Tspan12 mediates Norrin signaling. Using purified Tspan12 reconstituted in lipid nanodiscs, the authors conducted detailed binding experiments to document the direct, high-affinity interactions between purified Tspan12 and Norrin. To further model this binding event, they used AlphaFold to dock Norrin and Tspan12 and identified four putative binding sites. They went on to validate these sites through mutagenesis experiments. Using the information obtained from the AlphaFold modeling and through additional binding competition experiments, it was further demonstrated that Tspan12 and Fzd4 can bind Norrin simultaneously, but Tspan12 binding to Norrin is competitive with other known co-receptors, such as HSPGs and Lrp5/6. Collectively, the authors proposed that the main function of Tspan12 is to capture low concentrations of Norrin at the early stage of signaling, and then "hand over" Norrin to Fzd4 and Lrp5/6 for further signal propagation. Overall, the study is comprehensive and compelling, and the conclusions are well supported by the experimental and modeling data.

      Strengths: 

      • Biochemical reconstitution of Tspan12 and Fzd4 in lipid nanodiscs is an elegant approach for testing the direct binding interaction between Norrin and its co-receptors. The proteins used for the study seem to be of high purity and quality. 

      • The various binding experiments presented throughout the study were carried out rigorously. In particular, BLI allows accurate measurement of equilibrium binding constants as well as on and off rates. 

      • It is nice to see that the authors followed up on their AlphaFold modeling with an extensive series of mutagenesis studies to experimentally validate the potential binding sites. This adds credence to the AlphaFold models. 

      • Table S1 is a further testament to the rigor of the study. 

      • Overall, the study is comprehensive and compelling, and the conclusions are well supported by the experimental and modeling data. 

      Suggestions for improvement: 

      • It would be helpful to show Coomassie-stained gels of the key mutant Norrin and Tspan12 proteins presented in Figures 2E and 2F. 

      We have included Stain-Free SDS-PAGE gels from the purification of the Norrin and Tspan12 mutants in a new Figure S4.

      • Many Norrin and Tspan12 mutations have been identified in human patients with FEVR. It would be interesting to comment on whether any of the mutations might affect the Norrin-Tspan12 binding sites described in this study.

      Thank you for this suggestion. We have inspected human mutation databases gnomAD, ClinVar, and HGMD for known mutations in the predicted Tspan12-Norrin binding interface and their occurrence in human patients with FEVR or Norrie disease.

      While a number of Tspan12 residues that we predict to interact with Norrin are impacted by rare mutations in humans (e.g., L169M, E170V, E173K, D175N, E196G, S199C, as found in the gnomAD database), these alleles are of unknown clinical significance (as found in ClinVar or HGMD databases). It is possible that mutations that slightly weaken the Norrin-Tspan12 interface may not produce a strong phenotype, especially given the avidity we expect from this system. By our examination, the missense variants of clinical significance that have been found in the Tspan12 LEL would be expected to destabilize the protein (i.e., mutations to or from cysteine or proline, or mutations to residues involved in packing interactions within the LEL fold), and therefore these mutations may produce a disease phenotype by impacting Tspan12 protein expression levels.  

      Several Norrin mutations that are associated with Norrie disease, FEVR, or other diseases of the retinal vasculature have been found in the predicted Tspan12 binding site. For example, Norrin mutations at positions L103 (L103Q, L103V), K104 (K104N, K104Q), and A105 (A105T, A105P, A105E, A105S, A105T, A105V) have been found in patients, all of which may disrupt binding to Tspan12. However, the deleterious effect of K104 mutations on Norrin-stimulated signaling could also be explained by a weakened Norrin-Fzd4 binding interface. Norrin mutations at R115 (R115L and R115Q), as well as R121 (R121L, R121G, R121Q, and R121W) have also been found in patients with various diseases of the retinal vasculature. Additionally, the Norrin mutation T119P has been found in patients with Norrie disease, but we would expect this mutation to destabilize Norrin in addition to disrupting the Tspan12 binding site. 

      While we commented briefly on mutations R115L and R121W in the original draft (page 5, paragraphs 4 and 1, respectively), we have updated the manuscript with more comments on disease-associated mutations to the predicted Tspan12 binding site on Norrin (page 5, first partial paragraph; page 9, first partial paragraph). 

      • Some of the negative conclusions (e.g. the lack of involvement of Tspan12 in the formation of the Norrin-Lrp5/6-Fzd4-Dvl signaling complex) can be difficult to interpret. There are many possible reasons as to why certain biological effects are not recapitulated in a reconstitution experiment. For instance, the recombinant proteins used in the experiment may not be presented in the correct configurations, and certain biochemical modifications, such as phosphorylation, may also be missing. 

      We agree that different Tspan12 and Fzd4 stoichiometries, lipid compositions, and posttranslational modifications could impact the results of our study, and that it is important to mention these possibilities. We have added these caveats to the discussion section (page 10, last paragraph).  

      Reviewer #2 (Public Review): 

      This is an interesting study of high quality with important and novel findings. Bruguera et al. report a biochemical and structural analysis of the Tspan12 co-receptor for norrin. Major findings are that Norrin directly binds Tspan12 with high affinity (this is consistent with a report on BioRxiv: Antibody Display of cell surface receptor Tetraspanin12 and SARS-CoV-2 spike protein) and a predicted structure of Tspan12 alone or in complex with Norrin. The Norrin/Tspan12 binding interface is largely verified by mutational analysis. An interaction of the Tspan12 large extracellular loop (LEL) with Fzd4 cannot be detected and interactions of full-length Tspan12 and Fzd4 cannot be tested using nano-disc based BLI, however, Fzd4/Tspan12 heterodimers can be purified and inserted into nanodiscs when aided by split GFP tags. An analysis of a potential composite binding site of a Fzd4/Tspan12 complex is somewhat inconclusive, as no major increase in affinity is detected for the complex compared to the individual components. A caveat to this data is that affinity measurements were performed for complexes with approximately 1 molecule Tspan12 and FZD4 per nanodisc, while the composite binding site could potentially be formed only in higher order complexes, e.g., 2:2 Fzd4/Tspan12 complexes. Interestingly, the authors find that the Norrin/Tspan12 binding site and the Norrin/Lrp6 binding site partially overlap and that the Lrp6 ectodomain competes with Tspan12 for Norrin binding. This result leads the authors to propose a model according to which Tspan12 captures Norrin and then has to "hand it off" to allow for Fzd4/Lrp6 formation. By increasing the local concentration of Norrin, Tspan12 would enhance the formation of the Fzd4/Lrp5 or Fzd4/Lrp6 complex.

      Thank you for pointing out the BioRxiv report showing Norrin-Tspan12 LEL binding. We have cited this in the introduction of our revised manuscript (page 2, paragraph 3).

      The experiments based on membrane proteins inserted into nano-discs and the structure prediction using AlphaFold yield important new insights into a protein complex that has critical roles in normal CNS vascular biology, retinal vascular disease, and is a target for therapeutic intervention. However, it remains unclear how Norrin would be "handed off" from Tspan12 or Tspan12/Fzd4 complexes to Fzd4/Lrp6 complexes, as the relatively high affinity of Norrin to Fzd4/Tspan12 dimers likely does not favor the "handing off" to Fzd4/Lrp6 complexes. 

      While the Fzd4-Tspan12 interaction is strong, our data suggest that Fzd4 and Tspan12 bind Norrin with negative cooperativity, suggesting that Fzd4 binding may enhance Norrin-Tspan12 dissociation to facilitate handoff. This model is based on 1) the dissociation of Norrin from bead-bound Tspan12 in the presence of saturating Fzd4 CRD (Figure 3D), and 2) a weaker measured affinity of Norrin-Tspan12LEL in the presence of saturating Fzd4 CRD (Figure 3F). We have now added wording to emphasize this in the discussion section (page 9, end of first full paragraph).

      However, as you note, the Norrin-Tspan12 affinity that we measured in the presence of Fzd4 CRD (tens of nM) is still much stronger than the known Norrin-LRP6 affinity (0.5-1 µM), which predicts that the efficiency of this handoff may be low. We have now commented on this in the discussion section and mentioned an alternative model in which Tspan12 presents the second Norrin protomer to LRP5/6 for signaling, instead of dissociating (page 9, paragraph 2). The handoff efficiency could also be impacted by other factors such as the relative abundance and surface distribution of Tspan12, Fzd4, LRP6 and HSPGs.
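
      To make the affinity argument concrete, a minimal sketch of simple competitive binding is given below. It assumes 1:1 binding at a single shared site, equilibrium, and competitors in large excess over Norrin, none of which is established for the membrane-bound system (Norrin is a dimer, and the receptors are not freely diffusing in solution). The dissociation constants are hypothetical values chosen from within the ranges quoted above (30 nM for "tens of nM", 750 nM for 0.5-1 µM), and the function name is invented for this illustration.

```python
def bound_fractions(conc_a_nM, kd_a_nM, conc_b_nM, kd_b_nM):
    """Return (fraction bound to A, fraction bound to B, fraction free) for a
    ligand with two competitors at one site.

    Assumes equilibrium, 1:1 binding, and free competitor concentrations
    approximately equal to their totals (competitors in excess over ligand).
    """
    a = conc_a_nM / kd_a_nM
    b = conc_b_nM / kd_b_nM
    denom = 1.0 + a + b
    return a / denom, b / denom, 1.0 / denom


# Hypothetical inputs: equal 100 nM concentrations of both competitors.
KD_TSPAN12_NM = 30.0   # assumed, within "tens of nM"
KD_LRP6_NM = 750.0     # assumed, within 0.5-1 µM

f_tspan12, f_lrp6, f_free = bound_fractions(100.0, KD_TSPAN12_NM, 100.0, KD_LRP6_NM)
print(f"Tspan12-bound: {f_tspan12:.3f}, LRP6-bound: {f_lrp6:.3f}, free: {f_free:.3f}")
```

      Under these assumptions the ratio of Tspan12-bound to LRP6-bound Norrin equals the ratio of the two dissociation constants when the competitors are equimolar, so a much higher local abundance of LRP6 would be needed to shift the balance. This is one way to see why relative receptor abundance matters for handoff efficiency.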

      Areas that would benefit from further experiments, or a discussion, include: 

      -  The authors test a potential composite binding site of Fzd4/Tspan12 heterodimers for norrin using nanodiscs that contain on average about 1 molecule Fzd4 and 1 molecule Tspan12. The Fzd4/Tspan12 heterodimer is co-inserted into the nanodiscs supported by split-GFP tags on Fzd4 and Tspan12. The authors find no major increase in affinity, although they find changes to the Hill slope, reflecting better binding of norrin at low norrin concentrations. In 293F cells overexpressing Fzd4 and Tspan12 (which may result in a different stoichiometry) they find more pronounced effects of norrin binding to Fzd4/Tspan12. This raises the possibility that the formation of a composite binding site requires Fzd4/Tspan12 complexes of higher order, for example, 2:2 Fzd4/Tspan12 complexes, where the composite binding site may involve residues of each Fzd4 and Tspan12 molecule in the complex. This could be tested in nanodiscs in which Fzd4 and Tspan12 are inserted at higher concentrations or using Fzd4 and Tspan12 that contain additional tags for oligomerization.

      It is quite possible that Tspan12 and Fzd4 cluster into complexes with a stoichiometry greater than 1:1 in cells (this is supported by e.g., BRET experiments in (Ke et al., 2013)), and we mention in the discussion that receptor clustering may be an additional mechanism by which Tspan12 exerts its function (page 10, paragraph 4). We would be quite interested to know the stoichiometry of Fzd4 and Tspan12 complexes in cells at endogenous expression levels, both in the presence and absence of Norrin, and to biochemically characterize these putative larger complexes in the future. We have amended the discussion to mention the caveat that our reconstitution experiments do not test higher-stoichiometry Fzd4/Tspan12 complexes (page 10, last paragraph).

      - While Tspan12 LEL does not bind to Fzd4, the successful reconstitution of GFP from Tspan12 and Fzd4 tagged with split GFP components provides evidence for Fzd4/Tspan12 complex formation. As a negative control, e.g., Fzd5, or Tspan11 with split GFP tags (Fzd5/Tspan12 or Fzd4/Tspan11) would clarify if FZD4/Tspan12 heterodimers are an artefact of the split GFP system. 

      The split-GFP system allows us to co-purify receptors that do not normally co-localize (for example, as we have shown with Fzd4 and LRP6 in the absence of ligand (Bruguera et al., 2022)) so we do not mean to claim that it provides evidence for Fzd4/Tspan12 complex formation. In fact, we were unable to co-purify co-expressed Fzd4 and Tspan12 unless they were tethered with the split GFP system, and separately-purified Fzd4 and Tspan12 did not incorporate into nanodiscs together unless they were tethered by split GFP. Based on these experiments, we expect that the purported Fzd4-Tspan12 interaction that others have found by co-IP or co-localization is easily disrupted by detergent, may require a specific lipid, and/or may not be direct.

      To clarify this point, we have noted in the results section that without the split GFP tags, Tspan12 and Fzd4 did not co-purify or co-reconstitute into nanodiscs, and that co-reconstitution was enabled by the split GFP system (page 6, first full paragraph).   

      - Fzd4/Tspan12 heterodimers stabilized by split GFP may be locked into an unfavorable orientation that does not allow for the formation of a composite binding site of FZD4 and Tspan12; this is another caveat for the interpretation that Fzd4/Tspan12 do not form a composite binding site. This is not discussed.

      While the split GFP does enforce a Fzd4/Tspan12 dimer, the split GFP is removed by protease cleavage during the final step of the purification process, after the dimer is contained in a nanodisc. This should allow Fzd4 and Tspan12 to freely adopt any pose and to diffuse within the confines of the nanodisc lipid bilayer. However, it has been shown that the phospholipid bilayer in small nanodiscs is not as fluid as the physiological plasma membrane, and although we used the slightly larger belt protein (MSP1E3D1, 13 nm diameter nanodiscs), perhaps the receptors are indeed locked in some unfavorable state for this reason. Additionally, the nanodiscs are planar, so if the formation of a composite binding site requires membrane curvature, this would not be recapitulated in our system. We have cited these caveats in the discussion section (page 10, last paragraph).  

      - Mutations that affect the affinity of norrin/fzd4 are not used to further test if Fzd4 and Tspan12 form a composite binding site. Norrin R41E or Fzd4 M105V were previously reported to reduce norrin/frizzled4 interactions and signaling, and both interaction and signaling were restored by Tspan12 (Lai et al. 2017). Whether a Fzd4/Tspan12 heterodimer has increased affinity for Norrin R41E was not tested. Similarly, affinity of FZD4 M105V vs a Fzd4 M105V/Tspan12 heterodimer were not tested. 

      Since the high affinity of Norrin for both Fzd4 and Tspan12 may have obscured any enhancement of Norrin affinity for Fzd4/Tspan12 compared to either receptor alone, we did consider weakening Fzd-Norrin affinity to sensitize this experiment, inspired by the experiments you mention in (Lai et al., 2017). However, we suspected that the slight increase in Norrin affinity for the Fzd4/Tspan12 dimer compared to Fzd4 alone was driven mainly by increased avidity that enhanced binding of low Norrin concentrations, and this avidity effect would likely confound the interpretation of any experiment monitoring 2:2 complex formation. Additionally, on the basis that soluble Fzd4 extracellular domain and Tspan12 bind Norrin with negative cooperativity (Figures 3D and 3F), we concluded that this composite binding site was unlikely.

      - An important conclusion of the study is that Tspan12 or Lrp6 binding to Norrin is mutually exclusive. This could be corroborated by an experiment in which LRP5/6 is inserted into nanodiscs for BLI binding tests with Norrin, or Tspan12 LEL, or a combination of both. Soluble LRP6 may remove norrin from equilibrium binding/unbinding to Tspan12, therefore presenting LRP6 in a non-soluble form may yield different results. 

      We agree that testing this conclusion in an orthogonal experiment would be a valuable addition to this study. We have now performed a similar experiment to the one you described, but with Norrin immobilized on biosensors, and with LRP6 in detergent competing with Tspan12 LEL for Norrin binding (Figure S12, discussed on page 8, first full paragraph). The results of this experiment show that biosensor-immobilized Norrin will bind LRP6, and that soluble Tspan12 inhibits LRP6 binding in a concentration-dependent manner. The LRP6 construct we use (residues 20-1439) includes the transmembrane domain but has a truncated C terminus, since LRP6 constructs containing the full C terminus tend to aggregate during purification. We chose to immobilize Norrin to make the experiment as interpretable as possible, since immobilizing LRP6 and competing Norrin off with the LEL could result in an increase in signal (from the LEL binding the second available Norrin protomer) as well as a decrease (from Norrin being competed off of the immobilized LRP6). We conducted the experiment in detergent (DDM) instead of nanodiscs to be able to test higher concentrations of LRP6.

      - The authors use LRP6 instead of LRP5 for their experiments. Tspan12 is less effective in increasing the Norrin/Fzd4/Lrp6 signaling amplitude compared to Norrin/Fzd4/Lrp5 signaling, and human genetic evidence (FEVR) implicates LRP5, not LRP6, in Norrin/Frizzled4 signaling. The authors find that Norrin binding to LRP6 and Tspan12 is mutually exclusive, however this may not be the case for Lrp5. 

      This is an important point which we have now addressed in the text (page 8, end of first full paragraph). LRP5 is indeed the receptor implicated in FEVR and expressed in the relevant tissues for Tspan12/Norrin signaling. Unfortunately, LRP5 expresses poorly and we are unable to purify sufficient quantities to perform these experiments. However, LRP5 and LRP6 both transduce Tspan12-enhanced Norrin signaling in TOPFLASH assays (as you mention and as shown by (Zhou and Nathans, 2014)), bind Norrin, and are highly similar (they share 71% sequence identity overall and 73% sequence identity in the extracellular domain), so we expect their Norrin-binding sites to be conserved.

      - The biochemical data are largely not correlated with functional data. The authors suggest that the Norrin R115L FEVR mutation could be due to reduced norrin binding to tspan12, but do not test if Tspan12-mediated enhancement of the norrin signaling amplitude is reduced by the R115L mutation. Similarly, the impressive restoration of binding by charge reversal mutations in site 3 is not corroborated in signaling assays. 

      We agree that testing the impact of Norrin mutations in cell-based signaling assays would be an informative way to further test our model. However, the Norrin mutants we tested generated poor TopFlash signals in all conditions tested. This may be due to general protein instability, weakened affinity for LRP, or weaker interactions with HSPGs. Whatever the cause, the low signal made it challenging to conclusively say whether the Norrin mutations affected Tspan12-mediated signaling enhancement.

      When expressed for purification, Tspan12 mutants generally expressed poorly compared to WT Tspan12, so we were concerned that differences in protein stability or trafficking would lead to lower cell-surface levels of mutant Tspan12 relative to WT in TopFlash signaling assays, which would confound interpretation of mutant Tspan12’s ability to enhance Norrin signaling.

      Because of these challenges, follow-up experiments to investigate the signaling capabilities of Norrin and Tspan12 mutants were not informative and we have not included them in the revised manuscript.

      Reviewer #3 (Public Review): 

      Bruguera et al. present an impressive series of biochemical experiments that address the question of how Tspan12 acts to promote signaling by Norrin, a highly divergent TGF-beta family member that serves as a ligand for Fzd4 and Lrp5/6 to promote canonical Wnt signaling during CNS (and especially retinal) vascular development. The present study is distinguished from those of the past 15 years by its quantitative precision and its high-quality analyses of concentration dependencies, its use of well-characterized nano-disc-incorporated membrane proteins and various soluble binding partners, and its use of structure prediction (by AlphaFold) to guide experiments. The authors start by measuring the binding affinity of Norrin to Tspan12 in nanodiscs (~10 nM), and they then model this interaction with AlphaFold and test the predicted interface with various charge and size swap mutations. The test suggests that the prediction is approximately correct, but in one region (site 1) the experimental data do not support the model. [As noted by the authors, a failure of swap mutations to support a docking model is open to various interpretations. As AlphaFold docking predictions come increasingly into common use, the compendium of mutational tests and their interpretations will become an important object of study.] Next, the authors show that Tspan12 and Fzd4 can simultaneously bind Norrin, with modest negative cooperativity, and that together they enhance Norrin capture by cells expressing both Tspan12 and Fzd4 compared to Fzd4 alone, an effect that is most pronounced at low Norrin concentration. Similarly, at low Norrin concentration (~1 nM), signaling is substantially enhanced by Tspan12. By contrast, the authors show that LRP6 competes with Tspan12 for Norrin binding, implying a hand-off of Norrin from a Tspan12+Fzd4+Norrin complex to a LRP5/6+Fzd4+Norrin complex.
      Thanks to the authors' careful dose-response analyses, they observed that Norrin-induced signaling and Tspan12 enhancement of signaling both have bell-shaped dose-response curves, with strong inhibition at higher levels of Norrin or Tspan12. The implication is that the signaling system has been built for optimal detection of low concentrations of Norrin (most likely the situation in vivo), and that excess Tspan12 can titrate Norrin at the expense of LRP5/6 binding (i.e., reduction in the formation of the LRP5/6+Fzd4+Norrin signaling complex). In the view of this reviewer, the present work represents a foundational advance in understanding Norrin signaling and the role of Tspan12. It will also serve as an important point of comparison for thinking about signaling complexes in other ligand-receptor systems.

      Recommendations for the authors: 

      Reviewer #2 (Recommendations For The Authors):   

      - In Figure 5F high concentrations of transfected Tspan12 plasmid inhibit signaling, which the authors interpret to support the model that Tspan12/Norrin binding prevents Norrin/LRP6/FZD4 complex formation. Alternatively, the cells do not tolerate the expression of the tetraspanin at high levels, for example, due to misfolding and aggregate formation. To distinguish these possibilities: Do high levels of Tspan12 overexpression also inhibit signaling induced by Wnt3a and appropriate Frizzled receptors, even though Tspan12 has no influence on Wnt/LRP6 binding? 

      We thank the reviewer for suggesting this important control experiment. We have added the Wnt-stimulated TOPFLASH values to the figure in 5F for all conditions. In repeating this experiment, we noticed that high levels of transfected Tspan12 may decrease cell viability and therefore have adjusted the range of transfected Tspan12 in the new Figure 5F (discussed on page 8, second full paragraph). Under this new protocol, both Norrin- and Wnt-stimulated signaling were inhibited by the highest amount of transfected Tspan12. However, Norrin-stimulated signaling is inhibited by lower amounts of transfected Tspan12 than Wnt-stimulated signaling, and to a greater extent, supporting our proposed model that Tspan12 competes with LRP for Norrin binding.

      - Is Tspan12 with c-terminal rho-tag (the form incorporated into nanodiscs) also used for functional luciferase assays, or was untagged Tspan12 used for the luciferase assays in Fig 4D and 5F? Does the c-terminal tag interfere with Tspan12-mediated enhancement of Norrin signaling? 

      For the luciferase assays included in this manuscript, wildtype, full-length, untagged Tspan12 is used. We have clarified this in our methods section. When we tested the wildtype vs C-terminally rho1D4-tagged version of Tspan12 in TOPFLASH assays, we saw that the enhancement of Norrin signaling by Tspan12-1D4 was weaker than enhancement by untagged Tspan12. This is consistent with the finding reported in Cell Reports (Lai et al., 2017) that a chimeric Tspan12 receptor with its C-terminus replaced with that of Tspan11 was still capable of enhancing Norrin signaling, though to a lesser extent than WT Tspan12. The deficiency of signaling by our rho1D4-tagged Tspan12 could be due to a difference in receptor expression level or trafficking, but in the absence of a reliable antibody against Tspan12, we were unable to assess the expression levels or localization of the untagged Tspan12 to compare it to the rho1D4-tagged version. (For binding experiments, we reasoned that the C-terminal tag should not affect Tspan12’s ability to bind Norrin extracellularly, especially as we found that purified full-length Tspan12 and Tspan12∆C (residues 1-252) bound Norrin equally well; we have added this comparison to table S1.)

      Reviewer #3 (Recommendations For The Authors): 

      Minor comments. 

      Based on the Fzd4-Dvl binding experiment, the authors might state explicitly the possibility that Tspan12's relevance is entirely accounted for by extracellular ligand capture. 

      We have stated this possibility explicitly in the discussion section (page 9, last paragraph). 

      Page 4, 3rd paragraph. I suggest "To experimentally test this structural prediction..." rather than "validate". 

      Thank you for this suggestion; we have replaced this wording. 

      This next item is optional, but I hope that the authors will consider it. This manuscript provides an opportunity for the authors to be more expansive in their thinking, and to put their work into the larger context of ligand+receptor+accessory protein interactions. The authors describe the Wnt7a/7b-Gpr124-RECK system and the role of HSPs in Norrin and Wnt signaling, but perhaps they can also comment on non-Wnt ligand-receptor systems where accessory proteins are found. They might add a figure (or supplemental figure) with a schematic showing the roles of HSP and Gpr124-RECK, and some non-Wnt ligand-receptor systems. This would help to make the present work more widely influential.

      Thank you for this suggestion. We have added a figure (Figure 6, discussed on page 10, paragraphs 2 and 3) and expanded our discussion to include other co-receptor systems. We have specifically focused on co-receptors that both capture ligands and interact with their primary receptor(s), thus delivering ligands to their receptors, as we have proposed for Tspan12. Within Wnt signaling, other co-receptor systems with this mechanism are RECK/Gpr124 (for Wnt7a/b) and Glypican-3. We found it interesting that this mechanism is also shared by several growth factor pathways with cystine knot ligands (like Norrin), so we have illustrated and mentioned three of these examples.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Zhang et al. presented an electrophysiology method to identify the layers of macaque visual cortex with the high-density Neuropixels 1.0 electrode. They found several electrophysiology signal profiles for high-resolution laminar discrimination and described a set of signal metrics for fine cortical layer identification.

      Strengths:

      There are two major strengths. One is the use of high-density electrodes. The Neuropixels 1.0 probe has electrodes at 20 um spacing, which can provide high resolution for cortical laminar identification. The second strength is the analysis. They found multiple electrophysiological signal profiles which can be used for laminar discrimination. Using this new method, they could identify the thinnest layer in macaque V1. The data support their conclusion.

      Weaknesses:

      While this electrophysiology strategy is much easier to perform, even in awake animals, than histological staining methods, it provides only an indirect estimation of cortical layers. A parallel histological study can provide a direct matching between the electrode signal features and cortical laminar locations. However, there are technical challenges; for example, distortions in both electrode penetration and tissue preparation may prevent a precise matching between electrode locations and cortical layers. In this case, additional microwire electrodes bound to the Neuropixels probe can be used to inject current and mark the locations of different depths in cortical tissue after recording.

      While we agree that it would be helpful to adopt a more direct method for linking laminar changes observed with electrophysiology to anatomical layers observed in postmortem histology, we do not believe that the approach suggested by the reviewer would be particularly helpful. The approach suggested involves making lesions, which are known to be quite variable in size, asymmetric in shape, and do not have a predictable geometry relative to the location of the electrode tip. In contrast, our electrophysiology measures have identified clear boundaries which precisely match the known widths and relative positions of all the layers of V1, including layer 4A, which is only 50 microns thick, much smaller than the resolution of lesion methods.

      Reviewer #2 (Public Review):

      Summary:

      This paper documents an attempt to accurately determine the locations and boundaries of the anatomically and functionally defined layers in macaque primary visual cortex using voltage signals recorded from a high-density electrode array that spans the full depth of cortex with contacts at 20 um spacing. First, the authors attempt to use current source density (CSD) analysis to determine layer locations, but they report a striking failure because the results vary greatly from one electrode penetration to the next and because the spatial resolution of the underlying local field potential (LFP) signal is coarse compared to the electrical contact spacing. The authors thus turn to examining higher frequency signals related to action potentials and provide evidence that these signals reflect changes in neuronal size and packing density, response latency and visual selectivity.

      Strengths:

      There is a lot of nice data to look at in this paper that shows interesting quantities as a function of depth in V1. Bringing all of these together offers the reader a rich data set: CSD, action potential shape, response power and coherence spectrum, and post-stimulus time response traces. Furthermore, data are displayed as a function of eye (dominant or non-dominant) and for achromatic and cone-isolating stimuli.

      This paper takes a strong stand in pointing out weaknesses in the ability of CSD analysis to make consistent determinations about cortical layering in V1. Many researchers have found CSD to be problematic, and the observations here may be important to motivate other researchers to carry out rigorous comparisons and publish their results, even if they reflect negatively on the value of CSD analysis.

      The paper provides a thoughtful, practical and comprehensive recipe for assigning traditional cortical layers based on easily-computed metrics from electrophysiological recordings in V1, and this is likely to be useful for electrophysiologists who are now more frequently using high-density electrode arrays.

      Weaknesses:

      Much effort is spent pointing out features that are well known, for example, the latency difference associated with different retinogeniculate pathways, the activity level differences associated with input layers, and the action potential shape differences associated with white vs. gray matter. These have been used for decades as indicators of depth and location of recordings in visual cortex as electrodes were carefully advanced. High density electrodes allow this type of data to now be collected in parallel, but at discrete, regular sampling points. Rather than showing examples of what is already accepted, the emphasis should be placed on developing a rigorous analysis of how variable vs. reproducible are quantitative metrics of these features across penetrations, as a function of distance or functional domain, and from animal to animal. Ultimately, a more quantitative approach to the question of consistency is needed to assess the value of the methods proposed here.

      We thank the reviewer for suggesting the addition of quantitative metrics to allow more substantive comparisons between various measures within and between penetrations. We have added quantification and describe this in the context of more specific comments made by this reviewer. We have retained descriptions of metrics that are well established because they provide an important validation of our approaches and laminar assignments.

      Another important piece of information for assessing the ability to determine layers from spiking activity is to carry out post-mortem histological processing so that the layer determination made in this paper could be compared to anatomical layering.

      We are not aware of any approach that would provide such information at sufficient resolution. For example, it is well known that electrolytic lesions often do not match the locations expected from electrophysiological changes observed with single electrodes. As noted above, our observation that the laminar changes in electrophysiology precisely match the known widths and relative positions of all the layers of V1, including layer 4A, provides confidence in our laminar assignments.

      On line 162, the text states that there is a clear lack of consistency across penetrations, but why should there be consistency: how far apart in the cortex were the penetrations? How long were the electrodes allowed to settle before recording, how much damage was done to tissue during insertion? Do you have data taken over time - how consistent is the pattern across several hours, and how long was the time between the collection of the penetrations shown here?

      Answers to most of these questions can be found within the manuscript text. We have added text describing the distance between electrode penetrations (at least 1 mm, typically far more) and added a figure which shows a map of the penetration locations. The Methods section describes the electrode penetration methods used to minimize damage, as well as the settling times of penetrations. Data are provided regarding changes in recordings over time (see Methods, Drift Correction). The stimuli used to generate the data described are presented within a total of 30 minutes or less, minimizing any changes that might occur due to electrode drift. There is a minimum of 3 hours between different penetrations from the same animal.

      The impact of the paper is lessened because it emphasizes consistency but not in a consistent manner. Some demonstrations of consistency are shown for CSDs, but not quantified. Figure 4A is used to make a point about consistency in cell density, but across animals, whereas the previous text was pointing out inconsistency across penetrations. What if you took a 40 or 60 um column of tissue and computed cell density, then you would be comparing consistency across potentially similar scales. Overall, it is not clear how all of these different metrics compare quantitatively to each other in terms of consistency.

      As noted above, we have now added quantitative comparisons of consistency between different metrics. It is unclear why the reviewer felt that we use Figure 4A to describe consistency. That figure was a photograph from a previous publication simply showing the known differences in neuron density that are used to define layers in anatomical studies. This was intended to introduce the reader to known laminar differences. At any rate, we have been unable to contact the previous publishers of that work to obtain permission to use the figure, so we have removed it; it is not needed to illustrate the known differences in cell density that are used to define layers. We have kept the citation so that interested readers can refer to the publication.

      In many places, the text makes assertions that A is a consistent indicator of B, but then there appear to be clear counterexamples in the data shown in the figures. There is some sense that the reasoning is relying too much on examples, and not enough on statistical quantities.

      Without reference to specific examples we are not able to address this point.

      Overall

      Overall, this paper makes a solid argument in favor of using action potentials and stimulus driven responses, instead of CSD measurements, to assign cortical layers to electrode contacts in V1. It is nice to look at the data in this paper and to read the authors' highly educated interpretation and speculation about how useful such measurements will be in general to make layer assignments. It is easy to agree with much of what they say, and to hope that in the future there will be reliable, quantitative methods to make meaningful segmentations of neurons in terms of their differentiated roles in cortical computation. How much this will end up corresponding to the canonical layer numbering that has been used for many decades now remains unclear.

      Reviewer #3 (Public Review):

      Summary:

      Zhang et al. explored strategies for aligning electrophysiological recordings from high-density laminar electrode arrays (Neuropixels) with the pattern of lamination across cortical depth in macaque primary visual cortex (V1), with the goal of improving the spatial resolution of layer identification based on electrophysiological signals alone. The authors compare the current commonly used standard in the field - current source density (CSD) analysis - with a new set of measures largely derived from action potential (AP) frequency band signals. Individual AP band measures provide distinct cues about different landmarks or potential laminar boundaries, and together they are used to subdivide the spatial extent of array recordings into discrete layers, including the very thin layer 4A, a level of resolution unavailable when relying on CSD analysis alone for laminar identification. The authors compare the widths of the resulting subdivisions with previously reported anatomical measurements as evidence that layers have been accurately identified. This is a bit circular, given that they also use these anatomical measurements as guidelines limiting the boundary assignments; however, the strategy is overall sensible and the electrophysiological signatures used to identify layers are generally convincing. Furthermore, by varying the pattern of visual stimulation to target chromatically sensitive inputs known to be partially segregated by layer in V1, they show localized response patterns that lend confidence to their identification of particular sublayers.

      The authors compellingly demonstrate the insufficiency of CSD analysis for precisely identifying fine laminar structure, and in some cases its limited accuracy at identifying coarse structure. CSD analysis produced inconsistent results across array penetrations and across visual stimulus conditions and was not improved in spatial resolution by sampling at high density with Neuropixels probes. Instead, in order to generate a typical, informative pattern of current sources and sinks across layers, the LFP signals from the Neuropixels arrays required spatial smoothing or subsampling to approximately match the coarser (50-100 µm) spacing of other laminar arrays. Even with smoothing, the resulting CSDs in some cases predicted laminar boundaries that were inconsistent with boundaries estimated using other measures and/or unlikely given the typical sizes of individual layers in macaque V1. This point alone provides an important insight for others seeking to link their own laminar array recordings to cortical layers.
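
      For readers unfamiliar with the computation at issue here, the standard CSD estimate is the negative second spatial difference of the LFP across contacts, and subsampling the contacts is one way to reach the coarser effective spacing mentioned above. The following is a minimal sketch of that textbook estimate, not the authors' pipeline. The function name, the single-column 20 um contact geometry, and the omission of tissue conductivity (so the output is in arbitrary units) are all assumptions made for illustration.

```python
import numpy as np

def csd_second_difference(lfp, spacing_um=20.0, step=1):
    """Estimate CSD as the negative second spatial difference of the LFP.

    lfp        : array of shape (n_channels, n_samples), ordered by depth
    spacing_um : distance between adjacent contacts in micrometers (assumed)
    step       : keep every `step`-th contact before differencing (subsampling)

    Conductivity is omitted, so the result is in arbitrary units.
    """
    lfp = np.asarray(lfp, dtype=float)
    sub = lfp[::step]                 # subsample contacts along depth
    h = spacing_um * step * 1e-3      # effective contact spacing in mm
    # Three-point second difference along depth; two edge channels are lost.
    return -(sub[2:] - 2.0 * sub[1:-1] + sub[:-2]) / h ** 2
```

      With `step=1` the estimate uses the full 20 um sampling; with `step=4` the effective spacing is 80 um, inside the 50-100 um range quoted above. Spatial smoothing before differencing is an alternative that this sketch does not cover.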

      They next offer a set of measures based on analysis of AP band signals. These measures include analyses of the density, average signal spread, and spike waveforms of single- and multi-units identified through spike sorting, as well as analyses of AP band power spectra and local coherence profiles across recording depth. The power spectrum measures in particular yield compact peaks at particular depths, albeit with some variation across penetrations, whereas the waveform measures most convincingly identified the layer 6-white matter transition. In general, some of the new measures yield inconsistent patterns across penetrations, and some of the authors' explanations of these analyses draw intriguing but rather speculative connections to properties of anatomy and/or responsivity. However, taken as a group, the set of AP band analyses appear sufficient to determine the layer 6-white matter transition with precision and to delineate intermediate transition points likely to correspond to actual layer boundaries.

      Strengths:

      The authors convincingly demonstrate the potential to resolve putative laminar boundaries using only electrophysiological recordings from Neuropixels arrays. This is particularly useful given that histological information is often unavailable for chronic recordings. They make a clear case that CSD analysis is insufficient to resolve the lamination pattern with the desired precision and offer a thoughtful set of alternative analyses, along with an order in which to consider multiple cues in order to facilitate others' adoption of the strategy. The widths of the resulting layers bear a sensible resemblance to the expected widths identified by prior anatomical measurements, and at least in some cases there are satisfying signatures of chromatic visual sensitivity and latency differences across layers that are predicted by the known connectivity of the corresponding layers. Thus, the proposed analytical toolkit appears to work well for macaque V1 and has strong potential to generalize to use in other cortical regions, though area-targeted selection of stimuli may be required.

      Weaknesses:

      The waveform measures, and in particular the unit density distribution, are likely to be sensitive to the criteria used for spike sorting, which differ widely among experimenters/groups, and this may limit the usefulness of this particular measure for others in the community. The analysis of detected unit density yields fluctuations across cortical depth which the authors attribute to variations in neural density across layers; however, these patterns seemed particularly variable across penetrations and did not consistently yield peaks at depths that should have high neuronal density, such as layer 2. Therefore, this measure has limited interpretability.

      While we agree that our electrophysiological measure of unit density does not strictly reflect anatomical neuronal density, we would like to remind the reader that we use this measure only to roughly estimate the correspondence between changes in density and likely layer assignments. We rely on other measures (e.g. AP power, AP power changes in response to visual stimuli) that have sharp borders and more clear transitions to assign laminar boundaries. Further, as noted in the reviewer’s list of strengths, the laminar assignments made with these measures are cross validated by differences in response latencies and sensitivity to different types of stimuli that are observed at different electrode depths.

      More generally, although the sizes of identified layers comport with typical sizes identified anatomically, a more powerful confirmation would be a direct per-penetration comparison with histologically identified boundaries. Ultimately, the absence of this type of independent confirmation limits the strength of their claim that veridical laminar boundaries can be identified from electrophysiological signals alone.

      As we have noted in response to similar comments from other reviewers, we are not aware of a method that would make this possible with sufficient resolution.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The reviewers have indicated that their assessment would potentially be stronger if their advice for quantitative, statistically validated comparisons was followed, for example, to demonstrate variability or consistency of certain measures that are currently only asserted. Also, if available, some histological confirmation would be beneficial. It was requested that the use and modification of the layering from Balaram & Kaas is addressed, as well as dealing with inconsistencies in the scale bars on those figures. There are two figure permission issues that need to be resolved prior to publication: Balaram & Kaas 2014 in Fig 1A, Kelly & Hawken 2017 in Fig. 4A.

      Please see the detailed responses to reviewer comments below. We have added new supplemental figures to quantitatively compare variability among metrics. As noted above, the suggested addition of data linking the electrophysiology directly to anatomical observations of laminar borders from the same electrode penetration is not feasible. The figure reused in Figure 1A is from an open-access (CC BY) publication (Balaram & Kaas 2014). After reexamining the figure in the original study, we found that the inferred scale bar would give an obviously inaccurate result, so we decided to remove the scale bar from Figure 1A. We have not received any reply from Springer Nature regarding permission for Figure 4A, so we decided to remove the reused figure (Kelly & Hawken 2017) from our article.

      Reviewer #1 (Recommendations For The Authors):

      Figure 4A has a different scale to Figure 4B-4F. It is better to add dashed lines to indicate the relationship between the cortical layers or overall range from Figure 4A to the corresponding layers in 4B to 4F.

      The reused figure in Figure 4A has been removed due to a permission issue. See also the comments above.

      Reviewer #2 (Recommendations For The Authors):

      General comments

      This paper demonstrates that voltage signals in frequency bands higher than those used for LFP/CSD analysis can be used from high-density electrical contact recording to generate a map of cortical layering in macaque V1 at a higher spatial resolution than previously attained.

      My main concern is that much of this paper seems to show that properties of voltage signals recorded by electrodes change with depth in V1. This of course is well known and has been mapped by many who have advanced a single electrode micron-by-micron through the cortex, listening and recording as they go. Figure 4 shows that spike shapes can give a clear indication of GM to WM borders, and this is certainly true and well known. Figures 5 and 6 show that activity level on electrodes can indicate layers related to LGN input, and this is known. Figure 7 shows that latencies vary with layer, and this is certainly true as we know. A main point seems to be that CSD is highly inconsistent. This is important to know if CSD is simply never going to be a good measure for layering in V1, but it would require quantification and statistics to make a fair comparison.

      We are glad to see that the reviewer understands that changes in electrical signals across layers are well known and are expected to have particular traits that change across layers. We do not claim that we have discovered anything that is unexpected or unknown. Instead, we introduce quantitative measures that are sensitive to these known differences (historically, often just heard with an audio monitor, e.g. “LGN axon hash”). While the primary aim of this paper is not to show that Neuropixels probes can record voltage signal properties that could not previously be recorded with a single electrode, we would like to point out that multi-electrode arrays have a very different sampling bias and also allow comparisons of simultaneous recordings across contacts with known fixed distances between them. For example, our measure of “unit spread” could not be estimated with a single electrode.

      We’ve added Figure S3 to show a quantitative comparison of variation between CSD and AP metrics. This comparison adds support to our prior, more anecdotal descriptions showing that CSDs are inconsistent and lack the resolution needed to identify thin layers.

      Some things are not explained very clearly. Like achromatic regions, and eye dominance - these are not quantified, and we don't know if they are mutually consistent - are achromatic/chromatic the same when tested through separate eyes? How consistent are these basic definitions? How definitive are they?

      The quantitative definitions of achromatic regions/COFDs and eye dominance columns can be found in our previous paper (Li et al., 2022), which is cited in this article. The main theme of this study is to develop a strategy for accurately identifying layers; the more detailed functional analysis will be described in future publications.

      Specific comments

      The abstract refers to CSD analysis and CSD signals. Can you be more precise - do you aim to say that LFP signals in certain frequency bands are already known to lack spatial localization, or are you claiming to be showing that LFP signals lack spatial resolution? A major point of the results appears to be lack of consistency of CSD, but I do not see that in the Abstract. The first sentence in the abstract appears to be questionable based on the results shown here for V1.

      We have updated the Abstract to minimize confusion and misunderstanding.

      Scale bar on Fig 1A implies that layers 2-5 are nearly 3 mm thick. Can you explain this thickness? Other figures here suggest layers 1-6 are less than 2 mm thick. Note, in a paper by the same authors (Balaram et al) the scale bar (100 um, Figure 4) on similar macaque tissue suggests that the cortex is much thinner than this. Perhaps neither is correct, but you should attempt to determine an approximately accurate scale. The text defines granular as Layer 4, but the scale bar in A implies layer 4 is 1 mm thick, but this does not match the ~0.5 mm thickness consistent with Figure 1E, F. The text states that L4A is less than 100 um thick, but the markings and scale bar in Figure 1A suggest that it could be more than 100 um thick.

      We thank the reviewer for pointing out that there are clearly errors in the scale bars used in these previously published figures from another group. In the original Figure 1 (Balaram & Kaas 2014), the histological slices were all scaled to one of the samples (chimpanzee) and shown without a scale bar. After reexamining the scale bar that we had derived from Figure 2 of the original study, we found the same problem. Since the relative widths of layers are more important than their absolute widths in our study, we decided to remove the scale bar that we had derived and added to Figure 1A.

      Line 157. Fix "The most commonly visual stimulus"

      Text has been changed

      Line 161. Fix "through dominate eye"

      Text has been changed

      Line 166. Please specify if the methods established and validated below are histological, or tell something about their nature here.

      The Abstract and Introduction already describe the nature of our methods.

      Line 184. Text is mixing 'dominant' and 'dominate', the former is better.

      Text has been changed accordingly

      Line 188. Can you clarify "beyond the time before a new stimulus transition". Are you generally referring to the fact that neuronal responses outlast the time between changes in the stimulus?

      That is correct. We are referring to the fact that neuronal responses outlast the time between changes in the stimulus. We have edited the text for clarity.

      Line 196. Fix "dominate eye" in two places.

      Text has been changed

      Line 196. The text seems to imply it is striking to find different response patterns for the two eyes, but given the OD columns, why should this be surprising?

      Because we did not find a systematic comparison of CSD depth profiles for dominant/non-dominant eyes, or for black/white stimuli, in past studies, we simply describe what we saw in our data. The rationale for testing each eye is that LGN projections from the two eyes are known to remain separated in the direct input layers of V1, so comparing CSDs from the two eyes could potentially help identify input layers, such as L4C. Here we provide evidence that CSD profiles from the two eyes deviate from naive expectations. For example, CSDs from the black stimulus show less variation between the two eyes, whereas CSDs from the white stimulus range from similar profiles to drastically different ones across eyes.

      Line 198. Text like, "The most consistent..." is stating overall conclusions drawn by the authors before pointing the reader specifically to the evidence or the quantification that supports the statement.

      We’ve adjusted the text to point to Figure S2, where the depth profiles of all penetrations are visualized, and to a newly added Figure S3, where the coefficients of variation for several metric profiles are shown.
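
      As an illustration of the kind of quantity involved, a coefficient of variation across penetrations can be computed per depth bin as the standard deviation divided by the absolute mean. The sketch below is only one plausible way to do this and is not taken from the manuscript. It assumes that each penetration's profile has already been resampled onto a common normalized depth axis, and the function name and the use of the sample standard deviation (`ddof=1`) are choices made for illustration.

```python
import numpy as np

def coefficient_of_variation(profiles):
    """Per-depth coefficient of variation across penetrations.

    profiles : array of shape (n_penetrations, n_depth_bins); every row is
               one penetration resampled to the same normalized depth axis
               (an assumption of this sketch).

    Returns an array of length n_depth_bins holding std / |mean|, with NaN
    wherever the mean across penetrations is exactly zero.
    """
    profiles = np.asarray(profiles, dtype=float)
    mean = profiles.mean(axis=0)
    std = profiles.std(axis=0, ddof=1)        # sample standard deviation
    cv = np.full_like(std, np.nan)
    np.divide(std, np.abs(mean), out=cv, where=mean != 0)
    return cv
```

      A lower value at a given depth means that the metric is more reproducible across penetrations at that depth. Signed metrics such as CSD can have means near zero, where this ratio becomes unstable, so the values there should be read with caution.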

      Line 200. "white stimulus is more variable" - the text does not tell us where/how this is supported with quantitative analysis/statistics.

      We’ve adjusted the text to point to Figures S2 and S3.

      The metric in 4B is not explained, the text mentions the plot but the reader is unable to make any judgement without knowledge of the method, nor any estimate of error bars.

      The figure is first mentioned in the Unit Density section, and the text in that section already gives the definitions of neuron density and unit density. We’ve also modified the text to point to the Methods section for details.

      Line 236. The text states the peak corresponds to L4C, but does not explain how the layer lines were determined.

      As described early in the CSD section, all layer boundaries are determined by following the guide, which lays out the strategy for drawing borders by combining all metrics.

      At Line 296 the spike metrics section ends without providing a clear quantification of how useful the metrics will be. It is clear that the GM to WM boundary can be identified, but that can be found with single electrodes as well, as neurophysiologists get to see/hear the change in waveform as the electrode is advanced in even finer spatial increments than the 20 um spacing of the contacts here.

      The aim of this study is to develop an approach for accurately delineating all layers simultaneously. The metrics we explored are estimates of well-known properties, so they can provide support for the correctness we hope to achieve. Here we first demonstrate their usefulness and later show the average across penetrations (Figure 9C-F). We are less concerned with quantifying how different factors affect the precision and consistency of these metrics, or how useful a single metric is, than with whether we can delineate all layers given all metrics, as described in the guide section.

      Line 302-306. Why this statement is made here is unclear, it interrupts the flow for a reason that perhaps will be explained later.

      This statement notes the insensitivity of this measure to temporal differences, introducing the value of incorporating a measure of how AP power changes over time in the next section of the manuscript.

      Line 311. What is the reason to speculate about no canceling because of temporal overlap? Are you assuming a very sparse multi unit firing rate such that collisions do not happen?

      Here we describe a simple theoretical model in which spike waveforms only add, without cancelling, so that the power would be proportional to the number of spikes. In reality, spike waveforms sometimes cancel, causing the theoretical relationship to deteriorate to some degree.

      Lines 327-346. There is a considerable amount of speculation and arguing based on particular examples and there is a lack of quantification. Neuron density is mentioned, but not firing rate. would responses from fewer neurons with higher firing rate not be similar to more neurons with lower firing rates?

      According to the theoretical model we described, power is proportional to the number of spikes, which in turn depends on both neuron density and firing rate. So fewer neurons with a higher firing rate would generate similar power to more neurons with a lower firing rate. We’ve expanded the explanation of the model and added Figure S4, which shows the depth profile of firing rate. The text has also been adjusted to point to Figures S2 and S3 for quantitative comparisons of variability.
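
      The reasoning can be checked numerically. The sketch below is a toy version of the idealized case only: spikes share one waveform, never overlap in time, and noise is ignored, so mean-square power depends only on the product of neuron count and firing rate. The function names, the sinusoidal waveform, and the 30 kHz sampling rate are assumptions chosen for illustration and are not taken from the manuscript.

```python
import numpy as np

def expected_power(n_neurons, rate_hz, waveform, fs_hz):
    """Mean-square power predicted when spikes add without overlapping."""
    energy = float(np.sum(np.square(waveform)))   # energy of a single spike
    spikes_per_second = n_neurons * rate_hz       # density x firing rate
    return spikes_per_second * energy / fs_hz     # averaged over all samples

def simulated_power(n_spikes, waveform, n_samples):
    """Mean-square power of a trace holding evenly spaced, non-overlapping spikes."""
    trace = np.zeros(n_samples)
    starts = np.linspace(0, n_samples - len(waveform), n_spikes).astype(int)
    for s in starts:
        trace[s:s + len(waveform)] += waveform
    return float(np.mean(trace ** 2))

# Hypothetical biphasic waveform lasting 30 samples (1 ms at 30 kHz).
waveform = np.sin(np.linspace(0.0, 2.0 * np.pi, 30))
```

      Under these assumptions, 10 neurons firing at 10 Hz and 50 neurons firing at 2 Hz give the same expected power, since both produce 100 spikes per second. Overlapping spikes would break the exact proportionality, which is the deterioration noted in the response to the previous comment.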

      Line 348 states there is a precise link between properties and cortical layers, but the manuscript has not, up to this point, shown how that link was determined or quantified it.

      Through our generative model of power and the similarity between the depth profile of firing rate and the depth profile of neuron density (Figure S4), the depth profile of power can be used to approximate the depth profile of neuron density, which is known to be closely correlated with cortical layering.

      Line 350. What is meant by "stochastic variability"?

      The text essentially says that the distances from an electrode contact to nearby cell bodies are random; closer cells have higher spike amplitudes and in turn produce higher power on a channel.

      The figures showing the two metrics, Pf and Cf, should be shown for the same data sets. The markings indicate that Fig 5 and Fig 6 show results from non-overlapping data sets. This does not build confidence about the results in the paper.

      Here we use typical profiles to demonstrate the characteristics of the power spectrum/coherence spectrum because of the variation across penetrations. We show later, in the guide section, all metrics for one penetration (another two cases in supplemental figures) and how to combine all metrics to derive layer delineations.

      Line 375 the statement is somewhat vague, "there are nevertheless sometimes cases where they can resolve uncertainties," can you please provide some quantitative support?

      We provided 3 examples in Figure 6, and more examples are shown in Figure 8, Figure S5, S6.

      Line 379. I believe the change you want to describe here is a change associated with a transition in the visual stimulus. It would be good to clarify this in the first several sentences here. Baseline can mean different things. I got the impression that your stimuli flip between states at a rate fast enough that signals do not really have time to return to a baseline.

      We rephrased the sentence to describe the metric more precisely. A pair of uniform colors flipping at 1.5-second intervals usually leaves enough time for spiking activity to decay to a saturated level.

      This section (379 - 398) continues a qualitative show-and-tell feel. There appears to be a lot of variability across the examples in Figure 7. How could you try to quantify this variability versus the variability in LFP? And, in this section overall, the text and figure legend don't really describe what the baseline is.

      Text adjustments are made to briefly describe the baseline window and point to the Method section where definitions are described in detail. We’ve added Figure S3 together with Figure S2 to address the variability across penetrations, stimuli, and metrics.

      Line 405 - 415. The discussion here does not consider that layers may not have well defined boundaries, the text gives the impression that there is some ultimate ground truth to which the metrics are being compared, but that may not be accurate.

      Except for a few layers/sublayers, such as L2, L3A, L3B, most layer boundaries of neocortex are well defined (Figure 1A) and histological staining of neurons/density and correlated changes in chemical content show very sharp transitions. The best of these staining methods is cytochrome oxidase, which shows sharp borders at the top and bottom of layer 4A, top and bottom of layer 4C, and the layer 5/6 border. There is also a sharp transition in neuronal cell body size and density at the top and bottom of layer 4Cb. The definition and delineation of all possible layers are constantly being refined, especially by accumulated knowledge of genetic markers of different cell types and connection patterns. In our study, we develop metrics to estimate well known anatomical and functional properties of different layers. We have also discussed layer boundaries that have been ambiguous to date and explained the reason and criteria to resolve them.

      Line 423. The text references Figure 1A in stating that relative thickness and position is crucial, but Figure 1A does not provide that information and does not explain how it might be determined, or how much of a consensus there is. Also, the text does not consider that the electrode may go through the cortex at oblique angles, and not the same angle in each layer, and the relative thickness may not be a dependable reference.

      There are numerous studies that describe criteria to delineate cortical layers, the referenced article (Balaram & Kaas 2014) is used here as an example. We are not aware of any publication that has systematically compared the relative thickness of layers across the V1 surface of a given animal or across animals. Nevertheless, it is clear from the literature that there is considerable similarity across animals. Accordingly, we cannot know what the source of variability in overall cortical thickness in our samples is, but we do see considerable consistency in the relative thickness of the layers we infer from our measures. We illustrate the differences that we see across penetrations and consider likely causes, such as the extent to which the coverslip pressing down on the cortex might differentially compress the cortex at different locations within the chamber.

      A deviation of the probe angle from the surface normal will not change the relative thickness of layers, and the rigid linear probe is unlikely to bend in the cortex.
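
      The geometric argument can be illustrated with a minimal sketch. It assumes flat, parallel layers and a straight probe, so that every layer's path length is stretched by the same factor 1/cos(angle). The layer thicknesses and the function name are hypothetical and are not taken from the manuscript.

```python
import math

def relative_thickness(layer_thickness_um, angle_deg):
    """Fraction of the probe track occupied by each layer.

    Assumes flat, parallel layers crossed by a straight probe tilted
    angle_deg away from the surface normal.
    """
    cos_t = math.cos(math.radians(angle_deg))
    # Path length through each layer grows by 1 / cos(angle).
    path = [t / cos_t for t in layer_thickness_um]
    total = sum(path)
    return [p / total for p in path]

# Hypothetical layer thicknesses in micrometres (illustration only).
layers_um = [100.0, 400.0, 500.0, 300.0, 300.0]
normal = relative_thickness(layers_um, 0.0)
oblique = relative_thickness(layers_um, 30.0)
print(normal)
print(oblique)
```

      Under these assumptions the common stretch factor cancels in the ratio, so the two printed lists agree to rounding error. The sketch does not cover cortical curvature or an angle that changes between layers.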

      Line 433. The term "Coherence" is used, clarify if this is your Cf from Figure 6. The text states, "marked decrease at the bottom of layer 6". Please clarify this, I do not see that in Figure 6.

      Text has been adjusted.

      In Figure 6, the locations of the lines between L1 and 2 do not seem to be consistent with respect to the subtle changes in light blue shading, across all three examples, yet the text on line 436 states that there is a clear transition.

      We feel that the language used accurately reflects what is shown in the figure. While the transition is not sharp, it is clear that there is a transition. This transition is not used to define this laminar border. We have edited the text to clarify that the L1/2 border is better defined based on the change in AP power which shows a sharp transition (Figure 7). 

      The text states that the boundary is also "always clear" from metrics... and cites Figure 5, but I do not see that this boundary is clear for all three examples in Figure 5.

      Text has been adjusted.

      Line 438. The text states that "it is not unusual for unit density to fall to zero below the L1/2 border (Figure 8E)", but surprisingly, the line in Figure 8 E does not even cover the indicated boundary between L1 and L2.

      At this point, the number of statements in the text that do not clearly and precisely correlate to the data in the figures is worrisome, and I think you could lose the confidence of readers at this point.

      We do not see any inconsistency between what is stated in our text and what is noted by the reviewer. The termination of the blue line corresponds to the location where no units are detected. This is the location where “unit density falls to zero”. In this example, no units were resolved through spike sorting until ~100 µm beneath the L1/L2 boundary, which is exactly zero unit density (Figure 8E). That there are electrical signals in this region is clear from the AP power change (Figure 8C), which also shows the location of the L1/L2 border.

      Line 448. Text states that the 6A/B border is defined by a sharp boundary in AP power, but Figure 8A "AP power spectrum" does not show a sharp change at the A/B line. There is a peak in this metric in the middle to upper middle of 6A, but nothing so sharp to define a boundary between distinct layers, at least for penetration A2.

      Text has been adjusted.

      In Figure 8, the layer labels are not clear, whereas they are reasonably clear in the other figures.

      This is a technical problem: the vector graphics were not properly converted during PDF generation. We will upload high-quality vector graphics when we finalize the version of record.

      The text emphasizes differences in L4B and L4C with respect to average power and coherence, but the transition seems a bit gradual from layer 3B to 4C in some examples in Figure 6. And in Figure 5, A3, there doesn't appear to be any particular transition along the line between 4B and 4C.

      In this guide section, we pointed out early that some metrics are good for some boundaries and that variation exists between penetrations. We’ve expanded the text emphasizing the importance of timing differences in ΔP/P for differentiating sublayers in L4. Lastly, where several boundaries remain unresolvable given all the metrics, prior knowledge of relative thickness should be used.

      Line 466 provides prescriptions in absolute linear distances, but this is unwise given that cortex may be crossed at oblique angles by electrodes, particularly for parts of V1 that are not on the surface of the brain. Other parts of the text have emphasized relative measurements.

      Text has been changed using relative measurements.

      Line 507. The text says 9C and 4A are a good match, but the match does not look that good (4A has substantial dips at 0.5 and 0.75, and substantial peaks), and there is no quantification of fit. The error bars on 9C do not help show the variability across penetrations, they appear to be SEM, which shows that error bars get smaller as you average more data. It would seem more important to understand what is the variance in the density from one penetration to the next compared to the variance in density across layers.

      We have replaced “good match” with “roughly corresponds”. We note that we do not use unit density as a metric for identification of laminar borders and instead show that the expected locations of layers with higher neuronal density correspond to the locations where there are similar changes in unit density. It should be noted that Figure 9C is an average across many penetrations, so it should not be expected to show transitions that are as sharp as in individual penetrations. Because of a figure permission issue, we have removed Figure 4A and changed the text accordingly.

      Figure 9C-F show a lot of variability in the individual curves (dim gray lines) compared to the overall average. Does this show that these metrics are not reliable indicators at the level of single penetration, but show some trends across larger averages?

      In the beginning of the guide, we emphasized that all metrics should be combined for individual penetration, because some metrics are only reliable for delineating certain layer boundaries and the quality of data for the various measures varies between penetrations. The penetration average serves the same purpose explained in the previous question as an indicator that our layer delineation was not far off.

      The discussion mentions improvements in layer identification made here. Did this work check the assignments for these penetration against assignments made based on some form of ground truth? Previous methods would advance electrodes steadily, and make lesions, and carry out histology. Is there any way to tell how this method would compare to that?

      Even electrolytic lesions do not necessarily reveal ground truth and can be quite misleading, and their resolution is limited by lesion size. Lesions are typically variable in size, asymmetric, and variable in shape and position relative to the location of the electrode tip, likely affected by the quality and location of electrical grounding and by variations in current flow due to the locations of blood vessels. A review of the published literature with electrode lesions shows that electrophysiological transitions are likely a far more accurate indicator of recording locations than post-mortem histology from electrolytic lesions. It is extremely rare for the locations of lesions to be precisely aligned to expected laminar transitions. See for example Chatterjee et al (Nature 2004). Also see several manuscripts from the Shapley lab. The lone rare exception of which we are aware is Blasdel and Fitzpatrick (1984), in which consistently small and round lesions were produced, and even these would be too large (~100 microns) to accurately identify layers if it were not for the fact that the electrode penetrations were very long and tangential to the cortical layers.

      Reviewer #3 (Recommendations For The Authors):

      - The authors say (lines 360-362) that "Assuming spikes of a neuron spread to at least two adjacent recording channels, then the coherence between the two channels would be directly proportional to number of spikes, independent of spike amplitude." Has this been demonstrated? Very large amplitude spikes should show up on more channels than small amplitude spikes. Do waveform amplitudes and unit densities from the spike waveform analyses show consistent relationships to the power and/or coherence distributions over depth across penetrations?

      This part of the manuscript provides a theoretical rationale for what might be expected to affect the measures that we have derived. That is why we begin by stating that we are making an assumption. The answers to the reviewer’s questions are not known and have not been demonstrated. By beginning with this theoretical preface, we can point to cases where the data match these expectations as well as other cases where the data differ from the theoretical expectations.

      Coherence, by definition, is a normalized metric that is insensitive to amplitude. Spike amplitude mainly depends on how close the signal source is to the electrode, and spike spread mainly depends on cell body size and shape for a given distance to the electrode. Therefore, a very large spike amplitude could stem from a small cell very close to the electrode, but this would result in a small spike spread, especially for axonal spikes (Figure 4B, red spike). Spike amplitudes are on average higher in L4C, which matches the expectation that higher cell density would place cell bodies, on average, closer to the electrode (Figure S4A). Nonetheless, the high-density small cell bodies in L4C result in a small spike spread (Figure 9D).
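
      The normalization property can be checked numerically with a minimal sketch. It assumes a segment-averaged magnitude-squared coherence at a single frequency bin and uses synthetic signals; the function names, segment length, and noise levels are illustrative choices, not the pipeline used in the manuscript. The sketch only shows that a uniform gain on one channel leaves coherence unchanged; it does not model how spike spread depends on cell size.

```python
import cmath
import math
import random

def dft_bin(segment, k):
    """Discrete Fourier coefficient of one segment at frequency bin k."""
    n = len(segment)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n)
               for i, v in enumerate(segment))

def coherence(x, y, seg_len, k):
    """Magnitude-squared coherence at bin k, averaged over segments."""
    sxy, sxx, syy = 0j, 0.0, 0.0
    for start in range(0, len(x) - seg_len + 1, seg_len):
        fx = dft_bin(x[start:start + seg_len], k)
        fy = dft_bin(y[start:start + seg_len], k)
        sxy += fx * fy.conjugate()   # cross-spectrum
        sxx += abs(fx) ** 2          # auto-spectrum of x
        syy += abs(fy) ** 2          # auto-spectrum of y
    return abs(sxy) ** 2 / (sxx * syy)

rng = random.Random(0)
shared = [rng.gauss(0, 1) for _ in range(512)]      # common source
ch1 = [s + 0.5 * rng.gauss(0, 1) for s in shared]   # channel 1 = source + noise
ch2 = [s + 0.5 * rng.gauss(0, 1) for s in shared]   # channel 2 = source + noise

c_small = coherence(ch1, ch2, 64, 5)
c_large = coherence([10 * v for v in ch1], ch2, 64, 5)  # 10x gain on channel 1
print(c_small, c_large)
```

      The gain multiplies the squared cross-spectrum and the auto-spectrum of the scaled channel by the same factor, so the two printed values agree up to floating-point rounding.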

      - I suggest clarifying what is defined as the baseline window for the ΔP/P measure - is it the entire 10-150 ms response window used for the power spectrum analysis?

      Text adjustments are made in the Methods where the time windows are defined at the beginning of the CSD section. Only temporal change metrics (ΔCSD and ΔP/P) use the baseline window ([-40, 10]ms). The other two spectrum metrics (Power and Coherence) use the response window ([10, 150]ms).
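
      The two windows can be written down as a minimal sketch. The window limits are those given above; the form of the metric, (P - P_baseline) / P_baseline, is an assumption made here for illustration, since the exact definition is given in the Methods and not reproduced in this response. Function names and the synthetic trace are hypothetical.

```python
BASELINE_MS = (-40, 10)   # baseline window used by the temporal change metrics
RESPONSE_MS = (10, 150)   # response window used by the spectrum metrics

def mean_in_window(times_ms, values, window):
    """Mean of values whose time stamps fall in the half-open window [lo, hi)."""
    lo, hi = window
    selected = [v for t, v in zip(times_ms, values) if lo <= t < hi]
    return sum(selected) / len(selected)

def relative_power_change(times_ms, power):
    """Assumed form of dP/P: change relative to the mean baseline power."""
    base = mean_in_window(times_ms, power, BASELINE_MS)
    return [(p - base) / base for p in power]

# Synthetic trace: power steps from 2.0 to 3.0 at 10 ms after the stimulus flip.
times = list(range(-40, 150))
power = [2.0 if t < 10 else 3.0 for t in times]
dpp = relative_power_change(times, power)
print(mean_in_window(times, dpp, BASELINE_MS))
print(mean_in_window(times, dpp, RESPONSE_MS))
```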

      - Firing rate differs by cell type and, on average, differs by layer in V1. Many layer 2/3 neurons, for example, have low maximum firing rates when driven with optimized achromatic grating stimuli. To the extent that the generative models explaining the sources of power and coherence signals rely on the assumption that firing rates are matched across cortical depth, these models may be inaccurate. This assumption is declared only subtly, and late in the paper, but it is relevant to earlier claims.

      Text adjustments are made to explicitly describe the possibility that an uneven depth profile of firing rate could counteract the depth profile of neuron density, resulting in a distorted or even flat depth profile of power/coherence that deviates far from the depth profile of neuron density. In a newly added Figure S4, we first show the average firing rate profile during a set of stimuli (uniform color, static/drifting, achromatic/chromatic gratings), then specifically the PSTHs of the same stimuli shown in this study. It can be seen that layers receiving direct LGN inputs tend to fire at a higher rate (L4C, L6A). Firing rates in the PSTHs either roughly match across layers or are much higher in the densely packed layers. Therefore, the depth profile of firing rate contributes to, rather than counteracts, that of neuron density, enhancing the utility of the power/coherence profile for identification of correct layer boundaries.

      - Given the acute preparation used for recordings, I wonder whether tissue is available for histological evaluation. Although the layers identified are generally appropriate in relative size, it would be particularly compelling if the authors could demonstrate that the fraction of the cortical thickness occupied by each layer corresponded to the proportion occupied by that layer along the probe trajectory in histological sections. This would lend strength to the claim that these analyses can be used to identify layers in the absence of histology. Furthermore, variations in apparent cortical thickness could arise from different degrees of deviation from surface normal approach angles, which might be apparent by evaluation of histological material. I would add that variation in thickness on the scale shown in Fig. S4 is more likely to have an explanation of this kind.

      To serve other purposes unrelated to this study (identification of CO blobs), we cut the postmortem tissue in horizontal slices, so the suggested histological comparison cannot be made. The cortical thickness measured in this study was affected not only by the angle deviation from the surface normal but also by the swelling and compression of cortex. Nevertheless, evaluating the absolute thickness of cortex is not the main purpose of this study.

      Text and figure suggestions:

      - Fig 1A has been modified from Balaram & Kaas (2014) to revert to the Brodmann nomenclature scheme they argue against using in that paper; I wonder if they would object to this modification without explanation. Related, in the main text the authors initially refer to layers using Brodmann's labels with a secondary scheme (Hassler's) in parentheses and later drop the parenthetical labels; these conventions are not described or explained. Readers less familiar with the multiple nomenclature schemes for monkey V1 layers might be confused by the multiple labels without context, and could benefit from a brief description of the convention the authors have adopted.

      Throughout our article, we only used Brodmann’s naming convention because it has historically been adopted for Old World monkeys, which we use in our study, whereas Hassler’s naming convention is more commonly used for New World monkeys. Different naming conventions do not change our results, and it is out of scope for our study to discuss which nomenclature is more appropriate.

      - References to "dominate eye" throughout the text and figure legends should be replaced with "dominant eye."

      It has been changed throughout the article.

      - It is a bit odd to duplicate the same example in Fig. 2C and 2E. Perhaps a unique example would be a better use of the space.

      Here we first demonstrate the filtering effect, then compare profiles across different penetrations. The same example bridges the transition allowing side-by-side comparison.

      - The legend for Fig. 3 might be clearer if it simply listed the stimulus transitions for each column left to right, i.e. "black to white (non-dominant eye), white to black (non-dominant eye), black to white (dominant eye), ..."

      We feel that the icons are helpful. Here we want to show the stimulus colors directly to readers.

      - The misalignment between Fig. 4A vs. 4B-F, combined with the very small font size of the layer labels in Fig. 4B-F, make the visual comparison difficult. In Figs. 7 and 8, layer labels (and most labels in general) are much too small and/or low resolution to read easily. Overall, I would recommend increasing font size of labels in figures throughout the paper.

      The reused figure in Figure 4A has been removed due to a permission issue. Font sizes have been adjusted.

      - Line 591 "using of high-density probes" should be "using high-density probes"

      Text has been changed accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment:

      “…However, the findings are reliant on high concentrations of inhibitor drugs, and mechanistic details about the molecular interaction and respective functions of ABHD2 and mPRb are incomplete.”

      As discussed below in the response to Reviewers, the drug concentrations used span the full dose response of the active range of each drug. In cases where the drug concentrations required to block oocyte maturation were significantly higher than those reported in the literature, we considered those drugs ineffective. In terms of the molecular details of the mechanistic interaction between mPRb and ABHD2, we now provide additional data confirming their molecular interaction to produce PLA2 activity where each protein alone is insufficient. Although these new studies provide more mechanistic insights, there remain details of the ABHD2-mPR interactions that would need to be addressed in future studies, which are beyond the scope of the current already extensive study.

      Public Reviews:

      Reviewer 1

      (1) The mechanism governing the molecular assembly of mPRbeta and ABHD2 remains unclear. Are they constitutively associated or is their association ligand-dependent? Does P4 bind not only to mPRbeta but also to ABHD2, as indicated in Figure 6J? In the latter case, the reviewer suggests that the authors conduct a binding experiment using labeled P4 with ABHD2 to confirm this interaction and assess any potential positive or negative cooperativity with a partner receptor.

      The co-IP experiments presented in Figure 5E argue that the two receptors are constitutively associated at rest before exposure to P4, but at low levels, since addition of P4 increases the association between mPRβ and ABHD2 by ~2-fold. Importantly, we know from previous work (Nader et al., 2020) and from imaging experiments in this study that mPR recycles in immature oocytes between the PM and the endosomal compartment. It is not clear at this point within which subcellular compartment the basal association of mPR and ABHD2 occurs. We have tried to elucidate this point but have not been able to generate a functional tagged ABHD2. We generated GFP-tagged ABHD2 at both the N- and C-terminus, but these constructs were not functional in terms of their ability to rescue ABHD2 knockdown. This prevented us from testing the association dynamics between ABHD2 and mPR.

      Regarding whether ABHD2 in the oocyte directly binds P4, we had no data directly supporting this in the initial submission; rather, we based the cartoon in Fig. 6J on the findings of Miller et al. (Science 2016), who showed that ABHD2 in sperm binds biotinylated P4. With the use of a new expression system to produce ABHD2 in vitro (please see below), we were able to try the experiment suggested by the Reviewer. In vitro expressed ABHD2 was incubated with biotinylated P4, and binding was tested on a streptavidin column. Under these conditions we could not detect any specific binding of P4 to ABHD2; however, these experiments remain somewhat preliminary and would require validation using additional approaches to conclusively test whether Xenopus ABHD2 binds P4. The discrepancy with the Miller et al. findings could be species-specific, as they tested mammalian ABHD2.

      (2) The authors have diligently determined the metabolite profile using numerous egg cells. However, the interpretation of the results appears incomplete, and inconsistencies were noted between Figure 2B and Supplementary Figure 2C. Furthermore, PGE2 and D2 serve distinct roles and have different elution patterns by LC-MS/MS, thus requiring separate measurements. In addition, the extremely short half-life of PGI2 necessitates the measurement of its stable metabolite, 6-keto-PGF1a, instead. The authors also need to clarify why they measured PGF1a but not PGF2a.

      We believe the Reviewer meant to indicate discrepancies between Fig. 2E (not 2B) and Supp. Fig. 2C. Indeed, the Reviewer is correct, and this is because Fig. 2E shows pooled normalized data on a per-PG-species and per-frog basis, whereas Supp. Fig. 2E shows an example of absolute raw levels from a single frog to illustrate the relative basal abundance of the different PG species. We had failed to clarify this in the Supp. Fig. 2E figure legend, which we have now corrected in the revised manuscript. So, the discrepancies are due to variation between different donor animals, which is highlighted in Supp. Fig. 2A. Furthermore, to minimize confusion, in the revised manuscript we revised Supp. Fig. 2C to show only PG levels at rest, to illustrate basal levels of the different PG species relative to each other, which is the goal of this supplemental figure.

      (3) Although they propose PGs, LPA, and S1P are important downstream mediators, the exact roles of the identified lipid mediators have not been clearly demonstrated, as receptor expression and activation were not demonstrated. While the authors showed S1PR3 expression and its importance by genetic manipulation, there was no observed change in S1P levels following P4 treatment (Supplementary Figure 2D). It is essential to identify which receptors (subtypes) are expressed and how downstream signaling pathways (PKA, Ca, MAPK, etc.) relate to oocyte phenotypes.

      We agree conceptually with the Reviewer that identifying the details of the signaling of the different GPCRs involved in oocyte maturation would be interesting. However, our lipidomic data argue that the activation of a PLA2 early in the maturation process in response to P4 leads to the production of multiple lipid messengers that would activate GPCRs and branch out the signaling pathway to activate various pathways required for the proper and timely progression of oocyte maturation. Preparing the egg for fertilization is complex; so, it is not surprising that a variety of pathways are activated simultaneously to properly initiate both cytoplasmic and nuclear maturation to transition the egg from its meiotic arrest state to be ready to support the rapid growth during early embryogenesis. We focus on the S1P signaling pathway specifically because, as pointed out by the Reviewer, we could not detect an increase in S1P even though our metabolomic data collectively argued for an increase. Our results on the S1P pathway -as well as a plethora of other studies historically in the literature that we allude to in the manuscript- argue that these different GPCRs support and regulate oocyte maturation, but they are not essential for the early maturation signaling pathway. For example, for S1P, as shown in Figure 4, the delay/inhibition of oocyte maturation due to S1PR3 knockdown can be reversed at high levels of P4, which presumably leads to higher levels of other lipid mediators that would bypass the need for signaling through S1PR3. This is reminiscent of the kinase cascade driving oocyte maturation where there is significant redundancy and feedback regulation. Therefore, analyzing each receptor subtype that may regulate the different PG species, LPA, and S1P would be a tedious and time-consuming undertaking that goes beyond the scope of the current manuscript. 
      More importantly, based on the above arguments, we suggest that findings from such an analysis, similar to the conclusions from the S1PR3 studies (Fig. 4), would show a modulatory role in oocyte maturation rather than a core requirement for the maturation process as observed with mPR and ABHD2. Thus, they would provide relatively little insight into the core signaling pathway driving P4-mediated oocyte maturation.

      Reviewer 2:

      (1) The ABHD2 knockdown and rescue, presented in Fig 1, is one of the most important findings. It can and should be presented in more detail to allow the reader to understand the experiments better. E.g.: the antisense oligos hybridize to both ABHD2.S and ABHD2.L, and they knock down both (ectopically expressed) proteins. Do they hybridize to either or both of the rescue constructs? If so, wouldn't you expect that both rescue constructs would rescue the phenotype since they both should sequester the AS oligo? Maybe I'm missing something here.

      For the ABHD2 rescue experiment, the ABHD2 constructs (S or L) were expressed 48 hrs before the antisense was injected. The experiment was conducted in this way to avoid the potential confounding issue of both constructs sequestering the antisense. The assumption is that the injected RNA after protein expression would be degraded thus allowing the injected antisense to target endogenous ABHD2. The idea is to confirm that ABHD2.S expression alone is sufficient to rescue the antisense knockdown as confirmed experimentally.

      However, to further confirm the rescue, we performed the experiment in a different chronological order, where we started with injecting the antisense to knock down endogenous ABHD2 and this was followed 24 hrs later by expressing wild type ABHD2.S. As shown in Author response image 1 this also rescues the knockdown.

      Author response image 1.

      ABHD2 knockdown and rescue. Oocytes were injected with control antisense (Ctrl AS) or specific ABHD2 antisense (AS) oligonucleotides and incubated at 18 °C for 24 hours. Oocytes were then injected with mRNA to overexpress ABHD2.S for 48 hours and then treated with P4 overnight. The histogram shows % GVBD in naïve oocytes and in oocytes injected with control or ABHD2 antisense, with or without mRNA to overexpress ABHD2.S.

      In addition, it is critical to know whether the partial rescue (Fig 1E, I, and K) is accomplished by expressing reasonable levels of the ABHD2 protein, or only by greatly overexpressing the protein. The authors' antibodies do not appear to be sensitive enough to detect the endogenous levels of ABHD2.S or .L, but they do detect the overexpressed proteins (Fig 1D). The authors could thus start by microinjecting enough of the rescue mRNAs to get detectable protein levels, and then titer down, assessing how low one can go and still get rescue. And/or compare the mRNA levels achieved with the rescue construct to the endogenous mRNAs.

      The dose response of ABHD2 protein expression in correlation with rescue of the ABHD2 knockdown is shown indirectly in Figure 1I and 1J. In these experiments, ABHD2 knockdown was rescued using either the WT protein or two mutants (H120A and N125A). All three constructs rescued ABHD2 KD with equal efficiency (Fig. 1I), even though their expression levels varied (Fig. 1J). The WT protein was expressed at significantly higher levels than both mutants, and N125A was expressed at higher levels than H120A (Fig. 1J; note the similar tubulin loading control). Crude estimation of the WBs argues for the WT protein expression being ~3x that of H120A and ~2x that of N125A, yet all three have similar rescue of the ABHD2 knockdown (Fig. 1I). This argues that low levels of ABHD2 expression are sufficient to rescue the knockdown, consistent with the catalytic enzymatic nature of the ABHD2 PLA2 activity.

      Finally, please make it clear what is meant by n = 7 or n = 3 for these experiments. Does n = 7 mean 7 independently lysed oocytes from the same frog? Or 7 groups of, say, 10 oocytes from the same frog? Or different frogs on different days? I could not tell from the figure legends, the methods, or the supplementary methods. Ideally one wants to be sure that the knockdown and rescue can be demonstrated in different batches of oocytes, and that the experimental variability is substantially smaller than the effect size.

      The n reflects the number of independent female frogs. We have added this information to the figure legends. For each donor frog at each time point 10-30 oocytes were used.

      (2) The lipidomics results should be presented more clearly. First, please drop the heat map presentations (Fig 2A-C) and instead show individual time course results, like those shown in Fig 2E, which make it easy to see the magnitude of the change and the experiment-to-experiment variability. As it stands, the lipidomics data really cannot be critically assessed.

      [Even as heat map data go, panels A-C are hard to understand. The labels are too small, especially on the heat map on the right side of panel B. The 25 rows in panel C are not defined (the legend makes me think the panel is data from 10 individual oocytes, so are the 25 rows 25 metabolites? If so, are the individual oocyte data being collapsed into an average? Doesn't that defeat the purpose of assessing individual oocytes?) And those readers with red-green colorblindness (8% of men) will not be able to tell an increase from a decrease. But please don't bother improving the heat maps; they should just be replaced with more informative bar graphs or scatter plots.]

      We have revised the lipidomics data as requested by the Reviewer. The Reviewer asked that we show the data as a time course with each individual frog as in Fig. 2E. This turns out to be confusing and not a good way to present the data (please see Author response image 2).

      Author response image 2.

      Metabolite levels from 5 replicates of 10 oocytes each at each time point were measured and averaged per frog and per time point. Fold change was measured as the ratio at the 5- and 30-min time points relative to untreated oocytes (T0). FCs that are not statistically significant are shown as faded. Oocytes with mPR knockdown (KD) are boxed in green and ABHD2-KD in purple.
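
      The fold-change calculation described in this legend can be sketched as follows. This is a minimal illustration assuming replicate measurements are averaged per time point before taking the ratio to T0; the metabolite values, dictionary keys, and function name are hypothetical and do not come from the study data.

```python
from statistics import mean

def fold_changes(levels_by_time, baseline_key="T0"):
    """Average replicates per time point, then divide by the T0 average.

    levels_by_time maps a time-point label to a list of replicate levels
    for one metabolite in one frog.
    """
    averaged = {t: mean(reps) for t, reps in levels_by_time.items()}
    base = averaged[baseline_key]
    return {t: avg / base for t, avg in averaged.items() if t != baseline_key}

# Hypothetical levels for one metabolite in one frog (5 replicates each).
example = {
    "T0": [10.0, 12.0, 11.0, 9.0, 8.0],
    "T5": [5.0, 6.0, 4.0, 5.0, 5.0],
    "T30": [2.0, 3.0, 2.0, 1.0, 2.0],
}
fc = fold_changes(example)
print(fc)
```

      A fold change below 1 corresponds to a decrease relative to untreated oocytes. Statistical significance, which the figure encodes by fading, is not computed in this sketch.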

      We therefore revised the metabolomics data as follows to improve clarity. The changes in the glycerophospholipids and sphingolipids determined on the Metabolon CLP platform (specific for lipids) are now shown as single metabolites clustered at the levels of species and pathways and arranged for the 5- and 30-min time points sequentially on the same heatmap as requested (Fig. 2B). This allows for a quick visual overview of the data that clearly shows the decrease in the lipid species following P4 treatment in the control oocytes and not in the mPR-KD or ABHD2-KD cells (Fig. 2B). The individual species are listed in Supplemental Tables 1 and 2. We also revised the Supplemental Tables to include the values for the non-significant changes, which were omitted from the previous submission.

      We revised the metabolomics data from the HD4 platform in a similar fashion, but because the lipid data were complementary to and less extensive than those from the CLP platform, we moved that heatmap to Supplemental Fig. 2B.

      For the single oocyte metabolomics, we now show the data as the correlation between FC and p value, which clearly shows the upregulated (including LPA) and downregulated metabolites at T30 relative to T0 (Fig. 2C). The raw data is now shown in a new Supplemental Table 7.  

      (3) The reticulocyte lysate co-expression data are quite important and are both intriguing and puzzling. My impression had been that to express functional membrane proteins, one needed to add some membrane source, like microsomes, to the standard kits. Yet it seems like co-expression of mPR and ABHD2 proteins in a standard kit is sufficient to yield progesterone-regulated PLA2 activity. I could be wrong here - I'm not a protein expression expert - but I was surprised by this result, and I think it is critical that the authors make absolutely certain that it is correct. Do you get much greater activities if microsomes are added? Are the specific activities of the putative mPR-ABHD2 complexes reasonable?

      We thank the Reviewer for this insightful comment. We agree that this is a critical result that would benefit from cross validation, especially given the low level of PLA2 activity detected in the reticulocyte lysate expression system. We have therefore expanded these studies using another in vitro expression system with microsomal membranes based on tobacco extracts (ALiCE® Cell-Free Protein Synthesis System, Sigma Aldrich) to enhance production and stability of the expressed receptors, as suggested by the Reviewer. We further prepared virus-like particles (VLPs) from cells expressing each receptor individually or both receptors together. However, we could not detect any PLA2 activity from the VLPs. We thus focused on the coupled in vitro transcription/translation tobacco extracts that allow the expression of difficult-to-produce membrane proteins in microsomes. This kit targets membrane proteins directly to microsomes using a microsome-targeting melittin signal peptide. This system took significant time and effort to troubleshoot and adapt to mPR and ABHD2 expression. However, we were ultimately able to produce significantly higher amounts of both ABHD2 and mPRβ, which were readily detected by WBs (Supplemental Fig. 4I). In contrast, we could not reliably detect mPR or ABHD2 using WBs from reticulocyte lysates given the limited amounts produced.

      Similarly to our previous findings with proteins produced in reticulocyte lysates, expression of ABHD2 or mPRβ alone was not associated with an increase in PLA2 activity over a two-hour incubation period (Fig. 5C). It is worth noting here that the tobacco lysates had high endogenous PLA2 activity. However, co-expression of both mPRβ and ABHD2 produced robust PLA2 activity that was significantly higher than that detected in the reticulocyte lysate system (Fig. 5C). Surprisingly, however, this PLA2 activity was P4 independent, as it was observed when both receptors were co-expressed in the absence of P4.

      These results validate our earlier conclusion that PLA2 activity requires both mPR and ABHD2, so their interaction is needed for enzymatic activity. It is interesting, however, that in the tobacco expression system this mPR-ABHD2 PLA2 activity becomes for the most part P4 independent. As the tobacco expression system forces both ABHD2 and mPR into microsomes using a signal sequence, the two receptors are enriched in the same vesicular compartment. As they can interact independently of P4, as shown in the co-IP experiments in immature oocytes (Fig. 5D), their forced co-expression in the same microsomal compartment could lead to their association and thus PLA2 activity. This is an attractive possibility that fits the current data, but it would need independent validation.

      Reviewer 3:

      There were concerns with the pharmacological studies presented. Many of these inhibitors are used at high (double-digit micromolar) concentrations that could result in non-specific pharmacological effects and the authors have provided very little data in support of target engagement and selectivity under the multiple experimental paradigms. In addition, the use of an available ABHD2 small molecule inhibitor was lacking in these studies.

      For the inhibitors used we performed a full dose response to define the active concentrations. So, inhibitors were not used at one high dose. We then compared the EC50 for each active inhibitor to the reported EC50 in the literature (Table 1). The inhibitors were deemed effective only if they inhibited oocyte maturation within the range reported in the literature. This despite the fact that frog oocytes are notorious in requiring higher concentrations of drug given their high lipophilic yolk content, which acts as a sponge for drugs. So our criteria for an effective inhibitor are rather stringent.  

      Based on these criteria, only 3 inhibitors were ‘effective’ in inhibiting oocyte maturation: Ibuprofen, ACA and MP-A08, with IC50s relative to those reported in the literature of 0.7, 1.1, and 1.6 respectively. Ibuprofen targets Cox enzymes, which produce prostaglandins. We independently confirmed an increase in PGs in response to P4 in oocytes, thus validating the drug's inhibitory effect. ACA blocks PLA2 and inhibits maturation, a role supported by the metabolomics analyses, which show a decrease in the PE/PE/LPE/LPC species, and by the ABHD2-mPR PLA2 activity following in vitro expression. Finally, MP-A08 blocks sphingosine kinase activity, a role supported by the metabolomics showing a decrease in sphingosine levels in response to P4, and by our functional studies validating a role for the S1P receptor 3 in oocyte maturation.
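      As a small worked illustration of the relative IC50 values quoted above, the sketch below assumes that the ratio is defined as the IC50 measured in oocytes divided by the IC50 reported in the literature. The function name and the input values are hypothetical and are not taken from Table 1.

```python
def relative_ic50(observed_molar, literature_molar):
    """Ratio of the IC50 measured in oocytes to the published IC50 (both in M)."""
    if literature_molar <= 0:
        raise ValueError("literature IC50 must be positive")
    return observed_molar / literature_molar


# Hypothetical IC50 values in molar units; not taken from Table 1
ratio = relative_ic50(observed_molar=2.2e-5, literature_molar=2.0e-5)
print(round(ratio, 2))
```

      Under this assumed definition, a ratio close to 1 means that the inhibitor acts on oocyte maturation at about the published concentration, whereas a ratio that is orders of magnitude above 1 points to possible off-target effects.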

      As pointed out by the Reviewer, other inhibitors did block maturation at very high concentrations, but we do not consider these as effective and have not implicated the blocked enzymes in the early steps of oocyte maturation. To clarify this point, we edited the summary panel (now Fig. 2D) to simplify it and highlight the inhibitors with an effect in the reported range in red and those that don’t inhibit based on the above criteria in grey. Those with intermediate effects are shown in pink. We hope these edits clarify the inhibitor studies.

      Recommendations For the Authors

      Reviewer 2:

      (1) Introduction, para 1. Please change "mPRs mediated" to "mPR-mediated".

      Done

      (2) Introduction, para 2. Please change "cyclin b" to "cyclin B".

      Done

      (3) Introduction, para 2. Please change "that serves" to "which serves".

      Done

      (4) Introduction, para 4. I know that the authors have published evidence that "a global decrease in cAMP levels is not detectable" (2016), but old work from Maller and Krebs (JBC 1979) did see an early, transient decrease after P4 treatment, and subsequent work from Maller said that there was both a decrease in adenylyl cyclase activity and an increase in cAMP activity. Perhaps it would be better to say something like "early work showed a transitory drop in cAMP activity within 1 min of P4 treatment (Maller), although later studies failed to detect this drop and showed that P4-dependent maturation proceeds even when cAMP is high (25)".

      We agree and thank the Reviewer for this recommendation. The text was revised accordingly.

      (5) Results, para 1. Based on the results in Fig 1B, one should probably not assert that ABHD2 is expressed "at levels similar to those of mPRβ in the oocyte"-with different mRNAs and different PCR primers, it's hard to say whether they are similar or not. The RNAseq data from Xenbase in Supp Fig 1 supports the idea that the ABHD2 and mPRβ mRNAs are expressed at similar levels at the message level, although of course mRNA levels and protein levels do not correlate well when different gene products are compared (Wuhr's 2014 Curr Biol paper reported correlation coefficients of about 0.3).

      We agree and have changed the text as follows to refer specifically to RNA: “we confirmed that ABHD2 RNA is expressed in the oocyte at levels similar to those of mPRβ RNA (Fig. 1B).”

      (6) Results, para 2. It would be worth pointing out that since an 18 h incubation with microinjected antisense oligos was sufficient to substantially knock down both the ABHD2 mRNAs (Fig 1C) and the ectopically-expressed proteins (Fig 1D), the mRNA and protein half-lives must be fairly short, on the order of a few hours or less.

      Done

      (7) Figure 1. Please make the western blots (especially Fig 1D) and their labeling larger. These are key results and as it stands the labeling is virtually unreadable on printed copies of the figures. I'm not sure about eLife's policy, but many journals want the text in figures to be no smaller than 5-7 points at 100% size.

      Likewise for many of the western blots in subsequent figures.

      As requested by the Reviewer we have increased the font and size of all Western blots in the Figures.

      (8) Figure 1E, G. I am not sure one should compare the effectiveness of the ABHD2 rescue (Fig 1E) and the mPRβ rescue (Fig 1G). Even if these were oocytes from the same frog, we do not know how the levels of the overexpressed ABHD2 and mPRβ proteins compare. E.g. maybe ABHD2 was highly overexpressed and mPRβ was overexpressed by a tiny amount.

      Although this is a possibility, the expression levels of the proteins here are not of much concern because we previously showed that mPRβ expression effectively rescues the mPRβ antisense knockdown, which inhibits maturation (please see Nader et al., 2020). This argues that at the levels of mRNA injected, mPR is functional to support maturation, yet it does not rescue ABHD2 knockdown to the same levels (Fig. 1G). It is therefore fair to argue that mPRβ is not as effective at rescuing maturation after ABHD2 KD.

      (9) Inhibitor studies: There are two likely problems in comparing the observed potencies with legacy data - in vitro vs in vivo data and frog vs. mammalian data. Please make it clear what is being compared to what when you are comparing legacy data.

      The legacy data are from the literature based on the early studies that defined the IC<sub>50</sub> for inhibition primarily using in vivo models (cell line mostly) but not oocytes. Typically, frog oocytes require significantly higher concentrations of inhibitors to mediate their effect because of the high lipophilic yolk content which acts as a sponge for some drugs. So, the fact that the drugs that are effective in inhibiting oocyte maturation (ACA, MP-A08, and Ibuprofen) work in a similar or lower concentration range to the published IC<sub>50</sub> gives us confidence as to the specificity of their effect. We have revised Table 1 to include the reference for each IC<sub>50</sub> value from the literature to allow the reader to judge the exact model and context used.

      (10) Isn't it surprising that Gαs seems to promote maturation, given the Maller data (and data from others) that cAMP and PKA oppose maturation (see also the authors' own Fig 1A) and the authors' previous data sees no positive effect (minor point 7 above)?

      We show that a specific Gαs inhibitor, NF-449, inhibits maturation (although at relatively high concentrations), which is consistent with a positive role for Gαs in oocyte maturation. We argue based on the lipidomics data and the inhibitor data that GPCRs play a modulatory role and not a central early signaling role in terms of releasing oocyte meiotic arrest. They are likely to have effects on the full maturation of the egg in preparation for embryonic development. The actions of the multiple lipid messengers generated downstream of mPRβ activation are likely to act through GPCRs and could signal through Gαs or other Gα or even through Gβγ. Minor point 7 refers to the size of Western blots.

      (11) Page 9, bottom: "...one would predict activation of sphingosine kinases...." Couldn't it just be the activity of some constitutively active sphingosine kinase? Maybe replace "activation" with "activity".

      A constitutively active sphingosine kinase would not make sense, as the kinase needs to be activated by P4.

      (12) Sometimes the authors refer to concentrations in molar units plus a power of 10 (e.g. 10-5 M) and sometimes in µM or nM, sometimes even within the same paragraph. This makes it unnecessarily difficult to compare. Please keep consistent.

      We converted all the concentrations throughout the text to M with scientific notation for consistency, as requested by the Reviewer.

      (13) Fig 3I: "Sphingosine kinase" is misspelled.

      This has been corrected. We thank the Reviewer for catching it.

      (14) Legend to Fig. 5: Please change "after P4 treatment in reticulocytes" to "after P4 treatment in reticulocyte lysates".

      Done

      (15) Fig 6J. Doesn't the MAPK cascade inhibit MYT1? I.e. shouldn't the arrow be -| rather than ->?

      Yes, the Reviewer is correct. This has been changed. We thank the Reviewer for noticing this error.

      (16) Materials and Methods, second paragraph. Please change "inhibitor's studies" to "inhibitor studies".

      Corrected, thanks.

      (17) Table 1: Please be consistent in how you write Cox-2.

      Done.

      Reviewer #3:

      The findings are of potential broad interest, but I have some concerns with the pharmacological studies presented. Many of these inhibitors are used at high (double-digit micromolar) concentrations that could result in non-specific pharmacological effects and the authors have provided very little data in support of target engagement and selectivity under the multiple experimental paradigms. Importantly, several claims regarding lipid metabolism signaling in the context of oocyte maturation are made without critical validation that the intended target is inactivated with reasonable selectivity across the proteome. Several of the inhibitors used for pharmacology and metabolomics are known covalent inhibitors (JZL184 and MJN110) that can readily bind additional lipases depending on the treatment time and concentration.

      I did not find any data using the reported ABHD2 inhibitor (compound 183; PMID: 31525885). Is there a reason not to include this compound to complement the knockdown studies? I believe this is an important control given that not all lipid effects were reversed with ABHD2 knockdown. The proper target engagement and selectivity studies should be performed with this ABHD2 inhibitor.

      We obtained aliquots of the reported ABHD2 inhibitor compound 183 from Dr. Van Der Stelt and tested its effect on oocyte maturation at 10<sup>-4</sup> M using either a low (10<sup>-7</sup> M) or a high (10<sup>-5</sup> M) P4 concentration. Compound 183 partially inhibited P4-mediated oocyte maturation. The new data were added to the manuscript as Supplemental Figure 3D.

      Additional comments:

      (1) Pristimerin was tested at low P4 concentration for effects on oocyte maturation. Authors should also test JZL184 and MJN110 under this experimental paradigm.

      We have tested the effect of a high concentration (2 × 10<sup>-5</sup> M) of JZL184 or MJN110 on oocyte maturation at low P4 concentration (Author response image 3). MJN110 did not have a prominent effect on oocyte maturation at low P4, whereas JZL184 inhibited maturation by 50%. However, this inhibition of maturation required concentrations of JZL184 that are 10 times higher than those reported in rat and human cells (Cui et al., 2016; Smith et al., 2015), arguing against an important role for a monoacylglycerol enzymatic activity in inducing oocyte maturation.

      Author response image 3.

      The effect of MJN110 and JZL184 compounds on oocyte maturation at low P4 concentration. Oocytes were pre-treated for 2 hours with the vehicle or with the highest concentration of 2 × 10<sup>-5</sup> M for either JZL184 or MJN110, followed by overnight treatment with P4 at 10<sup>-7</sup> M. Oocyte maturation was measured as % GVBD normalized to control oocytes (treated with vehicle) (mean + SEM; n = 2 independent female frogs for each compound).

      (2) Figure 4A showed different Ct values of ODC between oocytes and spleen, please explain them in the text. There is not any description regarding spleen information in Figure 4A, please make it clear in the text.

      We thank the Reviewer for this recommendation. The text was revised accordingly.

      (3) For Figures 3A, E, and I, there are different concentration settings for comparing the activity, is it possible to get the curves based on the same set of concentrations? The concentration gradient didn't include higher concentration points in these figures, thus the related values are incorrect. Please set more concentration points to improve the figures. And for the error bar, there are different display formats like Figure 4c and 4d, etc. Please uniform the format for all the figures. Additionally, for the ctrl. or veh., please add an error bar for all figures.

      Some of the drugs tested were toxic to oocytes at high concentrations so the dose response was adjusted accordingly. The graphs were plotted to encompass the entire tested dose response. We could have plotted the data on the same x-axis range but that would make the figures uneven and awkward.

      We are not clear what the Reviewer means by “The concentration gradient didn't include higher concentration points in these figures, thus the related values are incorrect.”

      The error bars for all dose responses are consistent throughout all the Figures. They are different from those on bar graphs to improve clarity. If the Reviewer wishes to have the error bars on the bar graphs and dose response the same, we are happy to do so. 

      For the inhibitor studies the data were normalized on a per frog basis to control for variability in the maturation rate in response to P4, which varies from frog to frog. It is thus not possible to add error bars for the controls.
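      The per-frog normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the condition labels, and the % GVBD values are hypothetical, and the normalization is assumed to be a simple ratio to the vehicle control of the same frog.

```python
def normalize_to_control(gvbd_by_frog, control_key="vehicle"):
    """Express each condition's % GVBD as a percentage of the same frog's control."""
    normalized = {}
    for frog, conditions in gvbd_by_frog.items():
        control = conditions[control_key]
        if control == 0:
            raise ValueError(f"no maturation in control oocytes of {frog}")
        normalized[frog] = {
            name: 100.0 * value / control for name, value in conditions.items()
        }
    return normalized


# Hypothetical raw % GVBD values for two frogs
raw = {
    "frog_1": {"vehicle": 80.0, "inhibitor": 40.0},
    "frog_2": {"vehicle": 50.0, "inhibitor": 20.0},
}
norm = normalize_to_control(raw)
print(norm)
```

      With this kind of normalization the control value is 100% for every frog by construction, so it has no spread across frogs and no meaningful error bar.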

      (4) Please check the sentence "However, the concentration of HA130...... higher that......'; Change "IC50" to "IC<sub>50</sub>" in the text and tables. Table 1 lists IC50 values in the literature, but the references are not cited. Please include the references properly. For the IC50 value obtained in the research, please include the standard deviation in the table. For reference parts, Ref 1, 27, 32, 46, doublecheck the title format.

      We edited the sentence as follows to be more clear: “However, this inhibition of maturation required high concentrations of HA130  -at least 3 orders of magnitude higher that the reported HA130 IC<sub>50</sub>-…”

      We changed IC50 to subscript in Table 1.

      We added the relevant references in Table 1 to provide context for the cited IC50 values for the different inhibitors used.

      We added SEM to the IC<sub>50</sub> for inhibition of oocyte maturation values in Table 1.

      We checked the titles on the mentioned references and cannot identify any problems.

      References

      Cui, Y., Prokin, I., Xu, H., Delord, B., Genet, S., Venance, L., and Berry, H. (2016). Endocannabinoid dynamics gate spike-timing dependent depression and potentiation. eLife 5, e13185.

      Nader, N., Dib, M., Hodeify, R., Courjaret, R., Elmi, A., Hammad, A.S., Dey, R., Huang, X.Y., and Machaca, K. (2020). Membrane progesterone receptor induces meiosis in Xenopus oocytes through endocytosis into signaling endosomes and interaction with APPL1 and Akt2. PLoS Biol 18, e3000901.

      Smith, M., Wilson, R., O'Brien, S., Tufarelli, C., Anderson, S.I., and O'Sullivan, S.E. (2015). The Effects of the Endocannabinoids Anandamide and 2-Arachidonoylglycerol on Human Osteoblast Proliferation and Differentiation. PLoS One 10, e0136546.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Editor and Reviewer Comments:

      Many thanks to the editor and reviewers for the thoughtful assessment of our manuscript “Commissureless acts as a substrate adapter in a conserved Nedd4 E3 ubiquitin ligase pathway to promote axon growth across the midline.” Thank you also for the positive comments about the quality of our writing, and for deeming our study rigorous and thorough. We are very pleased that, overall, you believe our combination of genetic and biochemical approaches offers useful insight into the mechanism of Robo regulation at the Drosophila embryonic midline and effectively reconciles the contradictory findings of previous studies done in this field.

      Response to the previous Public Reviews:

      We appreciate the concerns expressed by the reviewers and the suggestions of areas in which the study and manuscript could be improved. The reviewer suggestions were very helpful as we revised our manuscript in order to strengthen our mechanistic understanding of Robo downregulation and better characterize the role Nedd4 plays in this process. We strongly agree with Reviewer 1 that our insight into the mechanism of Robo downregulation via Comm would be much stronger had we not solely relied on overexpression experiments to investigate the effects of PY motif mutations on Comm function. While it is outside the scope of this particular paper, we appreciate your suggestion to use gene editing to investigate the role of PY motif mutation on endogenous comm function and believe this would be a useful question to address in future papers. In addition to this concern, both reviewers identified additional opportunities to strengthen the paper. We have done our best to incorporate reviewer suggestions and will outline how we addressed the following four areas that were identified by both reviewers as areas where additional data could strengthen our conclusions:

      (1) Additional experiments to examine Comm and Robo1 localization in vivo: Characterizing Robo localization in vivo when co-expressed with PY-mutant Comm variants.

      (2) Testing biochemical interactions in embryonic protein extracts: Examining the biochemical interaction between Robo, Comm, and Nedd4 in a more biologically relevant context than cell culture.

      (3) Additional genetic interaction experiments: A) Investigating whether Nedd4 overexpression enhances the Comm G.O.F phenotype of enhanced ectopic crossing. B) Testing for additional genetic interactions with comm.

      (4) Editing the text of the manuscript for clarity.

      (1) Characterizing Robo localization in vivo when co-expressed with Comm variants.

      In the first version of our manuscript, we characterized the localization of wild-type and PY mutant Comm variants expressed in apterous neurons (Figure 5C), but did not examine how these variants of Comm affected localization of their cargo Robo1. To address this gap, we co-expressed 10X UAS Comm-myc (WT, 1PY, 2PY) with 10X UAS Robo-HA under the ap gal4 driver, visualized Comm and Robo by immunostaining for Myc and HA, and measured colocalization between Comm and Robo. We found that Robo colocalizes equally with all Comm variants and that its expression pattern mimics that of the Comm variant with which it is expressed. We observe that Robo is restricted to cell bodies when overexpressed with WT Comm but “leaks out” into axons when co-expressed with Comm 1PY or 2PY. This finding suggests that PY motifs are not only required for effective Comm localization to the appropriate cellular areas, but also for proper routing of its cargo, Robo1. These new data are presented in a new supplemental figure: Figure S3.

      (2) Examining the biochemical interaction between Robo, Comm, and Nedd4 in vivo.

      To examine biochemical interaction between Comm, Robo, and Nedd4 in a more biologically relevant context, we performed immunoprecipitations in fly embryonic lysate prepared from the following categories: WT, elav gal4: 5X UAS Comm-myc WT, and elav gal4: 5X UAS Comm-myc WT + 10X UAS Nedd4-HA. We performed immunoprecipitation for myc (Comm), and blotted for endogenous Robo, Myc (Comm), and HA (Nedd4). Corroborating our results in cell culture (Figure 7 A-C), we were able to pull down a three-protein complex consisting of Comm, Nedd4 and Robo in embryonic fly tissue. These new data are presented in a new supplemental figure: Figure S8.

      (3) Investigating additional genetic interactions between Comm and Nedd4.

      A) In our submitted manuscript, we demonstrated that overexpression of Nedd4 enhances Comm-induced downregulation of Robo levels (Figure 7 D-G). To determine whether Nedd4 also increases ectopic crossing, which is a morphological output of Comm activity/Robo downregulation, we analyzed nerve cord phenotypes in embryos from the following categories: WT, embryos expressing WT Comm under the elav gal4, and embryos co-expressing WT Comm and Nedd4 under the elav gal4 driver. We measured nerve cord widths and sorted them into three different “bins” of phenotypic severity, with more severe phenotypes being characterized by thinner nerve cords. We find that the distribution of phenotypes in embryos expressing Comm alone differs significantly from embryos expressing Comm + Nedd4, with the latter shifted toward more severe/thinner phenotypic classes. In addition to examining nerve cord width, we investigated whether Nedd4 can enhance collapse of the nerve cord segments (defined by loss of negative space within the segment) induced by Comm overexpression. We determined percentage of collapsed nerve cord segments and divided these values into three phenotypic classes: no collapse, partial collapse, and total collapse. The distribution of phenotypes in embryos co-expressing Nedd4 and Comm differs significantly from those expressing Comm alone. In the Comm expressing population, we only observe nerve cords with no or partial collapse, but in flies co-expressing Comm and Nedd4 we observe the more severe complete collapse phenotype. These findings suggest that addition of Nedd4 enhances the Comm gain of function phenotype both by further reducing nerve cord width and increasing the occurrence of defects related to ectopic crossing. These new data are presented in a new supplemental figure: Figure S9.

      B) The reviewers also suggested additional genetic interaction experiments between Nedd4 and Comm. It was suggested that we included experiments to look at Nedd4 manipulations in Comm null mutant backgrounds. However, given the complete penetrance and expressivity of the Comm null mutation in which no axons cross the midline, these experiments would not be informative. As an alternative, we attempted to use the described hypomorphic Comm allele, but here too, the baseline commissural axon guidance defects are too strong to allow meaningful detection of enhanced phenotypes. Finally, we tested whether removing one copy of comm could reveal phenotypes in the nedd4 zygotic mutants, but we did not detect defects. This is perhaps unsurprising given that comm heterozygotes have no detectable midline crossing defects.

      (4) Text edits.

      We have made a variety of changes to decrease ambiguity in the text and create a more user-friendly experience for the reader. In the text, as opposed to just the figures, we now explicitly state whether we use 5X or 10X UAS constructs for each of our overexpression constructs. We also edited all mentions of the truncated frazzled construct (FraDc) so that they are uniform. We have also edited all mentions of MiMIC so that they are uniform. In addition, we answer a few questions the reviewers posed. First, we clarify that S2R+ cells express endogenous Comm at very low levels. In addition, we clarify how we know that expression levels are similar across the three Comm variants by explaining that the transgenes were incorporated into the fly genome by targeted insertion into the same location on the third chromosome.

      We hope that these changes adequately address reviewer concerns, strengthen our study, and enhance readability of the paper. We appreciate the time you took to evaluate our manuscript and the thoughtful commentary and suggestions that you provided.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating- prior social isolation is known to increase aggression in males by increased lunging, which is suppressed by group housing (GH). However, it is also known that single-housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al., developed a modified aggression assay, to address this issue by recording aggression in Drosophila males for 2 hours, over a virgin female which is immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low-frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA sensing Or67d neurons promoting high-frequency lunging, similar to earlier studies, whereas Or47b neurons promote low-frequency but higher intensity tussling. Using optogenetic activation they found that three pairs of pC1 neurons- pC1SS2 increase tussling. While P1a neurons, previously implicated in promoting aggression and courtship, did not increase tussling in optogenetic activation (in the dark), they could promote aggressive tussling in thermogenetic activation carried out in the presence of visible light. It was further suggested, using a further modified aggression assay that GH males use increased tussling and are able to maintain territorial control, providing them mating advantage over SI males and this may partially overcome the effect of aging in GH males.

      Strengths:

      Using a series of clever neurogenetic and behavioral approaches, subsets of ORNs and pC1 neurons were implicated in promoting tussling behaviors. The authors devised a new paradigm to assay for territory control which appears better than earlier paradigms that used a food cup (Chen et al, 2002), as this new assay is relatively clutter-free, and can be eventually automated using computer vision approaches. The manuscript is generally well-written, and the claims made are largely supported by the data.

      Thank you for your precise summary of our study and for your positive assessment of its novelty and significance.

      Weaknesses:

      I have a few concerns regarding some of the evidence presented and claims made as well as a description of the methodology, which needs to be clarified and extended further.

      (1) Typical paradigms for assaying aggression in Drosophila males last for 20-30 minutes in the presence of nutritious food/yeast paste/females or all of these (Chen et al. 2002, Nilsen et al., 2004, Dierick et al. 2007, Dankert et al., 2009, Certel & Kravitz 2012). The paradigm described in Figure 1 A, while important and more amenable for video recording and computational analysis, seems a modification of the assay from Kravitz lab (Chen et al., 2002), which involved using a female over which males fight on a food cup. The modifications include a flat surface with a central food patch and a female with its head buried in the food, (fixed female) and much longer adaptation and recording times respectively (30 minutes, 2 hours), so in that sense, this is not a 'new' paradigm but a modification of an existing paradigm and its description as new should be appropriately toned down. It would also be important to cite these earlier studies appropriately while describing the assay.

      We will tone down the description and cite related references.

      (2) Lunging is described as a 'low intensity' aggression (line 111 and associated text), however, it is considered a mid to high-intensity aggressive behavior, as compared to other lower-intensity behaviors such as wing flicks, chase, and fencing. Lunging therefore is lower in intensity 'relative' to higher intensity tussling but not in absolute terms and it should be mentioned clearly.

      We will address this issue in the text.

      (3) It is often difficult to distinguish faithfully between boxing and tussling and therefore, these behaviors are often clubbed together as box, tussle by Nilsen et al., 2004 in their Markov chain analysis as well as a more detailed recent study of male aggression (Simon & Heberlein, 2020). Therefore, authors can either reconsider the description of behavior as 'box, tussle' or consider providing a video representation/computational classifier to distinguish between box and tussle behaviors.

      We will textually address this issue.

      (4) Simon & Heberlein, 2020 showed that increased boxing & tussling precede the formation of a dominance hierarchy in males, and lunges are used subsequently to maintain this dominant status. This study should be cited and discussed appropriately while introducing the paradigm.

      We will cite this paper and discuss this issue.

      (5) It would be helpful to provide more methodological details about the assay, for instance, a video can be helpful showing how the males are introduced in the assay chamber, are they simply dropped to the floor when the film is removed after 30 minutes (Figures 1-2)?

      We will provide more methodological details.

      (6) The strain of Canton-S (CS) flies used should be mentioned as different strains of CS can have varying levels of aggression, for instance, CS from Martin Heisenberg lab shows very high levels of aggressive lunges. Are the CS lines used in this study isogenized? Are various genetic lines outcrossed into this CS background? In the methods, it is not clear how the white gene levels were controlled for various aggression experiments as it is known to affect aggression (Hoyer et al. 2008).

      We will textually address this issue.

      (7) How important is it to use a fixed female for the assay to induce tussling? Do these females remain active throughout the assay period of 2.5 hours? Is it possible to use decapitated virgin females for the assay? How will that affect male behaviors?

      We will textually address this issue and provide additional videos.

      (8) Raster plots in Figure 2 suggest a complete lack of tussling in SH males in the first 60 minutes of the encounter, which is surprising given the longer duration of the assay as compared to earlier studies (Nilsen et al. 2004, Simon & Heberlein, 2020 and others), which are able to pick up tussling in a shorter duration of recording time. Also, the duration for tussling is much longer in this study as compared to shorter tussles shown by earlier studies. Is this due to differences in the paradigm used, strain of flies, or some other factor? While the bar plots in Figure 2D show some tussling in SH males, maybe an analysis of raster plots of various videos can be provided in the main text and included as a supplementary figure to address this.

      We will textually address the first question and provide more detailed analysis for the second question.

      (9) Neuronal activation experiments suggesting the involvement of pC1SS2 neurons are quite interesting. Further, P1a neurons were shown to increase tussling upon thermogenetic activation in the presence of light (Figure 4, Supplement 1), which is quite important as the role of vision in optogenetic activation experiments, which are required to be carried out in the dark, is often not mentioned. However, in the discussion (lines 309-310) it is mentioned that pC1SS2 neurons are 'necessary and sufficient' for inducing tussling. Given that P1a neurons were shown to be involved in promoting tussling, this statement should be toned down.

      We will tone down this statement.

      (10) Are Or47b neurons connected to pC1SS2 or P1a neurons?

      We conducted pathway analysis in the FlyWire electron microscopy database to investigate the connection between Or47b neurons and pC1 neurons. The results indicate that at least three intermediate neurons are required to establish a connection from Or47b neurons to pC1 neurons. Although the FlyWire database currently only contains neuronal data from female brains, these data provide a reference for circuit connectivity in males. Using the currently available upstream and downstream tracing tools (e.g., retro-/trans-Tango), it is not possible to establish a direct connection between the two. Identifying the intermediate neurons involved in this connection is beyond the scope of this study. We will discuss this concern in our revised manuscript.

      (11) The paradigm for territory control is quite interesting and subsequent mating advantage experiments are an important addition to the eventual outcome of the aggressive strategy deployed by the males as per their prior housing conditions. It would be important to comment on the 'fitness outcome' of these encounters. For instance, is there any fitness advantage of using tussling by GH males as compared to lunging by SH males? The authors may consider analyzing the number of eggs laid and eclosed progenies from these encounters to address this.

      We will discuss this concern.

      Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated the change of aggression strategies by the social experience and its biological significance by using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, high-frequency but weak mode, and tussling, low-frequency but more vigorous mode. Previous studies have mainly focused on the lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling, and another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons pC1[SS2] that induce the tussling specifically. In order to further explore the ecological significance of the aggression mode change in group rearing, a new behavioral experiment was performed to examine territorial control and mating competition. Finally, the authors found that differences in the social experience (group vs. solitary rearing) are important in these biologically significant competitions. These results add a new perspective to the study of aggressive behavior in Drosophila. Furthermore, this study proposes an interesting general model in which the social experience-modified behavioral changes play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011, etc.), the fact that the behavioral mode itself changes significantly has rarely been addressed and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of the neurobiology in this study is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes.

      Thank you for the acknowledgment of the novelty and significance of the study, and your suggestions for improving the manuscript.

      Weaknesses:

      The experimental systems examining the territory control and the reproductive competition in Figure 5 are novel and have advantages in exploring their biological significance. However, at this stage, the authors' claim is weak since they only show the effects of age and social experience on territorial and mating behaviors, but do not experimentally demonstrate the influence of aggression mode change itself. In the Abstract, the authors state that these findings reveal how social experience shapes fighting strategies to optimize reproductive success. This is the most important perspective of the present study, and it would be necessary to show directly that the change of aggression mode by social experience contributes to reproductive success.

      We will either tone down this statement or provide additional analysis.

      In addition, a detailed description of the tussling is lacking. For example, the authors state that the tussling is less frequent but more vigorous than lunging, but while experimental data are presented on the frequency, the intensity seems to be subjective. The intensity is certainly clear from the supplementary video, but it would be necessary to evaluate the intensity itself using some index. Another problem is that there is no clear explanation of how to determine the tussling. A detailed method is required for the reproducibility of the experiment.

      We will provide more detailed methods and data analysis regarding tussling behavior.

      Reviewer #3 (Public review):

      In this manuscript, Gao et al. presented a series of intriguing data that collectively suggest that tussling, a form of high-intensity fighting among male fruit flies (Drosophila melanogaster) has a unique function and is controlled by a dedicated neural circuit. Based on the results of behavioral assays, they argue that increased tussling among socially experienced males promotes access to resources. They also concluded that tussling is controlled by a class of olfactory sensory neurons and sexually dimorphic central neurons that are distinct from pathways known to control lunges, a common male-type attack behavior.

      A major strength of this work is that it is the first attempt to characterize the behavioral function and neural circuit associated with Drosophila tussling. Many animal species use both low-intensity and high-intensity tactics to resolve conflicts. High-intensity tactics are mostly reserved for escalated fights, which are relatively rare. Because of this, tussling in the flies, like high-intensity fights in other animal species, has not been systematically investigated. Previous studies on fly aggressive behavior have often used socially isolated, relatively young flies within a short observation duration. Their discovery that 1) older (14-days-old) flies tend to tussle more often than younger (2-days-old) flies, 2) group-reared flies tend to tussle more often than socially isolated flies, and 3) flies tend to tussle at a later stage (mostly ~15 minutes after the onset of fighting), are the result of their creativity to look outside of conventional experimental settings. These new findings are key for quantitatively characterizing this interesting yet under-studied behavior.

      Precisely because their initial approach was creative, it is regrettable that the authors missed the opportunity to effectively integrate preceding studies in their rationale or conclusions, which sometimes led to premature claims. Also, while each experiment contains an intriguing finding, these are poorly related to each other. This obscures the central conclusion of this work. The perceived weaknesses are discussed in detail below.

      Thank you for the precise summary of the key findings and novelty of the study, and your insightful suggestions.

      Most importantly, the authors' definition of "tussling" is unclear because they did not explain how they quantified lunges and tussling, even though the central focus of the manuscript is behavior. Supplemental movies S1 and S2 appear to include "tussling" bouts in which 2 flies lunge at each other in rapid succession, and supplemental movie S3 appears to include bouts of "holding", in which one fly holds the opponent's wings and shakes vigorously. These cases raise a concern that their behavior classification is arbitrary. Specifically, lunges and tussling should be objectively distinguished because one of their conclusions is that these two actions are controlled by separate neural circuits. It is impossible to evaluate the credibility of their behavioral data without clearly describing a criterion of each behavior.

      We will add more details in methods.

      It is also confusing that the authors completely skipped the characterization of the tussling-controlling neurons they claimed to have identified. These neurons (a subset of so-called pC1 neurons labeled by previously described split-GAL4 line pC1SS2) are central to this manuscript, but the only information the authors have provided is its gross morphology in a low-resolution image (Figure 4D, E) and a statement that "only 3 pairs of pC1SS2 neurons whose function is both necessary and sufficient for inducing tussling in males" (lines 310-311). The evidence that supports this claim isn't provided. The expression pattern of pC1SS2 neurons in males has been only briefly described in reference 46. It is possible that these neurons overlap with previously characterized dsx+ and/or fru+ neurons that are important for male aggressions (measured by lunges), such as in Koganezawa et al., Curr. Biol. 2016 and Chiu et al., Cell 2020. This adds to the concern that lunge and tussling are not as clearly separated as the authors claim.

      We will perform additional morphological and functional experiments on pC1<sup>SS2</sup> neurons, e.g., testing whether they are fru- or dsx-positive and comparing them with P1a neurons.

      While their characterizations of tussling behaviors in wild-type males (Figures 1 and 2) are intriguing, the remaining data have little link with each other, making it difficult to understand what their main conclusion is. Figure 3 suggests that one class of olfactory sensory neurons (OSN) that express Or47b is necessary for tussling behavior. While the authors acknowledged that Or47b-expressing OSNs promote male courtship toward females presumably by detecting cuticular compounds, they provided little discussion on how a class of OSN can promote two different types of innate behavior. No evidence of a functional or circuitry relationship between the Or47b pathway and the pC1SS2 neurons was provided. It is unclear how these two components are relevant to each other. Lastly, the rationale of the experiment in Figure 5 and the interpretation of the results is confusing. The authors attributed a higher mating success rate of older, socially experienced males over younger, socially isolated males to their tendency to tussle, but tussling cannot happen when one of the two flies is not engaged. If, for instance, a socially isolated 14-day-old male does not engage in tussling as indicated in Figure 2, how can they tussle with a group-housed 14-day-old male? Because aggressive interactions in Figure 5 were not quantified, it is impossible to conclude that tussling plays a role in copulation advantage among pairs as authors argue (lines 282-288).

      Regarding why Or47b-expressing OSNs regulate two types of innate behaviors, we will add a discussion in the revised manuscript to explore the possible mechanisms underlying this phenomenon.

      Regarding the relationship between Or47b-expressing OSNs and pC1<sup>SS2</sup> neurons, we conducted pathway connection analyses using the FlyWire database. Although the FlyWire database currently only contains neuronal data from female brains, these findings provide a reference for circuit connectivity in males. The results indicate that at least three intermediate neurons are required to establish the connection between these two neuronal types. We hope the editor and reviewers will agree with us that identifying the intermediate neurons involved in this connection is beyond the scope of this study.

      Regarding the rationale and conclusions from the experiments in Figure 5, we acknowledge the difficulty in quantifying tussling and lunging behaviors in these experiments. In the revised manuscript, we will tone down the statements about the relationship between fighting strategies and reproductive success. Additionally, we will provide further behavioral experiments to support the association between these two factors.

      Despite these weaknesses, it is important to acknowledge the authors' courage to initiate an investigation into a less characterized, high-intensity fighting behavior. Tussling requires the simultaneous engagement of two flies. Even if there is confusion over the distinction between lunges and tussling, the authors' conclusion that socially experienced flies and socially isolated flies employ distinct fighting strategies is convincing. Questions that require more rigorous studies are 1) whether such differences are encoded by separate circuits, and 2) whether the different fighting strategies are causally responsible for gaining ethologically relevant resources among socially experienced flies. Enhanced transparency of behavioral data will help readers understand the impact of this study. Lastly, the manuscript often mentions previous works and results without citing relevant references. For readers to grasp the context of this work, it is important to provide information about methods, reagents, and other key resources.

      We will add more details to the Methods and cite additional references. We will also perform additional experiments on pC1<sup>SS2</sup> function.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates the role of macrophage lipid metabolism in the intracellular growth of Mycobacterium tuberculosis. By using a CRISPR-Cas9 gene-editing approach, the authors knocked out key genes involved in fatty acid import, lipid droplet formation, and fatty acid oxidation in macrophages. Their results show that disrupting various stages of fatty acid metabolism significantly impairs the ability of Mtb to replicate inside macrophages. The mechanisms of growth restriction included increased glycolysis, oxidative stress, pro-inflammatory cytokine production, enhanced autophagy, and nutrient limitation. The study demonstrates that targeting fatty acid homeostasis at different stages of the lipid metabolic process could offer new strategies for host-directed therapies against tuberculosis.

      The work is convincing and methodologically strong, combining genetic, metabolic, and transcriptomic analyses to provide deep insights into how host lipid metabolism affects bacterial survival.

      Strengths:

      The study uses a multifaceted approach, including CRISPR-Cas9 gene knockouts, metabolic assays, and dual RNA sequencing, to assess how various stages of macrophage lipid metabolism affect Mtb growth. The use of CRISPR-Cas9 to selectively knock out key genes involved in fatty acid metabolism enables precise investigation of how each step (lipid import, lipid droplet formation, and fatty acid oxidation) affects Mtb survival. The study offers mechanistic insights into how different impairments in lipid metabolism lead to diverse antimicrobial responses, including glycolysis, oxidative stress, and autophagy. This deepens the understanding of macrophage function in immune defense.

      The use of functional assays to validate findings (e.g., metabolic flux analyses, lipid droplet formation assays, and rescue experiments with fatty acid supplementation) strengthens the reliability and applicability of the results.

      By highlighting potential targets for HDT that exploit macrophage lipid metabolism to restrict Mtb growth, the work has significant implications for developing new tuberculosis treatments.

      Weaknesses:

      The experiments were primarily conducted in vitro using CRISPR-modified macrophages. While these provide valuable insights, they may not fully replicate the complexity of the in vivo environment where multiple cell types and factors influence Mtb infection and immune responses.

      We thank the reviewer for pointing this out. We acknowledge that our in vitro system may indeed not fully replicate the complex in vivo environment in light of the heterogeneous responses of macrophages to Mtb infection in whole animal models. We do believe, however, that the Hoxb8 in vitro model provides a powerful genetic tool to interrogate host-Mtb interactions using primary macrophages that represent the bone marrow-derived macrophage lineage. Reviewer #1 also made several helpful suggestions in their recommendations to authors relating to the reorganization of the data in our Figures in both the manuscript and the supplemental data. We will incorporate these suggestions into the revised version of the manuscript upon resubmission.

      Reviewer #2 (Public review):

      Summary:

      Host-derived lipids are an important factor during Mtb infection. In this study, using CRISPR knockouts of genes involved in fatty acid uptake and metabolism, the authors claim that a compromised uptake, storage, or metabolism of fatty acid restricts Mtb growth upon infection. Further, the authors claim that the mechanism involves increased glycolysis, autophagy, oxidative stress, pro-inflammatory cytokines, and nutrient limitation. The authors also claim that impaired lipid droplet formation restricts Mtb growth. However, promoting lipid droplet biogenesis does not reverse/promote Mtb growth.

      Strengths:

      The strength of the study is the use of clean HOXB8-derived primary mouse macrophage lines for generating CRISPR knockouts.

      Weaknesses:

      There are many weaknesses of this study; they are clubbed into four categories below.

      (1) Evidence and interpretations: The results shown in this study at several places do not support the interpretations made or are internally contradictory or inconsistent. There are several important observations, but none were taken forward for in-depth analysis.

      a) The phenotypes of PLIN2-/-, FATP1-/-, and CPT-/- are comparable in terms of bacterial growth restriction; however, their phenotype in terms of lipid body formation, IL1B expression, etc., are not consistent. These are interesting observations and suggest additional mechanisms specific to specific target genes; however, clubbing them all as altered fatty acid uptake or catabolism-dependent phenotypes takes away this important point.

      We thank the reviewer for highlighting this. Our main focus was on assessing the impact of manipulating lipid homeostasis in macrophages and the consequences this has on the intracellular growth of Mtb.  It was never our intention to imply these mutants generated equivalent phenotypes, and we will modify the revised manuscript to reflect this point.  We will stress that interfering with lipid processing at different stages in macrophages results in both shared and divergent anti-microbial conditions against Mtb.

      b) Finding the FATP1 transcript in the HOXB8-derived FATP1-/- CRISPR KO line is a bit confusing. There is less than a two-fold decrease in relative transcript abundance in the KO line compared to the WT line, leaving concerns regarding the robustness of other experiments as well using FATP1<sup>-/-</sup> cells.

      CRISPR-Cas9 targeting of genes with single sgRNAs, as is the case with our mutants, generates insertions and deletions (INDELs) at the CRISPR cut site. These INDELs do not block mRNA transcription totally, and this is widely reported and accepted in the field. In these cases, RT-PCR or RNA-seq methods are not used to verify CRISPR knockouts as they are not sensitive enough to identify INDELs. We provide knockout efficiencies by ICE analysis in supplemental information file 1 for all the mutants used in the study. We also demonstrate protein depletion by western blot and flow cytometry for all the mutants (Figure 1 - figure supplement 1). Only mutants with greater than 90% protein depletion were used for subsequent characterization.

      c) No gene showing differential regulation in FATP1<sup>-/-</sup> macrophages, which is very surprising.

      We assume the reviewer is referring to the Mtb transcriptome response in FATP1<sup>-/-</sup> macrophages, which we agree was unexpected.  However, we saw a significant compensatory response in the host cell (at transcriptional level) in FATP1-/- macrophages as evidenced by an upregulation of other fatty acid transporters (Figure 5 - figure supplement 1). We postulate that these compensatory responses could, in part, alleviate the stresses the bacteria experience within the cell, and these were discussed in the manuscript.

      d) ROS measurements should be done using flow cytometry and not by microscopy to nail the actual pattern.

      We thank the reviewer for the suggestion. However, confocal imaging is also widely used to measure ROS with similar quantitative power and individual cell resolution (PMID: 32636249, 35737799).

      (2) Experimental design: For a few assays, the experimental design is inappropriate

      a) For autophagy flux assay, immunoblot of LC3II alone is not sufficient to make any interpretation regarding the state of autophagy. This assay must be done with BafA1 or CQ controls to assess the true state of autophagy.

      We would like to point out that monitoring LC3I to LC3II conversion by western blot, confocal imaging of LC3 puncta, and qPCR analysis of autophagy-related genes are all validated assays for monitoring autophagic flux in a wide variety of cells. We refer the reviewer to the latest extensive guidelines on the subject (PMID: 33634751). Furthermore, Bafilomycin A and chloroquine are not specific inhibitors of autophagy and therefore are of limited value as controls. BafA is an inhibitor of the proton-ATPase apparatus as well as impacting autophagy through activity on the Ca-P60A/SERCA pathway. Chloroquine impacts vacuole acidification and autophagosome/lysosome fusion, and slows phagosome maturation. So, while BafA and chloroquine will reduce autophagy, their effects are pleiotropic and their impact on Mtb is unknown.

      b) Similarly, qPCR analyses of autophagy-related gene expression do not reflect anything on the state of autophagy flux.

      See our response above.

      (3) Using correlative observations as evidence:

      a) Observations based on RNAseq analyses are presented as functional readouts, which is incorrect.

      We are not entirely sure where we used our RNA-seq data sets as functional readouts. We used our transcriptome data to provide a preliminary identification of anti-microbial responses in the mutant macrophages infected with Mtb. Where applicable, we followed up and confirmed the more compelling RNA-seq data by metabolic flux analyses, qPCR, ROS measurements, or quantitative imaging.

      b) Claiming that the inability to generate lipid droplets in PLIN2-/- cells led to the upregulation of several pathways in the cells is purely correlative, and the causal relationship does not exist in the data presented.

      Again, it was not our intention to infer causality. Throughout the manuscript, we endeavor to present our data with a specific focus on describing the consequences of interfering with fatty acid import, lipid droplet biogenesis, or fatty acid oxidation on macrophage responses to Mtb. We will revise the manuscript to remove any sections that imply causality.

      (4) Novelty: A few main observations described in this study were previously reported. That includes Mtb growth restriction in PLIN2 and FATP1 deficient cells. Similarly, the impact of Metformin and TMZ on intracellular Mtb growth is well-reported. While that validates these observations in this study, it takes away any novelty from the study.

      To the best of our knowledge, Mtb growth restrictions in PLIN2 and FATP1 deficient macrophages have not been reported elsewhere. On the contrary, PLIN2 knockout macrophages obtained from PLIN2 deficient mice have been reported to robustly support Mtb replication (PMID: 29370315), quite the opposite of our data. We extensively discuss these discrepancies in the manuscript. We also discuss and cite appropriate references where Mtb growth restriction for similar macrophage mutants has been reported (CD36<sup>-/-</sup> and CPT2<sup>-/-</sup>). Our aim was to carry out a systematic myeloid-specific genetic interference of fatty acid import, storage, and catabolism to assess the effect on Mtb growth at all stages of lipid handling instead of focusing on one target. In the chemical approach, we used TMZ and Metformin deliberately because they had already been reported as being active against intracellular Mtb and we wished to place our data in the context of existing literature. These studies were referenced extensively in the text.

      (5) Manuscript organisation: It will be very helpful to rearrange figures and supplementary figures.

      We will re-organize the figures in the manuscript revision as per the reviewer’s recommendation, and the recommendations of reviewer #1.

      We will address the other concerns raised by reviewer #2 in the recommendations to authors during revision of the manuscript. 

      Reviewer #3 (Public review):

      Summary:

      This study provides significant insights into how host metabolism, specifically lipids, influences the pathogenesis of Mycobacterium tuberculosis (Mtb). It builds on existing knowledge about Mtb's reliance on host lipids and emphasizes the potential of targeting fatty acid metabolism for therapeutic intervention.

      Strengths:

      To generate the data, the authors use CRISPR technology to precisely disrupt the genes involved in lipid import (CD36, FATP1), lipid droplet formation (PLIN2), and fatty acid oxidation (CPT1A, CPT2) in mouse primary macrophages. The Mtb Erdman strain is used to infect the macrophage mutants. The study reveals specific roles of different lipid-related genes. Importantly, results challenge previous assumptions about lipid droplet formation and show that macrophage responses to lipid metabolism impairments are complex and multifaceted. The experiments are well-controlled and the data is convincing.

      Overall, this well-written paper makes a meaningful contribution to the field of tuberculosis research, particularly in the context of host-directed therapies (HDTs). It suggests that manipulating macrophage metabolism could be an effective strategy to limit Mtb growth.

      Weaknesses:

      None noted. The manuscript provides important new knowledge that will lead to novel host-directed therapies to control Mtb infections.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This study investigates what happens to the stimulus-driven responses of V4 neurons when an item is held in working memory. Monkeys are trained to perform memory-guided saccades: they must remember the location of a visual cue and then, after a delay, make an eye movement to the remembered location. In addition, a background stimulus (a grating) is presented that varies in contrast and orientation across trials. This stimulus serves to probe the V4 responses, is present throughout the trial, and is task-irrelevant. Using this design, the authors report memory-driven changes in the LFP power spectrum, changes in synchronization between the V4 spikes and the ongoing LFP, and no significant changes in firing rate.

      Strengths:

      (1) The logic of the experiment is nicely laid out.

      (2) The presentation is clear and concise.

      (3) The analyses are thorough, careful, and yield unambiguous results.

      (4) Together, the recording and inactivation data demonstrate quite convincingly that the signal stored in FEF is communicated to V4 and that, under the current experimental conditions, the impact from FEF manifests as variations in the timing of the stimulus-evoked V4 spikes and not in the intensity of the evoked activity (i.e., firing rate).

      Weaknesses:

      I think there are two limitations of the study that are important for evaluating the potential functional implications of the data. If these were acknowledged and discussed, it would be easier to situate these results in the broader context of the topic, and their importance would be conveyed more fairly and transparently.

      (1) While it may be true that no firing rate modulations were observed in this case, this may have been because the probe stimuli in the task were behaviorally irrelevant; if anything, they might have served as distracters to the monkey's actual task (the MGS). From this perspective, the lack of rate modulation could simply mean that the monkeys were successful in attending the relevant cue and shielding their performance from the potentially distracting effect of the background gratings. Had the visual probes been in some way behaviorally relevant and/or spatially localized (instead of full field), the data might have looked very different.

      Any task design involves tradeoffs; if the visual stimulus was behaviorally relevant, then any observed neurophysiological changes would be more confounded by possible attentional effects. We cannot exclude the possibility that a different task or different stimuli would produce different results; we ourselves have reported firing rate enhancements for other types of visual probes during an MGS task (Merrikhi et al. 2017). We have added an acknowledgement of these limitations in the discussion section (lines 311-319). At minimum, our results show a dissociation between the top-down modulation of phase coding, which is enhanced during WM even for these task-irrelevant stimuli, and rate coding. Establishing whether and how this phase coding is related to perception and behavior will be an important direction for future work.

      With this in mind, it would be prudent to dial down the tone of the conclusions, which stretch well beyond the current experimental conditions (see recommendations).

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract line 27, introduction lines 58-60, results line 215, conclusion lines 294-295).

      (2) Another point worth discussing is that although the FEF delay-period activity corresponds to a remembered location, it can also be interpreted as an attended location, or as a motor plan for the upcoming eye movement. These are overlapping constructs that are difficult to disentangle, but it would be important to mention them given prior studies of attentional or saccade-related modulation in V4. The firing rate modulations reported in some of those cases provide a stark contrast with the findings here, and I again suspect that the differences may be due at least in part to the differing experimental conditions, rather than a drastically different encoding mode or functional linkage between FEF and V4.

      We have added a paragraph to the discussion section addressing links to attention and motor planning (lines 301-322), and specifically acknowledging the inherent difficulties of fully dissociating these effects when interpreting our results (lines 311-319).

      Reviewer #2 (Public review):

      Summary:

      It is generally believed that higher-order areas in the prefrontal cortex guide selection during working memory and attention through signals that selectively recruit neuronal populations in sensory areas that encode the relevant feature. In this work, Parto-Dezfouli and colleagues tested how these prefrontal signals influence activity in visual area V4 using a spatial working memory task. They recorded neuronal activity from visual area V4 and found that information about visual features at the behaviorally relevant part of space during the memory period is carried in a spatially selective manner in the timing of spikes relative to a beta oscillation (phase coding) rather than in the average firing rate (rate code). The authors further tested whether there is a causal link between prefrontal input and the phase encoding of visual information during the memory period. They found that indeed inactivation of the frontal eye fields, a prefrontal area known to send spatial signals to V4, decreased beta oscillatory activity in V4 and information about the visual features. The authors went one step further to develop a neural model that replicated the experimental findings and suggested that changes in the average firing rate of individual neurons might be a result of small changes in the exact beta oscillation frequency within V4. These data provide important new insights into the possible mechanisms through which top-down signals can influence activity in hierarchically lower sensory areas and can therefore have a significant impact on the Systems, Cognitive, and Computational Neuroscience fields.

      Strengths:

      This is a well-written paper with a well-thought-out experimental design. The authors used a smart variation of the memory-guided saccade task to assess how information about the visual features of stimuli is encoded during the memory period. By using a grating of various contrasts and orientations as the background the authors ensured that bottom-up visual input would drive responses in visual area V4 in the delay period, something that is not commonly done in experimental settings in the same task. Moreover, one of the major strengths of the study is the use of different approaches including analysis of electrophysiological data using advanced computational methods of analysis, manipulation of activity through inactivation of the prefrontal cortex to establish causality of top-down signals on local activity signatures (beta oscillations, spike locking and information carried) as well as computational neuronal modeling. This has helped extend an observation into a possible mechanism well supported by the results.

      Weaknesses:

      Although the authors provide support for their conclusions from different approaches, I found that the selection of some of the analyses and statistical assessments made it harder for the reader to follow the comparison between a rate code and a phase code. Specifically, the authors wish to assess whether stimulus information is carried selectively for the relevant position through a firing rate or a phase code. Results for the rate code are shown in Figures 1B-G and for the phase code are shown in Figure 2. Whereas an F-statistic is shown over time in Figure 1F (and Figure S1) no such analysis is shown for LFP power. Similarly, following FEF inactivation there is no data on how that influences V4 firing rates and information carried by firing rates in the two conditions (for positions inside and outside the V4 RF). In the same vein, no data are shown on how the inactivation affects beta phase coding in the OUT condition.

      We plan to incorporate statistical analysis of this point in the revised version.

      Moreover, some of the statistical assessments could be carried out differently including all conditions to provide more insight into mechanisms. For example, a two-way ANOVA followed by post hoc tests could be employed to include comparisons across both spatial (IN, OUT) and visual feature conditions (see results in Figures 2D, S4, etc.). Figure 2D suggests that the absence of selectivity in the OUT condition (no significant difference between high and low contrast stimuli) is mainly due to an increase in slope in the OUT condition for the low contrast stimulus compared to that for the same stimulus in the IN condition. If this turns out to be true it would provide important information that the authors should address.

      We plan to incorporate statistical analysis of this point in the revised version.

      There are also a few conceptual gaps that leave the reader wondering whether the results and conclusion are general enough. Specifically,

      (1) the authors used microstimulation in the FEF to determine RFs. It is thus possible that the FEF sites that were inactivated were largely more motor-related. Given that beta oscillations and motor preparatory activity have been found to be correlated and motor sites show increased beta oscillatory activity in the delay period, it is possible that the effect of FEF inactivation on V4 beta oscillations is due to inactivation of the main source of beta activity. Had the authors inactivated sites with a preponderance of visual neurons in the FEF would the results be different?

      We do not believe this to be likely based on what is known anatomically and functionally about this circuitry. Anatomically, the projections from FEF to V4 arise primarily from the supragranular layers, not layers which contain the highest proportion of motor activity (Barone et al. 2000, Pouget et al. 2009, Markov et al. 2013). Functionally, based on electrical identification of V4-projecting FEF neurons, we know that FEF to V4 projections are predominantly characterized by delay rather than motor activity (Merrikhi et al. 2017). We have now tried to emphasize these points when we introduce the inactivation experiments (lines 180-182).

      Experimentally, the spread of the pharmacological effect with our infusion system is quite large relative to any clustering of visual vs. motor neurons within the FEF, with behavioral consequences of inactivation spreading to cover a substantial portion of the visual hemifield (e.g., Noudoost et al. 2014, Clark et al. 2014), and so our manipulation lacks the spatial resolution to selectively target motor vs. other FEF neurons.

      (2) Somewhat related to this point and given the prominence of low-frequency activity in deeper layers of the visual cortex according to some previous studies, it is not clear where the authors' V4 recordings were located. The authors report that they do have data from linear arrays, so it should be possible to address this.

      Unfortunately our chamber placement for V4 has produced linear array penetration angles which do not reliably allow identification of cortical layers. We are aware of previous results showing layer-specific effects of attention in V4 (e.g., Pettine et al. 2019, Buffalo et al. 2011), and it would indeed be interesting to determine whether our observed WM-driven changes follow similar patterns. We may be able to analyze a subset of the data with current source density analysis to look for layer-specific effects in the future, but are not able to provide any information at this time.

      (3) The authors suggest that a change in the exact frequency of oscillation underlies the increase in firing rate for different stimulus features. However, the shift in frequency is prominent for contrast but not for orientation, something that raises questions about the general applicability of this observation for different visual features.

      We plan to incorporate statistical analysis of this point in the revised version.

      (4) One of the major points of the study is the primacy of the phase code over the rate code during the delay period. Specifically, here it is shown that information about the visual features of a stimulus carried by the rate code is similar for relevant and irrelevant locations during the delay period. This contrasts with what several studies have shown for attention in which case information carried in firing rates about stimuli in the attended location is enhanced relative to that for stimuli in the unattended location. If we are to understand how top-down signals work in cognitive functions it is inevitable to compare working memory with attention. The possible source of this difference is not clear and is not discussed. The reader is left wondering whether perhaps a different measure or analysis (e.g. a percent explained variance analysis) might reveal differences during the delay period for different visual features across the two spatial conditions.

      We have added discussion regarding the relationship of these results to previous findings during attention in the discussion section (lines 301-322).

      The use of the memory-guided saccade task has certain disadvantages in the context of this study. Although delay activity is interpreted as memory activity by the authors, it is in principle possible that it reflects preparation for the upcoming saccade, spatial attention (particularly since there is a stimulus in the RF), etc. This could potentially change the conclusion and perspective.

      We have added a new discussion paragraph addressing the relationship to attention and motor planning (lines 301-322). We have also moderated the language used to describe our conclusions throughout the manuscript in light of this ambiguity.

      For the position outside the V4 RF, there is a decrease in both beta oscillations and the clustering of spikes at a specific phase. It is therefore possible that the decrease in information about the stimuli features is a byproduct of the decrease in beta power and phase locking. Decreased oscillatory activity and phase locking can result in less reliable estimates of phase, which could decrease the mutual information estimates.

      We plan to incorporate statistical analysis of this point in the revised version.
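
      The reviewer's concern above can be illustrated with a small simulation. The sketch below is not the authors' analysis; it is a toy model in which every name and parameter (the von Mises spike-phase model, `kappa`, `phase_shift`, the bin count) is an assumption chosen for illustration. It holds the stimulus-dependent phase shift fixed and varies only the concentration of spike phases, which stands in for weaker phase locking and noisier phase estimates, and then compares plug-in mutual information between stimulus and binned spike phase.

```python
import math
import random
from collections import Counter


def plugin_mutual_information(pairs):
    """Plug-in (maximum-likelihood) mutual information, in bits, of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def simulate_phase_information(kappa, n_spikes=20000, n_bins=8,
                               phase_shift=math.pi / 4, seed=0):
    """Hypothetical model: spike phases are von Mises distributed around a
    preferred phase that shifts by `phase_shift` between two stimuli.
    `kappa` sets how tightly spikes lock to that phase."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_spikes):
        stim = rng.randrange(2)                      # two stimulus classes
        mu = math.pi + (phase_shift if stim else 0.0)
        phase = rng.vonmisesvariate(mu, kappa)       # value in [0, 2*pi)
        bin_index = min(int(phase / (2 * math.pi) * n_bins), n_bins - 1)
        pairs.append((stim, bin_index))
    return plugin_mutual_information(pairs)


# Same stimulus-dependent phase shift, different locking strength.
mi_strong = simulate_phase_information(kappa=2.0)
mi_weak = simulate_phase_information(kappa=0.2)
print(f"strong locking: {mi_strong:.4f} bits, weak locking: {mi_weak:.4f} bits")
```

      In this toy setting the information estimate falls when locking weakens even though the underlying phase shift is unchanged, which is the kind of confound a control analysis (for example, matching beta power or phase-locking strength across conditions) would need to rule out.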

      The authors propose that coherent oscillations could be the mechanism through which the prefrontal cortex influences beta activity in V4. I assume they mean coherent oscillations between the prefrontal cortex and V4. Given that they do have simultaneous recordings from the two areas they could test this hypothesis on their own data, however, they do not provide any results on that.

      This paper only includes inactivation data. We are working on analyzing the simultaneous recording data for a future publication.

      The authors make a strong point about the relevance of changes in the oscillation frequency and how this may result in an increase in firing rate although it could also be the reverse - an increase in firing rate leading to an increase in the frequency peak. It is not clear at all how these changes in frequency could come about. A more nuanced discussion based on both experimental and modeling data is necessary to appreciate the source and role (if any) of this observation.

      As the reviewer notes, it is difficult to determine whether the frequency changes drive the rate changes, vice versa, or whether both are generated in parallel by a common source. We have adjusted our language to reflect this (lines 277-278). Future modeling work may be able to shed more light on the causal relationships between various neural signatures.

      Reviewer #3 (Public review):

      Summary:

      In this report, the authors test the necessity of prefrontal cortex (specifically, FEF) activity in driving changes in oscillatory power, spike rate, and spike timing of extrastriate visual cortex neurons during a visual-spatial working memory (WM) task. The authors recorded LFP and spikes in V4 while macaques remembered a single spatial location over a delay period during which task-irrelevant background gratings were displayed on the screen with varying orientation and contrast. V4 oscillations (in the beta range) scaled with WM maintenance, and the information encoded by spike timing relative to beta band LFP about the task-irrelevant background orientation depended on remembered location. They also compared recorded signals in V4 with and without muscimol inactivation of FEF, demonstrating the importance of FEF input for WM-induced changes in oscillatory amplitude, phase coding, and information encoded about background orientations. Finally, they built a network model that can account for some of these results. Together, these results show that FEF provides meaningful input to the visual cortex that is used to alter neural activity and that these signals can impact information coding of task-irrelevant information during a WM delay.

      Strengths:

      (1) Elegant and robust experiment that allows for clear tests for the necessity of FEF activity in WM-induced changes in V4 activity.

      (2) Comprehensive and broad analyses of interactions between LFP and spike timing provide compelling evidence for FEF-modulated phase coding of task-irrelevant stimuli at remembered location.

      (3) Convincing modeling efforts.

      Weaknesses:

      (1) 0% contrast background data (standard memory-guided saccade task) are not reported in the manuscript. While these data cannot be used to consider information content of spike rate/time about task-irrelevant background stimuli, this condition is still informative as a 'baseline' (and a more typical example of a WM task).

      We plan to incorporate statistical analysis of this point in the revised version.

      (2) Throughout the manuscript, the primary measurements of neural coding pertain to task-irrelevant stimuli (the orientation/contrast of the background, which is unrelated to the animal's task to remember a spatial location). The remembered location impacts the coding of these stimulus variables, but it's unclear how this relates to WM representations themselves.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews

      Reviewer #1 (Public Review): 

      Summary: 

      In this study, Masroor Ahmad Paddar and his/her colleagues explore the noncanonical roles of ATG5 and membrane atg8ylation in regulating retromer assembly and function. They begin by examining the interactomes of ATG5 and expand the scope of these effects to include homeostatic responses to membrane stress and damage. 

      Strengths: 

      This study provides novel insights into the noncanonical function of ATG8ylation in endosomal cargo sorting process. 

      Weaknesses: 

      The direct mechanism by which ATG8ylation regulates the retromer remains unsolved. 

      We agree with the reviewer.  We do however show how at least one aspect of atg8ylation contributes to the proper retromer function, which occurs via lysosomal membrane maintenance and repair. Understanding the more direct effects on retromer will require a separate study. We now emphasize this in the revised manuscript (p. 18) and point out the limitations of the present work (p. 18): “One of the limitations of our study is that beyond effects of membrane atg8ylation on quality of lysosomal membrane and its homeostasis there could be more direct effects of membrane modification with mATG8s that still need to be understood”.

      Reviewer #2 (Public Review): 

      Summary:

      Padder et al. demonstrate that ATG5 mediates lysosomal repair via the recruitment of the retromer components during LLOMe-induced lysosomal damage and that mAtg8-ylation contributes to retromer-dependent cargo sorting of GLUT1. Although previous studies have suggested that during glucose withdrawal, classical autophagy contributes to retromer-dependent GLUT1 surface trafficking via interactions between LC3A and TBC1D5, the experiments here demonstrate that during basal conditions or lysosomal damage, ATGs that are not involved in mATG8ylation, such as FIP200, are not functionally required for retromer-dependent sorting of GLUT1. Overall, these studies suggest a unique role for ATG5 in the control of retromer function, and that conjugation of ATG8 to single membranes (CASM) is a partial contributor to these phenotypes. 

      Strengths: 

      (1) Overall, these studies suggest a unique non-autophagic role for ATG5 in the control of retromer function. They also demonstrate that conjugation of ATG8 to single membranes (CASM) is a partial contributor to these phenotypes. Overall, these data point to a new role for ATG5 and CASM-dependent mATG8ylation in lysosomal membrane repair and trafficking. 

      (2) Although the studies are overall supportive of the proposed model that the retromer is controlled by CASM-dependent mATG8-ylation, it is noteworthy that previous studies of GLUT1 trafficking during glucose withdrawal (Roy et al. Mol Cell, PMID: 28602638) were predominantly conducted in cells lacking ATG5 or ATG7, which would not be able to discriminate between a CASM-dependent vs. canonical autophagy-dependent pathway in the control of GLUT1 sorting. Is the lack of GLUT1 mis-sorting to lysosomes observed in FIP200 and ATG13KO cells also observed during glucose withdrawal? Notably, deficiencies in glycolysis and glucose-dependent growth have been reported in FIP200 deficient fibroblasts (Wei et al. G&D, PMID: 21764854) so there may be differences in regulation dependent on the stress imposed on a cell.

      We thank the reviewer for the overall assessment of the strengths of the study. We have discussed in the manuscript the elegant study by Roy et al., PMID 28602683. To accommodate the reviewer’s comment, we have additionally emphasized in the text that our study is focused on basal conditions and conditions that perturb endolysosomal compartments. We agree with the reviewer that under metabolic stress conditions (such as glucose limitation) more complex pathways may be engaged and have acknowledged that in the discussion. We have now included this in the limitations of the study (p. 18): “Another limitation of our study is that we have focused on basal conditions or conditions causing lysosomal damage, whereas metabolic stress including glucose excess or limitation with its multitude of metabolic effects have not been addressed”.

      Weaknesses: 

      (1) Additional controls are needed to clarify the role of CASM in the control of retromer function. Because the manuscript proposes both CASM-dependent and independent pathways in the ATG5 mediated regulation of the retromer, it is important to provide robust evidence that CASM is required for retromer-dependent GLUT1 sorting to the plasma membrane vs. lysosome. The experiments with monensin in Fig. 7C-E are consistent with but not unequivocally corroborative of a role for CASM. 

      We fully agree with the reviewer. In fact, our data with bafilomycin A1 treatment causing GLUT1 mis-sorting show that it is the perturbation of lysosomes and not CASM per se that leads to mis-sorting of GLUT1 (Fig. 7D,E). Note that it has been shown (PMIDs: 28296541, 25484071 and 37796195) that although bafilomycin A1 deacidifies lysosomes, it does not induce but instead inhibits CASM. This is because bafilomycin A1 causes dissociation of the V1 and V0 sectors of V-ATPase, unlike other CASM-inducing agents, which promote V1-V0 association. Complementing this, our data with ATG2AB DKO and ESCRT VPS37A KO (Fig. 8A-F) indicate that the repair of lysosomes is important to keep the retromer machinery functional (as illustrated in Fig. 8G). This may be one of the effector mechanisms downstream of membrane atg8ylation in general and hence also downstream of CASM. We have revised the title of Fig. 7 to read “Lysosomal perturbations cause GLUT1 mis-sorting” and have explained these relationships in the text (p. 12-13): “Since bafilomycin A1 does not induce CASM but disturbs luminal pH, we conclude that it is the less acidic luminal pH of the endolysosomal organelles, and not CASM, that is sufficient to interfere with the proper sorting of GLUT1.”

      Based on the results shown with ATG16KO in Fig 4A-D, rescue experiments of these 16KO cells with WT vs. C-terminal WD40 mutant versions of ATG16 will specifically assess the requirement for CASM and potentially provide more rigorous support for the conclusions drawn. 

      We have carried out complementation with ATG16L1 WT and its E230 mutant (devoid of WD40 repeats but still capable of canonical autophagy) and placed these data in Fig. 7 (panels I and J) as recommended by the reviewer. This is now described on p. 13: “To additionally test this notion, we compared ATG16L1 full length (ATG16L1FL) and ATG16L1E230 (Rai et al., PMID 30403914) for complementation of the GLUT1 sorting defect in ATG16L1 KO cells (Fig. 7I,J). ATG16L1E230 lacks the key domain to carry out CASM via binding to V-ATPase (29-33) but retains capacity to carry out atg8ylation. Both ATG16L1FL and ATG16L1E230 complemented mis-sorting of GLUT1 (Fig. 7I,J). Collectively, these data indicate that it is not absence of CASM/VAIL but absence of membrane atg8ylation in general that promotes GLUT1 mis-sorting.”

      (2) Also, the role of TBC1D5 should be further clarified. In Fig S7, are there any changes in the interactions between TBC1D5 and VPS35 in response to LLOMe or other agents utilized to induce CASM? 

      We thank the reviewer for pointing this out. We do have data with VPS35 in co-IPs shown in Fig. S7.  There is no change in the amounts of VPS35 or TBC1D5 in GFP-LC3A co-IPs. We now include in Fig. S7 (new panel D) a graph with quantification in the revised manuscript and emphasize this point (p. 12): “However, under CASM-inducing conditions, no changes were detected (Fig. S7B-D) in interactions between TBC1D5 and LC3A or in levels of VPS35 in LC3A co-IP, a proxy for LC3A-TBC1D5-VPS29/retromer association. This suggests that CASM-inducing treatments and additionally bafilomycin A1 do not affect the status of the TBC1D5-Rab7 system”.        

      Does TBC1D5 loss-of-function modulate the numbers of GLUT1 and Gal3 puncta observed in ATG5 deficient cells in response to LLOMe? 

      We agree that TBC1D5 is an interesting aspect. However, because TBC1D5 does not change its interactions in the experiments in our study, we consider this topic (i.e. whether TBC1D5 phenocopies VPS35 and ATG5 KOs in its effects on Gal3) to be beyond the scope of the present work. We underscore that LLOMe (lysosomal damage) mis-sorts GLUT1 even without any genetic intervention (e.g., in WT cells in the absence of ATG5 KO; Fig. 7). Thus, in our opinion the effects of TBC1D5 inactivation may be a moot point.  

      (3) Finally, the studies here are motivated by experiments in Fig. S1 (as well as other studies from the Deretic and Stallings labs) suggesting unique autophagy-independent functions for ATG5 in myeloid cells and neutrophils in susceptibility to Mycobacterium tuberculosis infection. However, it is curious that no attempt is made to relate the mechanistic data regarding the retromer or GLUT1 receptor mis-sorting back to the infectious models. Do myeloid cells or neutrophils lacking ATG5 have deficiencies in glucose uptake or GLUT1 cell surface levels? 

      The reviewer’s point is well taken. Glucose uptake, its metabolism, and diabetes underlie the resurgence of TB in certain populations and are important factors in a range of other diseases. This was alluded to in our discussion (lines 461-469). However, these are complex topics for future studies. We have now expanded this section of the discussion (p. 18): “In the context of tuberculosis, diabetes, which includes glucose dysregulation, is associated with increased incidence of active disease and adverse outcomes” (Dheda et al., PMID: 26377143; Dooley et al., PMID: 19926034).

      Reviewer #3 (Public Review): 

      In this manuscript, Padder et al. used APEX2 proximity labeling to find an interaction between ATG5 and the core components of the Retromer complex, VPS26, VPS29, and VPS35. Further studies revealed that ATG5 KO inhibited the trafficking of GLUT1 to the plasma membrane. They also found that other autophagy genes involved in membrane atg8ylation affected GLUT1 sorting. However, knocking out other essential autophagy genes such as ATG13 and FIP200 did not affect GLUT1 sorting. These findings suggest that ATG5 participates in the function of the Retromer in a noncanonical autophagy manner. Overall, the methods and techniques employed by the authors largely support their conclusions. These findings are intriguing and significant, enriching our understanding of the non-autophagic functions of autophagy proteins and the sorting of GLUT1.

      Nevertheless, there are several issues that the authors need to address to further clarify their conclusions. 

      (1) The authors confirmed the interaction between Atg5 and the Retromer complex through Co-IP experiments. Is the interaction between Atg5 and the Retromer direct? If it is direct, which Retromer complex protein regulates the interaction with Atg5? Additionally, does ATG5 K130R mutant enhance its interaction with the Retromer? 

      AlphaFold modeling in the initial submission of our study to eLife (absent from the current version) suggested the possibility of a direct interaction between ATG5 and VPS35 with the ATG12—ATG5 complex facing outwards, in which case K130R would not matter. However, mutational experiments in putative contact residues did not alter association in co-IPs. So either ATG5 interacts with other retromer subunits or, more likely, it is part of a larger protein complex containing retromer. It will take a separate study to dissect associations and find direct interaction partners.

      (2) To more directly elucidate how ATG5 regulates Retromer function by interacting with the Retromer and participates in the trafficking of GLUT1 to the plasma membrane, the authors should identify which region or crucial amino acid residues of ATG5 regulate its interaction with the Retromer. Additionally, they should test whether mutations in ATG5 that disrupt its interaction with the Retromer affect Retromer function (such as participating in the trafficking of GLUT1 to the plasma membrane) and whether they affect Atg8ylation. They also need to assess whether these mutations influence canonical autophagy and lysosomal sensitivity to damage. 

      Please see the response to point 1.

      Recommendations for the authors.

      Reviewer #1 (Recommendations For The Authors): 

      While most data are solid and convincing, the following questions need to be addressed before publication: 

      Major Concerns: 

      (1) Examining only one cargo (GLUT1) is insufficient to reflect the retromer's function comprehensively. At least two additional cargoes should be analyzed to observe the phenotypes more accurately. 

      We agree that having another retromer cargo (in addition to GLUT1) would be of interest. We point out that our data also show mis-sorting of SNX27 to lysosomes (Fig. 3H, quantifications in Fig. 3I). SNX27 in turn sorts nearly 80 ion channels, signaling receptors, and other nutrient transporters. Which of the 80 cargos to prioritize and check (the expectation is that all 80 might be missorted given that they need SNX27)? We have instead tested MPR, a SNX27-independent cargo. We now include data on effects of ATG5 knockout on CI-MPR (Fig. S9A-F). This is described in the text (p. 14; “Effect of ATG5 knockout on MPR sorting

      We tested whether ATG5 affects cation-independent mannose 6-phosphate receptor (CI-MPR). For this, we employed the previously developed methods (Fig. S9A) of monitoring retrograde trafficking of CI-MPR from the plasma membrane to the TGN (70,118-121). In the majority of such studies, CI-MPR antibody is allowed to bind to the extracellular domain of CI-MPR at the plasma membrane and its localization dynamics following endocytosis serves as a proxy for trafficking of CI-MPR. We used ATG5 KOs in HeLa and Huh7 cells and quantified by HCM retrograde trafficking to TGN of antibody-labeled CI-MPR at the cell surface, after being taken up by endocytosis and allowed to undergo intracellular sorting, followed by fixation and staining with TGN46 antibody. There was a minor but statistically significant reduction in CI-MPR overlap with TGN46 in HeLaATG5-KO that was comparable to the reduction in HeLa cells when VPS35 was depleted by CRISPR (HeLaVPS35-KO) (Fig. S9B,C). Morphologically, endocytosed Ab-CI-MPR appeared dispersed in both HeLaATG5-KO and HeLaVPS35-KO cells relative to HeLaWT cells (Fig. S9D). Similar HCM results were obtained with Huh7 cells (WT vs. ATG5KO; Fig. S9E,F). We interpret these data as evidence of indirect action of ATG5 KO on CI-MPR sorting via membrane homeostasis, although we cannot exclude a direct sorting role via retromer. We favor the former interpretation based on the strength of the effect and the controversial nature of retromer engagement in sorting of CI-MPR (57,70,75,98,120).”)

      (2) The evidence from Alphafold predictions is weak. The direct interaction of ATG5 with retromer subunits should be tested. 

      Please see the above response to Reviewer 3.

      In addition, does retromer also interact with ATG16L1 similarly to the phenomenon in VAIL? 

      We fully agree with the reviewer that finding the direct interacting partners between retromer and the membrane atg8ylation machinery is an important direction, as in our opinion it would expand the repertoire of E3 ligases and their adaptors. However, given the complexity and variety of possibilities, we believe that this is a topic for a future study.

      (3) In Line 166, Figures 2C and 2D, the Gal3 phenotype does not seem to be well complemented by VPS35. 

      We have adjusted the text to acknowledge incomplete complementation (p.7). 

      (4) In Figures 3 and 4, the authors show that KO of membrane atg8ylation machineries and ATG8-Hexa KO affects the localization of retromer cargo GLUT1 and SNX27. However, the mechanism by which membrane ATG8ylation affects retromer remains unresolved.

      Additionally, are the locations of other retromer subunits also affected and, if so, how are they impacted? At least a speculative explanation should be provided. 

      Following the reviewer’s request, we now state on p. 19 that “one of the limitations of our study is that beyond effects of membrane atg8ylation on quality of lysosomal membrane and its homeostasis there could be more direct effects of membrane modification with mATG8s on retromer that still need to be understood”.

      (5) In Figure 3, endogenous IP results are required to examine the interaction of ATG5 with retromer if suitable retromer antibodies for IP are available. 

      Endogenous IPs are given in Fig. 1. We have modified text on p. 8 to clarify this.

      (6) In Figure 4, ATG8 Hexa KO, and triple KO of LC3s or GABARAPs all increase the localization of GLUT1 on lysosomes. It seems redundant for ATG8 family proteins here.

      Can any individual member of the ATG8 family rescue this phenotype? 

      If the intent of such complementation analysis is to identify a specific mATG8 responsible for the observed effects, this is already pre-empted by the fact that TKOs also have a similar effect as HEXA mutants (i.e. loss of at least two of mATG8s is enough to cause the phenotype). We now discuss this in the text (p. 10): “Thus, at least two mATG8s, each one from two different mATG8 subclasses (LC3s and GABARAPs) or the entire membrane atg8ylation machinery was engaged in and required for proper GLUT-1 sorting”.  

      (7) In Figure 5, knockdown of ATG5 in FIP200 KO cells inhibited GLUT1 sorting from endosomes, leading to its trafficking to lysosomes. However, it is known that very little remnant ATG5 in ATG5 KD cells is enough to support ATG8 lipidation. Therefore, it is essential to repeat this experiment using ATG5/FIP200 double KO or ATG5 KO combined with an autophagy inhibitor. 

      We point out this limitation in the text (p. 11): “….we knocked down ATG5 in FIP200 KO cells (Fig. S5D) and found that GLUT1 puncta and GLUT1+LAMP2+ profiles increased even in the FIP200 KO background with the effects nearing those of VPS35 knockout (Figs. 5D-F and S5C), with the difference between VPS35 KO and ATG5 KD attributable to any residual ATG5 levels in cells subjected to siRNA knockdowns”.

      (8) In Figure 7, the authors show that the induction of CASM inhibited GLUT1 sorting from endosomes. However, ATG5 KO, which abolishes membrane ATG8ylation, also inhibits GLUT1 sorting. This seems paradoxical and requires a reasonable explanation or discussion. 

      We understand the reviewer’s comment. The answer to this paradox is that it is actually the lysosomal damage that causes GLUT1 mis-sorting and not CASM. Membrane atg8ylation, such as CASM and probably other processes given the involvement of both ATG2 and ESCRTs (Fig. 8), counteracts the damage and works in the direction of restoring/maintaining proper retromer-dependent sorting. This is now explained better in the text, and we have revised the title of Fig. 7 to read “Lysosomal damage causes GLUT1 mis-sorting”. Our data with bafilomycin A1 show that it is the perturbation of lysosomes (not CASM per se) that leads to mis-sorting of GLUT1 (Fig. 7D,E), and our data with ATG2AB DKO and ESCRT (VPS37A) KO (Fig. 8A-F) indicate that repair of lysosomes is important to keep the retromer machinery functional (as illustrated in Fig. 8G), which may be one of the effector mechanisms downstream of membrane atg8ylation in general (and hence also of CASM).

      (9) The immuno-staining results for Figures 7F and 7G are lacking. 

      We now provide the requested images.

      (10) In Figure 8D, the quality of the image for VPS37 KO cells treated with LLOME is not sufficient to show increased colocalization between GLUT1 and LAMP2. 

      We now provide a different example image. We note that these are epifluorescent HCM images.

      Minor Concerns: 

      (1) It would be better to distinguish the function of the membrane ATG8ylation machinery (i.e., ATG5) from the function of membrane ATG8ylation in the description. No ATG8ylation-deficient mutants were used in this study. 

      We have used atg8ylation mutants (e.g. KOs in ATG3, ATG5, ATG7, and ATG16L1). We now emphasize this better in the text (p. 10). 

      (2) In Figure 2D, a green box appears there by accident. 

      This has been fixed.

      (3) In Figure 3A, the conjugate for ATG5-ATG12 is absent in the gel for IB: ATG5.

      The ATG5 antibody used in Fig. 3A recognizes primarily the conjugated form of ATG5. This is now clarified in the figure legend. 

      (4) Figure 5G is missing in the manuscript. 

      Fig 5G is now mentioned in the text. Thank you.

      (5) The gRNA sequence information for FIP200 KO is missing in the Methods section. 

      Reference(s) to the already published gRNA sequence are in the manuscript. 

      (6) Suggest moving the last paragraph in Result section to Discussion section. 

      We kept this single-paragraph section in Results as it contains actual data.

      Reviewer #2 (Recommendations For The Authors): 

      (1) It is unclear why the rescue of VPS35KO cells in Fig 1C-D is so modest. 

      Complementation data depend on transfection efficiency and some variability is to be expected.

      Reviewer #3 (Recommendations For The Authors): 

      (1) Figures 2A, 2C, 2E, and 2G lack scale bars. Figure 2D has a small square above the y axis. 

      Relative scale bars are now included. 

      (2) Figures S3B, S3D, and S3F lack scale bars. 

      Relative scale bars are now included.

    1. Author response:

      We thank the Editor and Reviewers for their work on our manuscript, and are happy to receive their positive comments, as well as their questions and suggestions. We are currently revising the manuscript and are planning to de-emphasize Brownian recovery as a simple yet biologically irrelevant benchmark and include comparisons with other biologically inspired strategies suggested by the reviewers. As for sharing the code and data, we completely agree: dataset 1 is already public and we will share the other dataset as well as the code. In a nutshell, we will be addressing the referees’ suggestions as follows:

      (1)   As Referee 1 points out, even if the algorithm does not require a map of space, the agent is still required to tell apart North, East, South and West relative to the wind direction which is implicitly assumed known. We will better clarify the spatial encoding required to implement these strategies.

      (2)   Referee 1 remarks that the learned recovery strategy works best and suggests giving it a more prominent role and characterizing it better. We agree that what is done in the void state is definitely key and more work is needed to understand it. In the revised manuscript, we are planning to further substantiate the statistics of the learned recovery by repeating training several times and comparing several trajectories. Note that this strategy is much more flexible than the others and could potentially mix aspects of recovery with aspects of exploitation: we defer a more in-depth analysis that disentangles these two aspects to future work.

      (3)   Referee 1 asks whether an optimal, minimal representation of the olfactory states exists. Q-learning defines the olfactory states prior to training and does not allow systematic optimization of the odor representation for the task. Given the odor features, we can however discretize them into more or fewer olfactory states. We expect that decreasing the number of olfactory states provides less positional information and potentially degrades performance, although the loss in performance may be overshadowed by noise or by efficient recovery. We are planning to re-train our model with a smaller number of non-void states and will provide the comparison. The number of void states does not need further testing: we chose 50 void states because it matches the time agents typically remain in the void and indeed achieves very high performance (fewer than 50 void states result in no convergence and more than 50 introduce states that are rarely visited).

      (4)   Both reviewers correctly remark that Brownian motion is not biologically relevant. We will make sure to further clarify that this is a rather simple, but biologically irrelevant, benchmark. We are planning to include results with both circling and zigzagging as biologically inspired recovery strategies.

      (5)   We agree with reviewer 2 that animal locomotion does not look like a series of discrete displacements on a checkerboard. However, to overcome this limitation, one has to first focus on a specific system to define actions in a way that best adheres to a species’ motor controls. Second, these actions are likely continuous, which makes reinforcement learning notoriously more complex. While we agree that more realistic models are definitely needed for a comparison with real systems, this remains outside the scope of the current work.

      (6)   We agree with the referees and editor that it is important to publish the code and data alongside the manuscript. This was already planned, and we will make sure to share the links within the revised version of the manuscript.
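
      The state layout discussed in point (3) can be illustrated with a minimal sketch. It is not the code used in the manuscript: the function names, the use of a single scalar odor feature, and the bin edges are assumptions made for illustration only. Only the count of 50 void states comes from the response above. Non-void states are obtained by binning the odor feature, and void states count the time steps since the last detection, so coarsening the bins reduces the number of olfactory states without touching the void states.

```python
import bisect

N_VOID_STATES = 50  # value quoted in the response; one state per step in the void


def encode_state(odor_feature, steps_in_void, bin_edges):
    """Map an observation to a discrete state index (hypothetical encoding).

    Indices 0..len(bin_edges) are non-void olfactory states obtained by
    binning a scalar odor feature. The following N_VOID_STATES indices count
    the steps elapsed since the last detection, capped at the last void state.
    """
    n_odor_states = len(bin_edges) + 1
    if odor_feature is None:  # nothing detected: the agent is in the void
        void_index = max(min(steps_in_void, N_VOID_STATES) - 1, 0)
        return n_odor_states + void_index
    # bisect_right returns the index of the bin containing the feature value
    return bisect.bisect_right(bin_edges, odor_feature)


def n_states(bin_edges):
    """Total number of rows a tabular Q-function would need for this encoding."""
    return len(bin_edges) + 1 + N_VOID_STATES


# Two illustrative discretizations of the same odor feature
fine_edges = [1.0, 2.0]   # 3 olfactory states
coarse_edges = [1.0]      # 2 olfactory states
print(n_states(fine_edges), n_states(coarse_edges))
```

      Under this assumed layout, re-training with fewer non-void states only requires changing the bin edges, while the void block stays fixed at 50 states.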

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      The study by Nelson et al. is focused on formation of the Drosophila Posterior Signaling Center (PSC) which ultimately acts as a niche to support hematopoietic stem cells of the lymph gland (LG). Using a combination of genetics and live imaging, the authors show that PSC cells migrate as a tight collective and associate with multiple tissues during a trajectory that positions them at the posterior of the LG.

      This is an important study that identifies Slit-Robo signaling as a regulator of PSC morphogenesis, and highlights the complex relationship of interacting cell types - PSC, visceral mesoderm (VM) and cardioblasts (CBs) - in coordinated development of these three tissues during organ development. However, one point requiring clarification is the idea that PSC cells exhibit a collective cell migration; it is not clear that the cells are migrating rather than being pushed to a more dorsal position through dorsal closure and/or other similar large scale embryo movement. This does not detract from the very interesting analysis of PSC morphogenesis as presented.

      This Public Review by Reviewer #1 is identical to their original Public Review, thus we are unsure whether Reviewer #1 assessed the revised version of our manuscript, and whether they read our responses to their original Public Review. Below we summarize our original responses to the weaknesses listed for the first version of our manuscript.

      Strengths:

      • Using expression of Hid or Grim to ablate associated tissues, they find evidence that the VM and CB of the dorsal vessel affect PSC migration/morphology whereas the alary muscles do not. Slit is expressed by both VM and CBs, and therefore Slit-Robo signaling was investigated as PSCs express Robo.

      • Using a combination of approaches, the authors convincingly demonstrate that Slit expression in the CBs and VM acts to support PSC positioning. A strength is the ability to knockdown slit levels in particular tissue types using the Gal4 system and RNAi.

      • Although in the analysis of robo mutants, the PSC positioning phenotype is weaker in the individual mutants (robo1 and robo2) with only the double mutant (robo1,robo2) exhibiting a phenotype comparable to the slit RNAi. The authors make a reasonable argument that Slit-Robo signaling has an intrinsic effect, likely acting within PSCs, because PSCs show a phenotype even when CBs do not (Fig 4G).

      • New insight into dorsal vessel formation by VM is presented in Fig 4A,B, as loss of the VM can affect dorsal vessel morphogenesis. This result additionally points to the VM as important.

      Weaknesses:

      • The authors are cautioned to temper the result that Slit-Robo signaling is intrinsic to PSC since loss of robo may affect other cell types (besides CBs and PSCs) to indirectly affect PSC migration/morphogenesis. In fact, in the robo2, robo1 mutant, the VM appears to be incorrectly positioned (Fig. 4G).

      We maintain our conclusion, and we point out that the Reviewer stated, “The authors make a reasonable argument that Slit-Robo signaling has an intrinsic effect, likely acting within PSCs”. We already added a statement to the Discussion reminding the reader of the possibility of secondary defects (“Finally, it is possible that PSC cells do not intrinsically require Robo activation, but rather CB-independent PSC mis-positioning in sli or robo mutants could be a secondary defect caused by compromised Slit-Robo signaling in some other tissue.”).

      • If possible, the authors should use RNAi to knockdown Robo1 and Robo2 levels specifically in the PSCs if a Gal4 is available; might Antp.Gal4 (Fig 1K) be useful? Even if knockdown is achieved in PSCs+CBs, this would be a better/complementary experiment to support the approach outlined in Fig 4D.

      As described in our first response, use of Antp-GAL4 with RNAi would be no better than a whole animal double Robo mutant.

      • Movies are hard to interpret, as it seems unclear that the PSCs actively migrate rather than being pushed/moved indirectly due to association with VM and CBs/dorsal vessel.

      Vm does not directly contact the PSC, so the Vm cannot be physically pushing the PSC. In their original review, Reviewer #3 expressed similar concerns (Weaknesses #1 and #2), and upon their review of our revised manuscript they determined we addressed these concerns.

      Reviewer #2 (Public review):

      The paper by Nelson KA, et al. explored the collective migration, coalescence and positioning of the posterior signaling center (PSC) cells in Drosophila embryo. With live imaging, the authors observed the dynamic progress of PSC migration. Throughout this process, visceral mesoderm (VM), alary muscles (Ams) and cardioblasts (CBs) are in proximity of PSC. Genetic ablation of these tissues reveals the requirement for VM and CBs, but not AMs in this process. Genetic manipulations further demonstrated that Slit-Robo signaling was critical during PSC migration and positioning. While the genetic mechanisms of positioning the PSC were explored in much detail, including using live imaging, the functional consequence of mispositioning or (partial) absence of PSC cells has not been addressed, but would much increase the relevance of their findings. A few additional issues need to be addressed as well in this otherwise well-done study.

      Previous major points:

      (1) The only readout in their experiments is the relative correctness of PSC positioning. Importantly, what is the functional consequence if PSC is not properly positioned? This would be particularly important with robo-sli manipulations, where the PSC is present but some cells are misplaced. What is the consequence? Are the LGs affected, like specification of their cell types, structure and function? To address this for at least the robo-slit requirement in the PSC, it may be important to manipulate them directly in the PSC with a split Gal4 system, using Antp and Odd promoters.

      We stated in our original response that exploring the functional consequences of PSC mis-positioning was outside the scope of this study. Given that the necessary cis-regulatory modules have not been identified at Antp or Odd, creating a split-GAL4 with ‘Antp and Odd promoters’ cannot be accomplished in a reasonable time frame, as we previously detailed in our original response.

      (2) The dense, parallel-aligned fibers in the lower part of Figure 1J seemed to be visceral mesoderm, but further up (dorsally) that may be epidermis. Is it possible that the PSC migrates together with the epidermis? This should be addressed.

      This was directly addressed by the additional data included in our revision. When epidermal closure is stalled, the PSC is able to migrate past the stalled leading edge, closer to the midline.

      (3) Although the authors described the standards of assessing PSC positioning as "normal" or "abnormal", it is rather subtle at times and variable in the mutant or KD/OE examples. The criteria should be more clearly delineated and analyzed double-blind, also since this is the only readout. Further examples of abnormal positioning in supplementary figures would also help.

      We addressed this comment in detail in our original response. Briefly, double-blinding was oftentimes not possible due to the obviousness of the genotype in the image. The criteria we outline for normal PSC positioning are as comprehensive as possible given the subtlety and variability of mis-positioning phenotypes. Two of the authors independently analyzed the relatively large sets of samples and arrived at the same conclusions.

      (4) The Discussion is very lengthy and should be shortened.

      We shortened the Discussion in the revised version.

      Comments on revised version:

      Although the authors have responded to my concerns as they deemed suitable, these concerns still stand for the revised version.

      Given our responses above and the lack of detail in this comment, we are unsure why the Reviewer is still concerned.

      Reviewer #3 (Public review):

      Summary:

      This work is a detailed and thorough analysis of the morphogenesis of the posterior signaling center (PSC), a hematopoietic niche in the Drosophila larva. Live imaging is performed from the stage of PSC determination until the appearance of a compact lymph gland and PSC in the stage 16 embryo. This analysis is combined with genetic studies that clarify the involvement of adjacent tissue, including the visceral mesoderm, alary muscle, and cardioblasts/dorsal vessel. Lastly, the Slit/Robo signaling system is clearly implicated in the normal formation of the PSC.

      Strengths:

      The data are clearly presented and well documented, and fully support the conclusions drawn from the different experiments.

      The authors have addressed all of my previous comments, in particular concerning the role of epidermal cell rearrangements during dorsal closure as a possible force acting on the movement of PSC cells. The authors have clarified their definition of "collective migration" as it applies to the movement of PSC. The revised paper will make an important contribution to our understanding of the mechanisms driving morphogenesis.

      We are appreciative of the time spent by the Reviewer reading our responses and assessing the revision.

      ---------

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Nelson et al. is focused on the formation of the Drosophila Posterior Signaling Center (PSC) which ultimately acts as a niche to support hematopoietic stem cells of the lymph gland (LG). Using a combination of genetics and live imaging, the authors show that PSC cells migrate as a tight collective and associate with multiple tissues during a trajectory that positions them at the posterior of the LG.

      This is an important study that identifies Slit-Robo signaling as a regulator of PSC morphogenesis, and highlights the complex relationship of interacting cell types - PSC, visceral mesoderm (VM), and cardioblasts (CBs) - in the coordinated development of these three tissues during organ development. However, one point requiring clarification is the idea that PSC cells exhibit a collective cell migration; it is not clear that the cells are migrating rather than being pushed to a more dorsal position through dorsal closure and/or other similar large-scale embryo movement. This does not detract from the very interesting analysis of PSC morphogenesis as presented.

      Since each referee asked for clarification concerning collective cell migration, we present a combined response further below, placed after the comments from Reviewer #3.

      Strengths:

      (1) Using the expression of Hid or Grim to ablate associated tissues, they find evidence that the VM and CB of the dorsal vessel affect PSC migration/morphology whereas the alary muscles do not. Slit is expressed by both VM and CBs, and therefore Slit-Robo signaling was investigated as PSCs express Robo.

      (2) Using a combination of approaches, the authors convincingly demonstrate that Slit expression in the CBs and VM acts to support PSC positioning. A strength is the ability to knockdown slit levels in particular tissue types using the Gal4 system and RNAi.

      (3) Although in the analysis of robo mutants, the PSC positioning phenotype is weaker in the individual mutants (robo1 and robo2) with only the double mutant (robo1,robo2) exhibiting a phenotype comparable to the slit RNAi. The authors make a reasonable argument that Slit-Robo signaling has an intrinsic effect, likely acting within PSCs because PSCs show a phenotype even when CBs do not (Figure 4G).

      (4) New insight into dorsal vessel formation by VM is presented in Figure 4A, B, as loss of the VM can affect dorsal vessel morphogenesis. This result additionally points to the VM as important.

      Weaknesses:

      (1) The authors are cautioned to temper the result that Slit-Robo signaling is intrinsic to PSC since the loss of robo may affect other cell types (besides CBs and PSCs) to indirectly affect PSC migration/morphogenesis. In fact, in the robo2, robo1 mutant, the VM appears to be incorrectly positioned (Figure 4G).

      We have reexamined our wording in the relevant Results section and, given that this referee agrees that we, “make a reasonable argument that Slit-Robo signaling has an intrinsic effect, likely acting within PSCs because PSCs show a phenotype even when CBs do not (Figure 4G)”, it was not clear how we might temper our conclusions more. Given that PSC cells express Robo1 and Robo2, and that the Vm does not contact the PSC, our ‘reasonable argument’ appears fair and parsimonious. Since we agree with the referee that a reader should be made as aware as possible of alternatives, we will add a comment to the Discussion, reminding the reader of the possibility of a secondary defect.

      (2) If possible, the authors should use RNAi to knockdown Robo1 and Robo2 levels specifically in the PSCs if a Gal4 is available; might Antp.Gal4 (Fig 1K) be useful? Even if knockdown is achieved in PSCs+CBs, this would be a better/complementary experiment to support the approach outlined in Figure 4D.

      While we agree that PSC-specific knockdown of Robo1 and Robo2 simultaneously would be ideal, this is not possible. First, the most effective UAS-RNAi transgenes (that is, those in a Valium 20 backbone) are both integrated at the same chromosomal position; these cannot be simultaneously crossed with a GAL4 transgenic line to attempt double knockdown. Additionally, as with all RNAi approaches that must rely on efficient knockdown over the rapid embryonic period, even having facile access to the above does not ensure that the RNAi approach will cause depletion as effective as the genetic null condition that we use. Second, as the referee concedes, there is no embryonic PSC-specific GAL4. The proposed use of Antp-GAL4 would cause knockdown in many tissues (PSC, CB, Vm, epidermis and amnioserosa). This would lead to a reservation similar to that caused by our use of the straight genetic double mutant, as regards a potential indirect requirement for Robo function.

      (3) Movies are hard to interpret, as it seems unclear that the PSCs actively migrate rather than being pushed/moved indirectly due to association with VM and CBs/dorsal vessel.

      First, the Vm does not directly contact the PSC, so it cannot be pushing the PSC dorsally. We will re-examine our text to be certain to make this clear. Second, in our analysis of bin mutants, which lack Vm, LGs and PSCs are able to reach the dorsal midline region in the absence of Vm. Finally, please see our response to Reviewer #3, point 2, for why we maintain that PSC cells are “migrating” even though some PSC cells are attached to CBs.

      Reviewer #2 (Public Review):

      The paper by Nelson KA, et al. explored the collective migration, coalescence, and positioning of the posterior signaling center (PSC) cells in Drosophila embryo. With live imaging, the authors observed the dynamic progress of PSC migration. Throughout this process, visceral mesoderm (VM), alary muscles (Ams), and cardioblasts (CBs) are in proximity to PSC. Genetic ablation of these tissues reveals the requirement for VM and CBs, but not AMs in this process. Genetic manipulations further demonstrated that Slit-Robo signaling was critical during PSC migration and positioning. While the genetic mechanisms of positioning the PSC were explored in much detail, including using live imaging, the functional consequence of mispositioning or (partial) absence of PSC cells has not been addressed, but would much increase the relevance of their findings. A few additional issues need to be addressed as well in this otherwise well-done study.

      Major points:

      (1) The only readout in their experiments is the relative correctness of PSC positioning. Importantly, what is the functional consequence if PSC is not properly positioned? This would be particularly important with robo-sli manipulations, where the PSC is present but some cells are misplaced. What is the consequence? Are the LGs affected, like the specification of their cell types, structure, and function? To address this for at least the robo-slit requirement in the PSC, it may be important to manipulate them directly in the PSC with a split Gal4 system, using Antp and Odd promoters.

      We agree that the functional consequence of PSC mis-positioning is important and a relevant question to eventually address. However, virtually all markers and reagents used to assess the effect of the PSC on progenitor cells and their differentiated descendants are restricted to analyses carried out on the third larval instar - some three days after the experiments reported here. Most of the manipulated conditions in our work are no longer viable at this phase and, thus, addressing the functional consequences of a malformed PSC will require the field to develop new tools. 

      As we noted in the Introduction, the consistency with which the wildtype PSC forms as a coalesced collective at the posterior of the LG strongly suggests importance of its specific positioning and shape, as has now been found for other niches (citations in manuscript). Additionally, in the Discussion we mention the existence of a gap junction-dependent calcium signaling network in the PSC that is important for progenitor maintenance. Without continuity of this network amongst all PSC cells (under conditions of PSC mis-positioning), we strongly anticipate that the balance of progenitors to differentiated hemocytes will be mis-managed, either constitutively, and / or under immune challenge conditions. 

      Finally, to our knowledge, the tools do not exist to build a “split Gal4 system using Antp and Odd promoters”. The expression pattern observed using the genomic Antp-GAL4 line must be driven by endogenous enhancers–none of which have been defined by the field, and thus cannot be used in constructing second order drivers. Similarly, for odd skipped, in the embryo the extant Odd-GAL4 driver expresses only in the epidermis, with no expression in the embryonic LG. Thus, the cis regulatory element controlling Odd expression in the embryonic LG is unknown. In the future, the discovery of an embryonic PSC-specific driver will aid in addressing the specific functional consequences of PSC mis-positioning.

      (2) The dense, parallel-aligned fibers in the part of Figure 1J seemed to be visceral mesoderm, but further up (dorsally) that may be epidermis. Is it possible that the PSC migrates together with the epidermis? This should be addressed.

      See response to Reviewer #3.

      (3) Although the authors described the standards of assessing PSC positioning as "normal" or "abnormal", it is rather subtle at times and variable in the mutant or KD/OE examples. The criteria should be more clearly delineated and analyzed double-blind, also since this is the only readout. Further examples of abnormal positioning in supplementary figures would also help.

      We appreciate the Reviewer’s concern and acknowledge that the phenotypes we observed were indeed variable and, at times, subtle. As we show and discuss in the paper, our results revealed that the signaling requirements for proper PSC positioning are complex; this was favorably commented upon by Reviewer #1 (“...highlights the complex relationship of interacting cell types - PSC, visceral mesoderm (VM), and cardioblasts (CBs) - in the coordinated development of these three tissues during organ development.…”). We suspect the phenotypic variability is attributable to any number of biological differences, such as heterogeneity of PSC cells and an accompanying difference in the timing of their competence to receive and respond to Slit-Robo signaling, the timing of release of Slit from CBs and Vm, the number of cells in a given PSC, which PSC cells in the cluster respond to too little or too much signaling, and/or typical variability between organisms. Furthermore, PSC positioning analyses were conducted by two of the authors, who independently came to the same conclusions. For many of the manipulations, double blinding was not possible since the genotype of the embryo was discernible due to the obvious phenotype of the manipulated tissue.

      (4) The Discussion is very lengthy and should be shortened.

      We will re-examine the prose and emphasize more conciseness, while maintaining clarity for the reader.

      Reviewer #3 (Public Review):

      Summary:

      This work is a detailed and thorough analysis of the morphogenesis of the posterior signaling center (PSC), a hematopoietic niche in the Drosophila larva. Live imaging is performed from the stage of PSC determination until the appearance of a compact lymph gland and PSC in the stage 16 embryo. This analysis is combined with genetic studies that clarify the involvement of adjacent tissue, including the visceral mesoderm, alary muscle, and cardioblasts/dorsal vessels. Lastly, the Slit/Robo signaling system is clearly implicated in the normal formation of the PSC.

      Strengths:

      The data are clearly presented, well documented, and fully support the conclusions drawn from the different experiments. The manuscript differs in character from the mainstay of "big data" papers (for example, no sets of single-cell RNAseq data of, for instance, PSC cells with more or less Slit input, are offered), but what it lacks in this regard, it makes up in carefully planned and executed visualizations and genetic manipulations.

      Weaknesses:

      A few suggestions concerning improvement of the way the story is told and contextualized.

      (1) The minute cluster of PSC progenitors (5 or so cells per side) is embedded (as known before and shown nicely in this study) in other "migrating" cell pools, like the cardioblasts, pericardial cells, lymph gland progenitors, alary muscle progenitors. These all appear to move more or less synchronously. What should also be mentioned is another tissue, the dorsal epidermis, which also "moves" (better: stretches?) towards the dorsal midline during dorsal closure. Would it be reasonable to speculate (based on previously published data) that without the force of dorsal closure, operating in the epidermis, at least the lateral>medial component of the "migration" of the PSC (and neighboring tissues) would be missing? If dorsal closure is blocked, do essential components of PSC and lymph gland morphogenesis (except for the coming-together of the left and right halves) still occur? Are there any published data on this?

      Each of the Reviewers is interested in our response to this very relevant question, and, thus, we will address the issue en bloc here. First, we will add a Supplementary Figure showing that LG and CBs are still able to progress medially towards the dorsal midline when dorsal closure stalls. This rules out any major effect for the most prominent “large-scale embryo cell sheet movement” in positioning the PSC. Second, published work by Haack et al. and Balaghi et al. shows that CBs and leading edge epidermal cells are independently migratory, and we will add this context to the manuscript for the reader.

      (2) Along similar lines: the process of PSC formation is characterized as "migration". To be fair: the authors bring up the possibility that some of the phenotypes they observe could be "passive"/secondary: "Thus, it became important to test whether all PSC phenotypes might be 'passive', explained by PSC attachment to a malforming dorsal vessel. Alternatively, the PSC defects could reflect a requirement for Robo activation directly in PSC cells." And the issue is resolved satisfactorily. But more generally, "cell migration" implies active displacement (by cytoskeletal forces) of cells relative to a substrate or to their neighbors (like for example migration of hemocytes). This to me doesn't seem really clearly to happen here for the dorsal mesodermal structures. Couldn't one rather characterize the assembly of PSC, lymph gland, pericardial cells, and dorsal vessel in terms of differential adhesion, on top of a more general adhesion of cells to each other and the epidermis, and then dorsal closure as a driving force for cell displacement? The authors should bring in the published literature to provide a background that does (or does not) justify the term "migration".

      Before addressing this specifically, we remind readers of our response above that states the rationale ruling out large, embryo-scale movements, such as epidermal dorsal closure, in driving PSC positioning. So, how are PSC cells arriving at their reproducible position? This manuscript reports the first live-imaging of the PSC as it comes to be positioned in the embryo. We interpret these movies to suggest strongly that these cells are a ‘collective’ that migrates. Neither the data, nor we, are asserting that each PSC cell is ‘individually’ migrating to its final position. Rather, our data suggest that the PSC migrates as a collective. The most paradigmatic example of directed, collective cell migration is that of Drosophila ovarian border cells. That cell cluster is surrounded at all times by other cells (nurse cells, in that case), and for the collective to traverse through the tissue, the process requires constant remodeling of associations amongst the migrating cells in the collective (the border cells), as well as between cells in the collective and those outside of it (the nurse cells). In fact, the nurse cells are considered the substrate upon which border cells migrate. Note also that in collective border cell migration, cells within the collective can switch neighbors, suggesting dynamic changes to cell associations and adhesions.

      In our analysis, the PSC cells exhibit qualities reminiscent of the border cells, and thus we infer that the PSC constitutes a migratory cell collective.  We also show in Figure 1H that PSC cells exhibit cellular extensions, and thus have a very active, intrinsic actin-based cytoskeleton. In fact, in Figure 1I, we point out that PSC cells shift position within the collective, which is not only a direct feature of migration, but also occurs within the border cell collective as that collective migrates. Additionally, the fact that the lateral-most PSC cells shift position in the collective while remaining a part of the collective–and they do this while executing net directional movement–makes a strong argument that the PSC is migratory, as no cell types other than PSCs are contacting the surfaces of those shifting PSC cells. Lastly, the Reviewer’s supposition that, rather than migration, dorsal mesoderm structures form via “differential adhesion, on top of a more general adhesion of cells to each other” is, actually, precisely an inherent aspect of collective cell migration as summarized above for the ovarian border collective.

      In our resubmission we will adjust text citing the existing literature to better put into context the reasoning for why PSC formation based on our data is an example of collective cell migration.

      (3) That brings up the mechanistic centerpiece of this story, the Slit/Robo system. First: I suggest adding more detailed data from the study by Morin-Poulard et al 2016, in the Introduction, since these authors had already implicated Slit-Robo in PSC function and offered a concrete molecular mechanism: "vascular cells produce Slit that activates Robo receptors in the PSC. Robo activation controls proliferation and clustering of PSC cells by regulating Myc, and small GTPase and DE-cadherin activity, respectively". As stated in the Discussion: the mechanism of Slit/Robo action on the PSC in the embryo is likely different, since DE-cadherin is not expressed in the embryonic PSC; however, it may not be THAT different: it could also act on adhesion between PSC cells themselves and their neighbors. What are other adhesion proteins that appear in the late lateral mesodermal structures?

      Could DN-cadherin or Fasciclins be involved?

      We agree with the Reviewer that Slit-Robo signaling likely acts in part on the PSC by affecting PSC cell adhesion to each other and/or to CBs (lines 428-435). As stated in the Discussion, we do not observe Fasciclin III expression in the PSC until late stages when the PSC has already been positioned, suggesting that Fasciclin III is not an active player in PSC formation. Assessing whether the PSC expresses any other of the suite of potential cell adhesion molecules, such as DN-Cadherin or other Fasciclins, and then studying their potential involvement in the Slit-Robo pathway in PSC cells, would be part of a follow-up study.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The authors are encouraged to address several key issues and provide more explicit clarification when interpreting the behavior of the PSC cells as "migration." It is recommended that the authors engage with all reviewers' comments and refine the text based on the feedback they find valuable.

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) Is it possible to assay robo1 and/or robo1 RNAi in a tissue-specific manner to further explore an intrinsic role in the PSC? Might the VM indirectly affect PSCs in a CB-independent manner? How does this affect the interpretation of results in Figure 4?

      See also our response to Reviewer #1, Public review weaknesses #2.

      Though we agree with the Reviewer that this is the better experiment to test for an intrinsic role for Robo in the PSC, this experiment is not possible at this time. As we noted in the manuscript, we do not yet have an embryonic PSC-specific GAL4, though we have been putting efforts towards identifying/developing such a tool. The Antp-GAL4 driver we used in this study will drive not only in both PSCs and CBs, but also in Vm, epidermis, and amnioserosa, as well as other tissues. The other available embryonic PSC drivers are not specific to the PSC and will drive expression in CBs and Vm, at minimum. This, combined with the reality that RNAi can be ineffective in embryonic tissues, resulted in our use of whole organism mutants to best address this question. 

      We acknowledge that it is possible the Vm indirectly affects the PSC in a CB-independent manner in the double Robo mutant, and we added a statement to the Discussion reiterating this point. However, because the PSC expresses Robo1 and Robo2, we maintain that the simplest interpretation of the results in Figure 4 is that PSC cells require intrinsic Robo signaling. And, as we state in the manuscript, it is possible that Slit signals directly from Vm to Robo on the PSC.

      (2) As this is the first study to be presenting PSC formation as involving collective cell migration, can the authors provide experimental evidence and rationale for this categorization?

      We have added our rationale to the Results section in the revision.

      See also our response to Reviewer #3, Public review weakness #2.

      (3) The Slit staining presented in Fig 3 W', Z' should be quantified. Furthermore, what is the VM phenotype when Robo1 is overexpressed? Is there a VM-specific phenotype and could this indirect effect cause the PSC to misform/mismigrate?

      We didn’t quantify Slit levels in the Vm-specific Robo overexpression condition because there was a visually striking difference compared to controls (increased intensity and specific localization to Vm membranes), and the manipulation resulted in a PSC phenotype. Thus, the evidence we show appears sufficient to strongly suggest that our genetic manipulation resulted in successful trapping of Slit on the Vm.

      As to a Vm phenotype when Robo1 is overexpressed Vm-specifically: we know Vm is present, but we haven’t performed an in-depth phenotypic analysis. In the manuscript we show that this manipulation at least affects organization of PSC-adjacent CBs, which we go on to show is correlated with mis-positioned PSCs. Thus, the PSC phenotype in this condition is not solely due to a Vm-specific phenotype.

      Minor concerns/suggestions:

      (1) I might have missed it but where are the Movies referenced in the text? Are legends provided for the videos? It is important that this is included in the final version (or more clearly presented if I missed it).

      We thank the Reviewer for pointing this out; we now direct the reader to the movies at appropriate places within the text.

      (2) In Figure 5, it might be helpful to add a third column to A in which the PSCs are pseudo-colored and thus highlighted because it is difficult to discern the white (not pink) PSCs...

      We appreciate the suggestion and now include these panels as Figure 5A’’ in the revision.

      (3) If I am following correctly, the lost PSC cells in Figure 5 don't move. Doesn't this suggest that what is critical is that the PSCs attach to the VM and/or CBs, and not necessarily that they are an actively migrating cell type? They "move" but might be passively carried.

      See also the response to Reviewer #3, Public reviews weaknesses #2.

      The Reviewer is correct that the PSC cells in Fig. 5 don’t move very much, but we interpret this differently from the Reviewer. After detachment of the cells in question they undergo dramatic shape changes, indicating active cytoskeletal remodeling, so the molecular machinery needed for migration appears to remain intact. Thus, we suggest that this observation actually emphasizes our finding that collectivity is needed for the migration. Given the consistency of PSC coalescence/collectivity and the intricate regulation that controls it, we believe it to be an integral part of PSC identity. When PSC cells become detached, they likely lose an aspect of their identity. In various manipulations we’ve noted instances of severely dispersed PSC cells expressing very low levels of identity markers Antp or Odd. Cells in such cases are likely compromised for their function, and this can include, for example, whether they can properly sense cues for migration.

      Reviewer #2 (Recommendations For The Authors):

      Minor points:

      (1) The expression pattern of Antp-Gal4 > myrGFP in the whole embryo should be shown to better demonstrate the overlap with Odd. How does it compare with Antp-Gal4 > CD8::GFP?

      We do not understand the question posed. We are not suggesting that Antp and Odd overlap in all cells, nor even many cells. It has been demonstrated by the field that co-expression among mesodermal cells, in the position where LG cells are specified, is a marker for the PSC. We have not thoroughly investigated all reporter lines for the GAL4 drivers used by the field.

      (2) Does Tincdelta4-Gal4 not at all express in the PSC? This should be verified.

      This question appears to refer to depletion of Slit by RNAi or cell killing driven by tinCΔ4-GAL4. TinCΔ4-GAL4 is expressed in CBs and in precisely 1 embryonic PSC cell. First, Slit isn’t expressed by any PSC cells to our eye, so any PSC mis-positioning observed upon tinCΔ4>Sli RNAi implicates CB involvement in PSC positioning. In designing tests for CB involvement, we were unable to identify any mutant known to lack CBs (or have fewer CBs) that didn’t also affect specification of the LG/PSC. The cell killing approach seemed best. It is possible that, in this scenario, ablation of a single, key PSC cell could affect final positioning of the other PSC cells, but we think that less likely than a role for CBs. We also retain our original conclusion because we often find mis-positioned PSC cells adjacent to mis-positioned CBs, including in the panel representing the CB ablation experiment, Figure 2S.

      (3) Line 212: The data provide evidence that Vm is necessary, but clearly not sufficient, as CBs are also necessary.

      We see how this wording was misleading and have adjusted the text accordingly.

      (4) The CBs are not visible in Figure 3B.

      We are unsure what the Reviewer is referring to, as we are certain that the signal between the blue outlines is indeed Slit expression in CBs.

      Reviewer #3 (Recommendations For The Authors):

      One minor mistake (I believe): in line 229 it should say "3C and 3D"

      We have corrected this error.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Summary

      This manuscript explores the transcriptomic identities of olfactory ensheathing cells (OECs), glial cells that support life-long axonal growth in olfactory neurons, as they relate to spinal cord injury repair. The authors show that transplantation of cultured, immunopurified rodent OECs at a spinal cord injury site can promote injury-bridging axonal regrowth. They then characterize these OECs using single-cell RNA sequencing, identifying five subtypes and proposing functional roles that include regeneration, wound healing, and cell-cell communication. They identify one progenitor OEC subpopulation and also report several other functionally relevant findings, notably, that OEC marker genes contain mixtures of other glial cell type markers (such as for Schwann cells and astrocytes), and that these cultured OECs produce and secrete Reelin, a regrowth-promoting protein that has been disputed as a gene product of OECs.

      This manuscript offers an extensive, cell-level characterization of OECs, supporting their potential therapeutic value for spinal cord injury and suggesting potential underlying repair mechanisms. The authors use various approaches to validate their findings, providing interesting images that show the overlap between sprouting axons and transplanted OECs, and showing that OEC marker genes identified using single-cell RNA sequencing are present in vivo, in both olfactory bulb tissue and spinal cord after OEC transplantation.

      Despite the breadth of information presented, however, further quantification of results and explanation of experimental approaches would be needed to support some of the authors' claims. Additionally, a more thorough discussion is needed to contextualize their findings relative to previous work.

      (1) a. Important quantification is lacking for the data presented. For example, multiple figures include immunohistochemistry or immunocytochemistry data (Figures 1, 5, 6), but they are presented without accompanying measures like fractions of cells labeled or comparisons against controls.

      We would like to clarify that the immunohistochemistry or immunocytochemistry data presented are meant to be qualitative rather than quantitative. The main purpose of the images is to show the presence or absence of markers of OEC subtypes rather than how much is present. That being said, in the revision we now add quantitative estimates of cell fractions for OECs along with other major cell types in Supplemental Table 1 and each OEC subtype marker in Supplemental Table 2. 

      b. As a result, for axons projecting via OEC bridges in Figure 1, it is unclear how common these bridges are in the presence or absence of OECs.

      We note that bridges of axons crossing the lesion core are extremely rare following a severe spinal cord injury in adult mammals. Our first example of axon bridging following a complete spinal cord transection followed by OEC transplants was reported in Thornton et al. (2018) and compared to an incomplete transection in a fibroblast-transplanted control in Figure 4 of that paper. That figure also appeared on the cover of Experimental Neurology when the paper was published. Figure 1 in the current paper was from an independent experiment which replicated the previously observed rare bridge formation. We noted this in the revised manuscript.

      Page 6: “We note, however, that such bridge formation is rare following a severe spinal cord injury in adult mammals.”

      c. For Figure 6., it is unclear whether cells having an alternative OEC morphology coincide with progenitor OEC subtype marker genes to a statistically significant degree. (see top paragraph on page 11)

      Franceschini & Barnett (1996) suggested that there were 2 distinct types of OECs that could be distinguished by their different morphology: one type resembling a Schwann cell and the other, an astrocyte. The purpose of Figure 6 is to determine if there is a link between our OEC subtypes based on scRNAseq and those previously described based on morphology alone (Franceschini and Barnett, 1996). There could be agreement between the large, flat or small, fusiform OEC morphologies and their progenitor status, but it is not required that the two classification types would significantly overlap. Here we report the percentage of morphology-based cell subtypes that show expression of our OEC subtype markers to estimate the overlap between the two. Our results indicate the two types of OEC morphologies share a certain degree of overlap, a finding that indicates similarities as well as differences between the two classification methods.

      In our results section we show that ~3/4ths of the Ki67-expressing OEC progenitor cells sampled were astrocyte-like, i.e., flat in shape and weakly Ngfr<sup>p75</sup>-labeled. The remaining ~1/4th of the Ki67-labeled  OECs were fusiform in shape and expressed Ngfr<sup>p75</sup> strongly. We feel that this is important to include as it is the only previous report of OB-OEC subtypes. The statistics of these results were in our original manuscript on page 11 and we further revise the text as follows:

      Page 12: “To determine if the proliferative OECs differ in appearance from adult OECs, and whether there is concordance between our OEC subtypes based on gene expression markers and previously described morphology-based OEC subtyping (Franceschini & Barnett, 1996), we analyzed OECs identified with the anti-Ki67 nuclear marker and anti-Ngfr<sup>p75</sup> (Figure 6g-h). Of the Ki67-positive OECs in our cultures, 24% ± 8% were strongly Ngfr<sup>p75</sup>-positive and spindle-shaped, whereas 76% ± 8% were flat and weakly Ngfr<sup>p75</sup>-labeled (n=4 cultures, p = 0.023). Here we show that a large percentage (~3/4ths) of proliferative OECs are characterized by large, flat morphology and weak Ngfr<sup>p75</sup> expression resembling the previously described morphology-based astrocyte-like subtype. Our results indicate the two types of OEC classifications share a certain degree of overlap, indicating similarities but also differences between the two classification methods.”

      d. Similar quantification is missing in other types of data such as Western blot images (Fig. 9) and OEC marker gene data (for which p-values are not reported; Table S2). 

      Response on Western blots: The Western blot signals shown in Figure 9 are from experiments that were designed to be qualitative rather than quantitative, addressing the question, “Can we detect Reelin signals in the different samples or not?” Both Western blots show that Reln<sup>+/+</sup> mouse olfactory bulbs (d) or cortices (e) contain Reelin whereas Reln<sup>-/-</sup> samples do not, and therefore provide positive and negative controls, respectively. The rat olfactory nerve layer (ONL, laminae I-II of olfactory bulb, d lane 1; e lane 3) contains mainly OECs wrapped around the axons of the olfactory sensory neurons that transmit olfactory signals into the olfactory bulb. To address your request for quantification, Dr. Khankan measured the density of the three isoforms of Reelin, 400 kD, 300 kD and 180 kD, in Fig. 9e and normalized them against the GAPDH control (37 kD). The graph below shows the normalized band density in arbitrary units on the Y-axis relative to the first 3 conditions, i.e., Reln<sup>+/+</sup> and Reln<sup>-/-</sup> mouse cerebral cortices and rat Reln<sup>+/+</sup> ONL. Because the conditioned medium was collected from tissue culture medium rather than cells or tissue, the GAPDH control was not present and therefore these data cannot be normalized in a similar analysis.

      Author response image 1.
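
      The normalization described above can be illustrated with a minimal sketch. It assumes that each band density is simply divided by the GAPDH density measured in the same lane; the function name and all numeric values are hypothetical and are not taken from Figure 9e. It also shows why a lane with no loading control, such as conditioned medium, cannot be normalized this way.

```python
def normalize_to_loading_control(band_densities, control_density):
    """Divide each band density by the loading-control density of the same lane.

    band_densities: dict mapping a band label to its raw density (arbitrary units).
    control_density: raw density of the loading control (e.g. GAPDH) in that lane.
    """
    if control_density is None or control_density <= 0:
        # No usable loading control in this lane, so normalization is undefined.
        raise ValueError("loading-control density must be a positive number")
    return {band: density / control_density
            for band, density in band_densities.items()}


# Hypothetical densitometry readings for one lane (arbitrary units).
lane = {"400 kD": 1200.0, "300 kD": 800.0, "180 kD": 400.0}
gapdh = 2000.0  # hypothetical GAPDH reading for the same lane

normalized = normalize_to_loading_control(lane, gapdh)
```

      Calling the same function with a control density of zero raises an error, which mirrors the situation for the conditioned-medium lanes.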

      Response for OEC marker gene data: We now add new full supplementary Table S1 (for major cell types) and Table S2 (for OEC subtypes) to report statistical p values and adjusted p values, as well as additional statistical information, including the percentage of cells expressing a subtype marker in a given subtype versus in other subtypes.

      e. The addition of quantitative measures and, where appropriate, statistical comparisons with p-values or other significance measures, would be important for supporting the authors' claims and more rigorously conveying the results.

      As detailed in the above responses, we now add quantifications and statistics to support the claims and enhance the rigor of our analysis.

      (2) a. Some aspects of the experimental design that are relevant to the interpretation of the results are not explained. For example, OECs appear to be collected from only female rats, but the potential implications of this factor are not discussed.

      We added a short explanation in the Discussion and Methods sections regarding why spinal cord injury studies are carried out on female rats.

      Page 24, Discussion: “Due to the extensive urinary tract dysfunction in spinal cord transected rats, most studies prefer females as their short urethra facilitates daily manual bladder expression. Our study, therefore, was carried out only on adult female rats, so sex differences and the generalizability of our findings to adult male rats would require further investigation.”

      Page 26, Methods: “Only females were used in order to match the sex of previous SCI studies conducted exclusively on female rats (Dixie, 2019; Khankan et al., 2016; Takeoka et al., 2011; Thornton et al., 2018). Following complete thoracic spinal cord transection, an adult rat is unable to urinate voluntarily and therefore urine must be manually “expressed” twice a day throughout the experiment. Females have a shorter urethra than males, and thus their bladders are easier to empty completely.”

      b. Additionally, it is unclear from the manuscript to what degree immunopurified cells are OECs as opposed to other cell types. The antibody used to retain OECs, nerve growth factor receptor p75 (Ngfr-p75), can also be expressed by non-OEC olfactory bulb cell types including astrocytes [1-3]. The possible inclusion of Ngfr-p75-positive but non-OEC cell types in the OEC culture is not sufficiently addressed.

      (a) Cragnolini, A.B. et al., Glia, (2009), doi: 10.1002/glia.20857.

      (b) Vickland H. et al., Brain Res., (1991), doi: 10.1016/0006-8993(91)91659-O.

      (c) Ung K. et al., Nat Commun., (2021), doi: 10.1038/s41467-021-25444-3.

      Our OECs are dissected primarily from the olfactory nerve layer that is concentrated medially and ventrally around the olfactory bulb together with a small part of the glomerular layer (layer II). OECs are the only glia present in olfactory nerve layer. Thus, although it is possible that other cell types also express Ngfr-p75 as pointed out by the reviewer and in the references provided, our OEC dissection method severely limits the number of astrocytes that might be included in our cultures. We further provide additional evidence (see updated Figure 2d and the detailed responses to the next question) that our immunopanned OECs using our dissection method consistently express all classic OEC markers but do not consistently express the majority of classic markers for other glial cell types such as astrocytes or oligodendrocytes.

      Such non-OEC cell types are also not distinguished in the analysis of single-cell RNA sequencing data (only microglia, fibroblasts, and OECs are identified; Figure 2). Thus, it is currently unclear whether results related to the OEC subtype may have been impacted by these experimental factors.

      We need to clarify that when determining potential cell types in Figure 2, we compared our cell cluster marker genes against a broad array of cell types including astrocytes, oligodendrocytes and Schwann cells, but the gene overlap was only significant for microglia, fibroblasts, and OECs, which we labeled in new Figure 2d. We added more details in methods and results to clarify how we determined the cell types in Figure 2 (text added below). We did consider all the potential cell types that could have been present in our OEC cultures, including astrocytes. However, astrocyte or oligodendrocyte markers were not significantly enriched in the clusters, but markers for microglia, fibroblasts, and OECs were prominent in the cell clusters.

      In the revised Figure 2d, we now illustrate that the OEC clusters not only express typical OEC markers, but also express a few but not all marker genes from other glial cells. We show the comparative data on markers for astrocytes, oligodendrocytes, and Schwann cells in Figure 2d in parallel with the marker genes for OECs, microglia, and fibroblasts. For each of the other glial cell types, there are some genes which overlap with OECs, and that is the reason why we identified OECs as hybrid glia.

      Page 6, Results: “Based on previously reported cell type marker genes for fibroblasts and major glial cell types including OECs, astrocytes, oligodendrocytes, and microglia, we found elevated expression of OEC marker genes in clusters 2, 3 and 7, microglia marker genes in clusters 4, 6, and 7, and fibroblast marker genes in clusters 0, 1, and 5 (Figure 2d).”

      Page 33, Methods: “Additional marker genes for fibroblasts and multiple glial cell types including astrocytes, oligodendrocytes, and microglia were also used to compare with those of the cell clusters.”

      (3) The introduction, while well written, does not discuss studies showing no significant effect of OEC implantation after spinal cord injury. The discussion also fails to sufficiently acknowledge this variability in the efficacy of OEC implantation. This omission amplifies bias in the text, suggesting that OECs have significant effects that are not fully reflected in the literature. The introduction would need to be expanded to properly address the nuance suggested by the literature regarding the benefits of OECs after spinal cord injury. Additionally, in the discussion, relating the current study to previous work would help clarify how varying observations may relate to experimental or biological factors.

      We appreciate the insightful comment and have now included information about the variability in OEC transplantation in previous studies in both the introduction and discussion sections. We discuss technical differences that lead to variability in the Introduction and how our findings could help interpret the variability in the Discussion.

      Pages 4-5: Text added to the Introduction: “The outcomes of OEC transplantation studies after spinal cord injury vary substantially in the literature due to many technical differences between their experimental designs. The source of OECs has a great impact on the outcome, with OB-OECs showing more promise than peripheral lamina propria-derived OECs, and purified, freshly-prepared OECs being required for optimal OEC survival. Other important variables include the severity of the injury (hemisection to complete spinal cord transection), the age of the spinal cord injured host (early postnatal versus adult), and OEC transplant strategies (delayed or acute transplantation, cell transplants with or without a matrix; Franssen et al., 2007). Franssen et al. (2007) evaluated studies that used only OECs as a transplant, and reported that 41 out of 56 studies showed positive effects, such as OEC stimulation of regeneration, positive interactions with the glial scar and remyelination of axons. More recent systematic reviews and meta-analyses on the effects of OEC transplantation following different spinal cord injury models reported that OECs significantly improved locomotor function (Watzlawick et al., 2016; Nakjavan-Shahraki et al., 2018), but did not improve neuropathic pain (Nakjavan-Shahraki et al., 2018).”

      Pages 24-25: Discussion on OEC source variability: “Extensive differences between OEC preparations contribute to the large variation in results from OEC treatments following spinal cord injury. This scRNA-seq study focused entirely on OB-OECs, and the next step would be to carry out similar studies on the peripheral, lamina-propria-derived OECs to discern the differences between these OEC populations. Such comparative studies using scRNA-seq will help define the underlying mechanisms and help resolve the variability in results from OEC-based therapy. Detailed studies of the composition of different OEC transplant types will contribute to identifying the most reparative cell transplantation treatments.”

      Reviewer #1 (Recommendations For The Authors):

      This is an extremely well-written and impactful series of experiments from a renowned leader in the field. The experimental questions are timely, with similar therapeutic approaches being prepared for clinical trial. The results address a gap that has persisted in the field for several decades and one that has been considered by many scientists long before technology existed to find answers. This highlights the importance of these experiments and the results reported here. With these things in mind, there are only a few minor factors that I have, that should be addressed to strengthen the paper.

      We truly appreciate the positive evaluations from the reviewer!

      Primary concerns

      (1) Quantification of results: The authors report on the data with broad brush strokes, missing the opportunity to quantify results and strengthen the interpretations. For instance, when describing gene expression, what proportion of cells analyzed were expressing these genes? How did this compare with detectable levels of protein? Can the author draw correlations between data sets collected that could offer even more insight into the identities of the cells studied? There is also a missed opportunity to evaluate how transplantation into injured neural tissue might alter gene expression of the phenotypes identified prior to transplantation.

      We appreciate these insightful comments and have added quantitative information and other relevant discussions in the revision. We now add Suppl. Tables 1 (for major cell types including OECs, fibroblasts, and microglia) and 2 (for OEC subtypes) to indicate the proportion of cells expressing each marker gene in each given cell cluster/subtype, in the column “Percentage of cells expressing the gene in the subtype/cell type,” versus the proportion of cells expressing the given marker gene in other cell types, in the column “Percentage of cells expressing the gene in the other subtypes/cell types.” In the new supplementary tables, we report statistical p values and adjusted p values after multiple testing correction to indicate statistical significance.
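
      To illustrate how the two percentage columns and the adjusted p values in such tables can be computed, the following is a minimal sketch rather than the pipeline used in the study. The function names `pct_expressing` and `benjamini_hochberg` are hypothetical, a cell is assumed to "express" a gene when its count is above zero, and the Benjamini-Hochberg procedure is assumed as the multiple testing correction because the response does not name the method.

```python
from typing import Dict, List, Sequence


def pct_expressing(counts: Sequence[float], labels: Sequence[str],
                   target: str) -> Dict[str, float]:
    """Percentage of cells with a nonzero count inside vs. outside `target`.

    Assumption: "expressing" means count > 0 for the gene in that cell.
    """
    inside = [c for c, lab in zip(counts, labels) if lab == target]
    outside = [c for c, lab in zip(counts, labels) if lab != target]

    def pct(values: Sequence[float]) -> float:
        if not values:
            return 0.0
        return 100.0 * sum(1 for v in values if v > 0) / len(values)

    return {"pct_in_subtype": pct(inside), "pct_in_other": pct(outside)}


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy counts for one gene across six cells in two hypothetical subtypes.
    counts = [0, 2, 5, 0, 0, 1]
    labels = ["OEC1", "OEC1", "OEC1", "OEC2", "OEC2", "OEC2"]
    print(pct_expressing(counts, labels, "OEC1"))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

      Reporting both percentages side by side shows whether a marker is specific to a subtype or merely abundant everywhere, which a single fold-change value does not.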

      Regarding the comparison with protein levels, we carried out immunohistochemistry experiments to confirm the proteins corresponding to OEC subtype markers. Our findings show that proteins for the gene markers can be detected, and thereby support our scRNA-seq findings. However, the immunofluorescence only provides a qualitative measure of protein levels in situ, so we cannot perform a correlation analysis. This is something we plan to pursue in a follow-up study with measurable protein levels. We also discuss future directions to examine the genes and proteins in in vivo transplantation studies in the Discussion.

      (2) Discussion and interpretation: Greater depth to interpretation and discussion of data and its impact on future work is needed. For example, on pages 20-21, the authors reflect briefly on why Reelin might be of interest (it could lead to Dab-1 expression), but why is that important? There are several instances like this where it would be useful for the authors to provide a little more insight into the potential impact of these data and interpretations.

      We appreciate these valuable suggestions. We have revised our Results and Discussion sections to offer deeper insight and interpretation of the importance of the data, especially that for Reelin.

      Page 17: Results: “In the canonical Reelin-signaling pathway, Reelin binds to the very-low-density lipoprotein receptor (Vldlr) and apolipoprotein E receptor 2 (ApoER2) and induces Src-mediated tyrosine phosphorylation of the intracellular adaptor protein Disabled-1 (Dab1). Both Reelin and Dab1 are highly expressed in embryos and contribute to correct neuronal positioning.”

      Page 22-23, Discussion: “Reelin is a developmentally expressed protein detected in specific neurons, in addition to OECs and Schwann cells. The canonical Reelin-signaling pathway involves neuronal-secreted Reelin binding to Vldlr and ApoER2 receptors expressed on Dab1-labeled neurons. Following Reelin binding, Dab1 is phosphorylated by Src family kinases which initiates multiple downstream pathways. Very little is known, however, about Reelin secreted by glia. Panteri et al. (2006) reported that Schwann cells express low levels of Reelin in adults, and that it is upregulated following a peripheral nerve crush, as is reported above for many neurotrophic factors. Reelin loss in Schwann cells reduced the diameter of small myelinated axons but did not affect unmyelinated axons (Panteri et al., 2005). In the olfactory system, OECs ensheath the Dab1-labeled, unmyelinated axons of olfactory sensory neurons which are continuously generated and die throughout life. OEC transplantation following spinal cord injury would provide an exogenous source of Reelin that could phosphorylate Dab1-containing neurons or their axons. Dab1 is expressed at high levels in the axons of some projection neurons, such as the corticospinal pathway (Abadesco et al., 2014). Future experiments are needed to explore the function that glial-secreted Reelin may have on axonal regeneration.”

      Minor concerns

      (3) The authors reflect on the spontaneous glial bridge that develops in the repairing spinal cord of Zebrafish, but perhaps even more relevant is that this same phenomenon occurs in mammals as well if the spinal cord is injured during early development (opossum; Lane et al, EJN 2007). This should be considered and the statement that there is little regeneration in the mammalian spinal cord should be clarified.

      We appreciate this insightful comment. We now add discussions of the axonal regeneration and bridging observed following severe spinal cord injury in young developing mouse and opossum spinal cords.

      Page 23: “Adult mammals show little evidence of spontaneous axonal regeneration after a severe spinal cord injury in contrast to transected neonatal rats (Bregman, 1987; Bregman et al., 1993) and young postnatal opossums (Lane et al., 2007). In immature mammals, axons continue to project across or bridge the spinal cord transection site during development. Lower organisms, such as fish, show even more evidence of regeneration following severe SCI. Mokalled et al. (2016) reported that glial secretion of Ctgfa/Ccn2 was both necessary and sufficient to stimulate a glial bridge for axon regeneration across the zebrafish transection site. Cells in the injury site that express Ctgf include ependymal cells, endothelial cells, and reactive astrocytes (Conrad et al., 2005; Mokalled et al., 2016; Schwab et al., 2001). Here we show that, although rare, Ctgf-positive OECs can contribute to glial bridge formation in adult rats. The most consistent finding among our severe SCI studies combined with OEC transplantation is the extent of remodeling of the injury site and axons growing into the inhibitory lesion site, together with OECs and astrocytes. The formation of a glial bridge across the injury was critical to the spontaneous axon regeneration seen in zebrafish (Mokalled et al., 2016) and likely contributed to the axon regeneration detected in our OEC transplanted, transected rats (Dixie, 2019; Khankan et al., 2016; Takeoka et al., 2011; Thornton et al., 2018).”

      Reviewer #2 (Recommendations For The Authors):

      (1) The manuscript title and abstract must include the species and sex studied.

      The title and abstract have been modified as suggested.

      Page 1: “Olfactory ensheathing cells from adult female rats are hybrid glia that promote neural repair”

      (2) OECs submitted for sequencing were like those about to be transplanted; however, the phenotype of the cells would likely change immediately and shift over time post-implantation. Please briefly address or discuss this point in the Discussion (or Results).

      We have added this important discussion point.

      Pages 23-24: Discussion: “We recognize that this study is a single snapshot of OEC gene expression derived from adult female rats before they are transplanted above and below the spinal cord transection site. We would expect the gene expression of transplanted OECs to change in each new environment, i.e. as they migrate into the injury site, integrate into the glial scar, and wrap around axons. Based on our past studies, OECs survived in an outbred Sprague-Dawley rat model for ~ 4 weeks (Khankan et al., 2016) and in an inbred Fischer 344 model for 5 months (Dixie, 2019). As spinal cord injury transplant procedures are further enhanced and OEC survival improves, these hybrid glial cells should be examined at multiple time points to better evaluate their proregenerative characteristics.”

      (3) Page 12: Use of "monocytes" - the word "monocyte" implies a circulating, undifferentiated innate immune cell. This should not be used interchangeably with macrophage or microglia.

      We agree and now refer to microglia or macrophages depending on the context. We retained the term “monocyte” in Table 3 only where the cited references reported one of the top 20 genes in monocytes.

      (4) Page 12: "We now show that these unique monocytes reported between the bundles of olfactory axons surrounded by OECs (Smithson & Kawaja, 2010), are in fact, a distinct subtype of OECs."

      Is it possible to conclude that these cells are a "distinct subtype of OECs?" Perhaps these cells are a hybrid between microglia/macrophages and OECs? This is speculative, so should be worded more carefully - especially in the Results section. Please clarify, dampen conclusions, and/or better justify the wording here.

      We agree and have modified the entire paragraph to dampen and more carefully explain our conclusions. We also added an observation that strengthens the relationship between OECs and microglia/macrophages.

      Page 12, Results: Additional observation: “In fact, all top 20 genes in cluster 3 are expressed in microglia, macrophages, and/or monocytes (Suppl. Table 3).”

      Page 13, Results: The statement referenced in your review was deleted and we wrote the following: “Smithson and Kawaja (2010) identified unique microglia/macrophages that immunolabeled with Iba-1 (Aif1) and Annexin A3 (Anxa3) in the olfactory nerve and outer nerve layer of the olfactory bulb. These authors proposed that Iba-1-Anxa3 double-labeled cells were a distinct population of microglia/macrophages that protected the olfactory system against viral invasion into the cranial cavity. Based on our scRNA-seq data we offer an alternative interpretation that at least some of these Iba-1-Anxa3 cells may be a hybrid OEC-microglial cell type. Supporting this interpretation, there are a number of reports that suggest OECs frequently function as phagocytes (e.g., Khankan et al., 2016; Nazareth et al., 2020; Su et al., 2013).”

      (5) Page 13: "Pseudotime trajectory analysis, a widely used approach to predict cell plasticity and lineages based on scRNA-seq data, suggests that there are potential transitions between specific OEC subclusters." This is interesting but is somewhat unclear. Please add one more sentence to aid the reader's understanding regarding how this analysis is performed.

      Thank you for your valuable feedback. We have revised the text for clarity as follows:

      Page 14, Results: “We performed pseudotime trajectory analysis using the Slingshot algorithm to infer lineage trajectories and cell plasticity by ordering cells in pseudotime based on their transcriptional progression reflected in scRNA-seq data. Transcriptional progression refers to the changes in gene expression profiles of cells as they undergo differentiation or transition through different states. The trajectory analysis results suggest that there are potential transitions between specific OEC subclusters.”
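
      For readers unfamiliar with the idea of ordering cells in pseudotime, the following toy sketch may help. It is not Slingshot, which is an R/Bioconductor package that fits smooth principal curves. It keeps only the first stage of that approach, a minimum spanning tree over cluster centroids, and then projects each cell onto the tree edge leading to its cluster. The function names, the two-dimensional coordinates, and the choice of root cluster are all illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def centroids(points: Sequence[Point], labels: Sequence[str]) -> Dict[str, Point]:
    """Mean position of each cluster in a reduced two-dimensional space."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return {lab: (sum(x for x, _ in ps) / len(ps), sum(y for _, y in ps) / len(ps))
            for lab, ps in groups.items()}


def mst_from_root(cents: Dict[str, Point], root: str) -> Dict[str, str]:
    """Prim's algorithm over cluster centroids; returns child -> parent."""
    parent: Dict[str, str] = {}
    in_tree = {root}
    while len(in_tree) < len(cents):
        _, child, par = min(
            (math.dist(cents[a], cents[b]), b, a)
            for a in in_tree for b in cents if b not in in_tree
        )
        parent[child] = par
        in_tree.add(child)
    return parent


def pseudotime(points: Sequence[Point], labels: Sequence[str], root: str) -> List[float]:
    """Toy pseudotime: distance travelled along the centroid tree from `root`."""
    cents = centroids(points, labels)
    parent = mst_from_root(cents, root)

    def depth(lab: str) -> float:
        # Path length along the tree from the root centroid to this centroid.
        if lab == root:
            return 0.0
        return depth(parent[lab]) + math.dist(cents[lab], cents[parent[lab]])

    times = []
    for (x, y), lab in zip(points, labels):
        if lab == root:
            times.append(0.0)  # simplification: root cells all start at zero
            continue
        (ax, ay), (bx, by) = cents[parent[lab]], cents[lab]
        seg2 = (bx - ax) ** 2 + (by - ay) ** 2
        # Fraction of the parent->cluster edge covered by the cell's projection.
        t = 0.0 if seg2 == 0 else ((x - ax) * (bx - ax) + (y - ay) * (by - ay)) / seg2
        t = max(0.0, min(1.0, t))
        times.append(depth(parent[lab]) + t * math.sqrt(seg2))
    return times


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.0), (1.0, 0.0), (1.2, 0.0), (2.0, 0.0), (2.2, 0.0)]
    labs = ["A", "A", "B", "B", "C", "C"]
    print(pseudotime(pts, labs, root="A"))
```

      In this toy version, a transition between two subclusters corresponds to an edge in the centroid tree. Whether such an edge reflects a real biological transition still has to be tested experimentally.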

      (6) The authors could discuss potential reasons for variability in OEC treatment results after spinal cord injury between studies and labs. How might sequencing results here inform the debate about whether OECs are helpful or not?

      In response to the Public Review, we added discussions about the variability in OEC treatments between studies in both the Introduction and Discussion, and these comments are copied on pages 6-7 of this document. In the Discussion we included a statement about how the current findings may inform the debate on OECs.

      (7) Discussion: please add a discussion of limitations and future directions that addresses the following points:

      a) Please add one sentence on the lack of studying sex differences - only females were studied here.

      b) There is no correlation or modulation of any target genes, so all results here are correlative.

      c) Please add a brief paragraph with future directions for the field, including acknowledgment that the role of OECs in repair after SCI is not fully resolved and that future studies might consider targeting some of the specific pathways described herein.

      d) Which pathways and OEC subpopulations likely best support repair, and how might these be reinforced or better maintained in the SCI environment? If not known, what are the next steps for identifying the most reparative OEC subtype?

      Thank you for the valuable suggestions. We have added these to the discussion as detailed below.

      Pages 23-25, Discussion:

      “Limitations of these OEC scRNA-Seq studies”

      “We recognize that this study is a single snapshot of OEC gene expression derived from adult female rats before they are transplanted above and below the spinal cord transection site. We would expect the gene expression of transplanted OECs to change in each new environment, i.e. as they migrate into the injury site, integrate into the glial scar, and wrap around axons. Based on our past studies, OECs survived in an outbred Sprague-Dawley rat model for ~ 4 weeks (Khankan et al., 2016) and in an inbred Fischer 344 model for 5 months (Dixie, 2019). As spinal cord injury transplant procedures are further enhanced and OEC survival improves, these hybrid glial cells should be examined at multiple time points to better evaluate their proregenerative characteristics.”

      “Due to the extensive urinary tract dysfunction in spinal cord transected rats, most studies are conducted on females as their short urethra facilitates daily manual bladder expression. Our study was carried out only on adult female rats, so sex differences and the generalizability of our findings to adult male rats would require further investigation. We also did not modulate any of the genes or proteins in the identified OEC subtypes to test their causal and functional roles, thus our findings remain correlative in the current study. Future gene/protein modulation studies are necessary to understand the functional roles of the individual OEC subtypes in the context of their reparative functions to determine which pathways and subtypes are more critical and can be enhanced for neural repair. Our current findings build the foundation for these future studies to help resolve the role of OECs in spinal cord injury repair.” 

      “Extensive differences between OEC preparations contribute to the large variation in results from OEC treatments following spinal cord injury. This scRNA-seq study focused entirely on OB-OECs, and the next step would be to carry out similar studies on the peripheral, lamina-propria-derived OECs to discern the differences between the two OEC populations. Such comparative studies using scRNA-seq will help define the underlying mechanisms and resolve the variability in results from OEC-based therapy. Detailed studies of the composition of different OEC transplant types will contribute to identifying the most reparative cell transplantation treatments.”

      (8) Figure 6: What is the major point of this figure and its related immunocytochemistry? Please clarify.

      Franceschini & Barnett (1996) suggested that there were 2 distinct types of OECs that could be distinguished by their different morphology: one type resembling a Schwann cell and the other an astrocyte. The purpose of Figure 6 is to determine if there is a link between our scRNA-seq-based OEC subtypes and those previously described based on morphology alone (Franceschini and Barnett, 1996). In our Results section we show that ~3/4ths of the sampled OECs that were Ki67+ progenitor cells were astrocyte-like, i.e., flat in shape and weakly Ngfr<sup>p75</sup>-labeled. The remainder were Schwann cell-like, i.e., fusiform in shape and strongly Ngfr<sup>p75</sup>-labeled. Our results indicate that the two OEC classifications overlap to some degree, indicating similarities but also differences between the classification methods.

      (9) Figure 9, caption: "OEC whole cell lysates (WCL; lanes: 4, 6, and 8), and OEC conditioned medium (CM; lanes: 5 and 7)."  This statement is unclear - please clarify the result here.

      We added clarification to the legend for Figure 9d. 

      Page 50: (d) “Western blot confirms the expression of Reelin in rat olfactory nerve layer I and layer II (ONL; lane 1 of western blot). Reln<sup>+/+</sup> and Reln<sup>-/-</sup> mouse olfactory bulbs were used as positive and negative controls, respectively (lanes: 2 and 3). Reelin that was synthesized by cultured OECs was found in whole cell lysates (WCL; lanes: 4, 6, and 8), whereas Reelin that was secreted by cultured OECs into tissue culture medium was measured in the OEC “conditioned medium” (CM; lanes: 5 and 7). GAPDH was the loading control for tissue homogenates (lanes 1-4, 6, 8).”

      (10) Methods: A Cat. No. for all antibodies and key supplies should be included.

      Response: All of the antibody information in the revised version is in Suppl. Table 4. Information for other key supplies is included in the extensive methods section.

      (11) Methods: How was primary antibody specificity validated for less-used antibodies? Background staining can be a major issue after SCI; e.g., with the CTGF antibody used in Figure 5.

      The spinal cord section shown in Figure 5 was compared to sections from the same SCI cohort that had been injected with control cells, i.e. skin fibroblasts. We have used the first two antibodies (anti-Glial fibrillary acidic protein and anti-Green fluorescent protein) for many years so only the CTGF was a “less-used antibody.” Our strategy for working with “less-used” or “newly-purchased” antibodies was as follows.

      First, we studied the literature to find the best antibodies for neuronal tissue. Many of the images in Figure 7 were generated with antibodies purchased just for this study. Our goal was to characterize them on normal adult lamina propria and olfactory bulb tissues rather than in the injured spinal cord where background can be an issue. In the olfactory bulb we examined the olfactory nerve layer where OECs are concentrated and then examined the olfactory epithelium, lamina propria, and the deep layers of the olfactory bulb to find regions without immunolabel. As described above, we tested anti-CTGF antibodies in SCI sections implanted with skin fibroblast controls when conducting experiments for CTGF in sections with OECs. New antibodies were tested at multiple concentrations and we tried different immunocytochemical techniques. CTGF is expressed by several different cell types, but expression is low in most of the areas above and below the injury site. Despite our success with many “newly-purchased” antibodies, there were at least 4 for which we were never able to obtain specific labeling.

      (12) Will the data (especially the sequencing data) be shared publicly?

      The data has been uploaded to and shared via the public data repository GEO. Data availability is stated on the title page of this manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public Review): 

      (1) Although the theory is based on memory, it also is based on spatially-selective cells.

      Not all cells in the hippocampus fulfill the criteria of place/HD/border/grid cells, but they still play a role in memory. E.g., Tonegawa, Buzsáki labs' work does not focus on only those cells, and there are certainly a lot of non-pure spatial cells in monkeys (Martinez-Trujillo) and humans (iEEG). Does the author mainly focus on saying that "spatial cells" are memory, but does not account for non-spatial memory cells? This seems to be an incomplete account of memory - which is fine, but the way the model is set up suggests that *all* memory is place (what/where) and non-spatial attributes ("grid") - but cells that don't fulfil these criteria in MTL (Diehl et al., 2017, Neuron; non-grid cells; Schaeffer et al., 2022, ICML; Luo et al., 2024, bioRxiv) certainly contribute to memory, and even navigation. This is also related to the question of whether these cell definitions matter at all (Luo et al., 2024). The authors note "However, this memory conjunction view of the MTL must be reconciled with the rodent electrophysiology finding that most cells in MTL appear to have receptive fields related to some aspect of spatial navigation (Boccara et al., 2010; Grieves & Jeffery, 2017). The paucity of non-spatial cells in MTL could be explained if grid cells have been mischaracterized as spatial." Is the author mainly talking about rodent work?

      There is a new section in the introduction that deals with these issues, titled ‘Why Model the Rodent Navigation Literature with a Memory Model?’ That section reads:

      “Spatial navigation is inherently a memory problem – learning the spatial arrangement of a new enclosure requires memory for the conjunction of what and where. This has long been realized and in the introduction to ‘Hippocampus as a Cognitive Map’, O’Keefe and Nadel (1978) wrote “We shall argue that the hippocampus is the core of a neural memory system providing an objective spatial framework within which the items and events of an organism's experience are located and interrelated” (emphasis added). Furthermore, in the last chapter of their book, they extended cognitive map theory to human memory for non-spatial characteristics. However, in the decades since the development of cognitive map theory, the rodent spatial navigation and human memory literatures have progressed somewhat independently.

      The ideas proposed in this model are an attempt to reunify these literatures by returning to the original claim that spatial navigation is inherently a memory problem. The goal of the current study is to explain the rodent spatial navigation literature using a memory model that has the potential to also explain the human memory literature. In contrast, most grid cell models (Bellmund et al., 2016; Bush et al., 2015; Castro & Aguiar, 2014; Hasselmo, 2009; Mhatre et al., 2012; Solstad et al., 2006; Sorscher et al., 2023; Stepanyuk, 2015; Widloski & Fiete, 2014) are domain specific models of spatial navigation and as such, they do not lend themselves to explanations of human memory. Thus, the reason to prefer this model is parsimony. Rather than needing to develop a theory of memory that is separate from a theory of spatial navigation, it might be possible to address both literatures with a unified account.

      This study does not attempt to falsify other theories of grid cells. Instead, this model reaches a radically different interpretation regarding the function of grid cells; an interpretation that emerges from viewing spatial navigation as a memory problem. All other grid cell models assume that an entorhinal grid cell displaying a spatially arranged grid of firing fields serves the function of spatial coding (i.e., spatial grid cells exist to support a spatial metric). In contrast, the proposed memory model of grid cells assumes that the hexagonal tiling reflects the need to keep memories separate from each other to minimize confusion and confabulation – the grid pattern is the byproduct of pattern separation between memories rather than the basis of a spatial code. 

      It is now understood that grid-like firing fields can occur for non-spatial two-dimensional spaces. For instance, human entorhinal cortex exhibits grid-like responses to video morph trajectories in a two-dimensional bird neck-length versus bird leg-length space (Constantinescu et al., 2016). As a general theory of learning and memory, the proposed memory model of grid cells is easily extended to explain these results (e.g., relabeling the border cell inputs in the model as neck-length and leg-length inputs). However, there are other grid cell models that can explain both spatial grid cells as well as non-spatial grid-like responses (Mok & Love, 2019; Rodríguez-Domínguez & Caplan, 2019; Stachenfeld et al., 2017; Wei et al., 2015). Similar to this memory model of grid cells, these models are also positioned to explain both the rodent spatial navigation and human memory literatures. Nevertheless, there is a key difference between this model and other grid cell models that generalize to non-spatial representations. Specifically, these other models assume that grid cells exhibiting spatial receptive fields serve the function of identifying positions in the environment (i.e., their function is spatial). As such, these models do not explain why most of the input to rodent hippocampus appears to be spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). This memory model of grid cells provides an answer to the apparent paucity of non-spatial cell types in rodent MTL by proposing that grid cells with spatial receptive fields have been misclassified as spatial (they are what cells rather than where cells) and that place cells are fundamentally memory cells that conjoin what and where.”

      (2) Related to the last point, how about non-grid multi-field mEC cells? In theory, these also should be the same; but the author only presents perfect-looking grid cells. In empirical work, clearly, this is not the case, and many mEC cells are multi-field non-grid cells (Diehl et al., 2017). Does the model find these cells? Do they play a different role? As noted by the author "Because the non-spatial attributes are constant throughout the two-dimensional surface, this results in an array of discrete memory locations that are approximately hexagonal (as explained in the Model Methods, an "online" memory consolidation process employing pattern separation rapidly turns an approximately hexagonal array into one that is precisely hexagonal)." If they are indeed all precisely hexagonal, does that mean the model doesn't have non-grid spatial cells?

      Grid cells with irregular firing fields are now considered in the discussion with the following paragraphs:

      “According to this model, hexagonally arranged grid cells should be the exception rather than the rule when considering more naturalistic environments. In a more ecologically valid situation, such as with landmarks, varied sounds, food sources, threats, and interactions with conspecifics, there may still be remembered locations where events occurred or remembered properties can be found, but because the non-spatial properties are non-uniform in the environment, the arrangement of memory feedback will be irregular, reflecting the varied nature of the environment. This may explain the finding that even in a situation where there are regular hexagonal grid cells, there are often irregular non-grid cells that have a reliable multi-location firing field, but the arrangement of the firing fields is irregular (Diehl et al., 2017). For instance, even when navigating in an enclosure that has uniform properties as dictated by experimental procedures, there may be other properties that were not well-controlled (e.g., a view of exterior lighting in some locations but not others), and these uncontrolled properties may produce an irregular grid (i.e., because the uncontrolled properties are reliably associated with some locations but not others, hippocampal memory feedback triggers retrieval of those properties in the associated locations).

      In this memory model, there are other situations in which an irregular but reliable multi-location grid may occur, even when everything is well controlled. In the reported simulations, when the hippocampal place cells were based on variation in X/Y (as defined by Border cells), nothing else changed as a function of location, and the model rapidly produced a precise hexagonal arrangement of hippocampal place cell memories. When head direction was included (i.e., real-world variation in X, Y, and head direction), the model still produced a hexagonal arrangement as per face-centered cubic packing of memories, but this precise arrangement was slower to emerge, with place cells continuing to shift their positions until the borders of the enclosure were sufficiently well learned from multiple viewpoints. If there is real-world variation in four or more dimensions, as is likely the case in a more ecologically valid situation, it will be even harder for place cell memories to settle on a precise regular lattice. Furthermore, in the case of four dimensions, mathematicians studying the “sphere packing problem” recently concluded that the densest packing is irregular (Campos et al., 2023). This may explain why the multi-field grid cells for freely flying bats have a systematic minimum distance between firing fields, but their arrangement is globally irregular (Ginosar et al., 2021). Assuming that the memories encoded by a bat include not just the three real-world dimensions of variation, but also head direction, the grid will likely be irregular even under optimal conditions of laboratory control.”

      (3) Theoretical reasons for why the model is put together this way, and why grid cells must be coding a non-spatial attribute: Is this account more data-driven (fits the data so formulated this way), or is it theoretical - there is a reason why place, border, grid cells are formulated to be like this. For example, is it an efficient way to code these variables? It can be both, like how the BVC model makes theoretical sense that you can use boundaries to determine a specific location (and so place cell), but also works (creates realistic place cells). 

      The motivation for this model is now articulated in the new section, quoted above, titled ‘Why Model the Rodent Navigation Literature with a Memory Model?’ Regarding the assumption that border cells provide a spatial metric, this assumption is made for the same reasons as in the BVC model. Regarding this, the text said: “These assumptions regarding border cells are based on the boundary vector cell (BVC) model of Barry et al. (2006). As in the BVC model, combinations of border cells encode where each memory occurred in the real-world X/Y plane.” A new sentence is added to the Model Methods, stating: “This assumption is made because border cells provide an efficient representation of Euclidean space (e.g., if the animal knows how far it is from different walls of the enclosure, this already available information can be used to calculate location).”
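
      The point that wall distances already determine location can be made concrete with a small sketch. This is an illustration under simplifying assumptions, not the model's implementation: the enclosure is taken to be an axis-aligned rectangle, the distances are exact, and the function name `locate` and its parameters are hypothetical.

```python
from typing import Optional, Tuple


def locate(width: float, height: float,
           west: Optional[float] = None, east: Optional[float] = None,
           south: Optional[float] = None, north: Optional[float] = None
           ) -> Tuple[float, float]:
    """Estimate (x, y) inside a width-by-height rectangle from wall distances.

    Each argument is the distance to the named wall. At least one distance per
    axis is required; when both are given, the two estimates are averaged.
    """
    def axis(near: Optional[float], far: Optional[float], size: float) -> float:
        estimates = []
        if near is not None:
            estimates.append(near)          # distance from the origin-side wall
        if far is not None:
            estimates.append(size - far)    # distance from the opposite wall
        if not estimates:
            raise ValueError("need at least one wall distance per axis")
        return sum(estimates) / len(estimates)

    return axis(west, east, width), axis(south, north, height)


if __name__ == "__main__":
    # An animal 0.5 m from the west wall and 0.25 m from the north wall
    # of a 2 m x 1 m enclosure.
    print(locate(2.0, 1.0, west=0.5, north=0.25))
```

      Two distances, one per axis, are enough in this idealized case. Additional border inputs are redundant, which is one way to read the claim that border cells offer an efficient representation of Euclidean space.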

      But in this case, the purpose of grid cells coding a non-spatial attribute, and having some kind of system where they don't fire at all locations, seems a little arbitrary. If it's not encoding a spatial attribute, it doesn't have to have a spatial field. For example, it could fire in the whole arena - which some cells do (and don't pass the criteria of spatial cells as they are not spatially "selective" to another location, related to above).

      Some cells have a constant high firing rate, but they are the exception rather than the rule. More typically, cells habituate in the presence of ongoing excitatory drive and by doing so become sensitive to fluctuations in excitatory drive. Habituation is advantageous both in terms of metabolic cost and in terms of function (i.e., sensitivity to change). This is now explained in the following paragraph:

      “In theory, a cell representing a non-spatial attribute found at all locations of an enclosure (aka, a grid cell in the context of this model), could fire constantly within the enclosure. However, in practice, cells habituate and rapidly reduce their firing rate by an order of magnitude when their preferred stimulus is presented without cessation (Abbott et al., 1997; Tsodyks & Markram, 1997). After habituation, the firing rate of the cell fluctuates with minor variation in the strength of the excitatory drive. In other words, habituation allows the cell to become sensitive to changes in the excitatory drive (Huber & O’Reilly, 2003). Thus, if there is stronger top-down memory feedback in some locations as compared to others, the cell will fire at a higher rate in those remembered locations rather than in all locations even though the attribute is found at all locations. In brief, when faced with constant excitatory drive, the cell accommodates and becomes sensitive to change in the magnitude of the excitatory drive. In the model simulation, this dynamic adaptation is captured by supposing that cells fire 5% of the time on average across the simulation, regardless of their excitatory inputs.”
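      One way to picture the "fires 5% of the time regardless of excitatory input" assumption is a firing threshold placed at the 95th percentile of the cell's own drive. The sketch below is an illustration under that assumption, not the simulation code; `firing_map`, the drive values, and the offsets are hypothetical.

```python
import random


def firing_map(drive, active_fraction=0.05):
    """Return True at each location where the cell fires.

    The threshold adapts to the drive so that the cell is active at a fixed
    fraction of locations (an assumed stand-in for habituation).
    """
    n_active = max(1, round(active_fraction * len(drive)))
    threshold = sorted(drive, reverse=True)[n_active - 1]
    return [d >= threshold for d in drive]


random.seed(0)
n_locations = 1000
# Hypothetical location-specific top-down memory feedback.
feedback = [random.random() for _ in range(n_locations)]

# The same feedback riding on a weak versus a strong constant drive.
weak = [1.0 + f for f in feedback]
strong = [10.0 + f for f in feedback]

map_weak = firing_map(weak)
map_strong = firing_map(strong)
print(sum(map_weak), map_weak == map_strong)
```

      In this sketch the constant component of the drive has no effect on where the cell fires; only the location-to-location differences in feedback determine the firing map.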

      (4) Why are grid cells given such a large role for encoding non-spatial attributes? If anything, shouldn't it be lateral EC or perirhinal cortex? Of course, they both could, but there is less reason to think this, at least for rodent mEC.  

      This is a good point and the following paragraph has been added to the introduction to explain that lateral EC is likely part of the explanation. But even when including lateral EC, it still appears that most of the input to hippocampus is spatial.

      “One possible answer to the apparent lack of non-spatial cells in MTL is to highlight the role of the lateral entorhinal cortex (LEC) as the source of non-spatial what information for memory encoding (Deshmukh & Knierim, 2011). LEC can be contrasted with mEC, which appears to only provide where information (Boccara et al., 2010a; Diehl et al., 2017). Although it is generally true that LEC is involved in non-spatial processing, there is evidence that LEC provides some forms of spatial information (Knierim et al., 2014). The kind of non-spatial information provided by LEC appears to be in relation to objects (Connor & Knierim, 2017; Wilson et al., 2013). However, in a typical rodent spatial navigation study there are no objects within the enclosure. Thus, although the distinction between mEC and LEC is likely part of the explanation, it is still the case that rodent entorhinal input to hippocampus appears to heavily favor spatial information.”

      (5) Clarification: why do place cells and grid cells differ in terms of stability in the model? Place cells are not stable initially but grid cells come out immediately. They seem directly connected so a bit unclear why; especially if place cell feedback leads to grid cell fields. There is an explanation in the text - based on grid cells coding the on-average memories, but these should be based on place cell inputs as well. So how is it that place fields are unstable then grid fields do not move at all? I wonder if a set of images or videos (gifs) showing the differences in spatial learning would be nice and clarify this point.  

      In this revision, I provide a new video focused on learning of place cell memories that include head direction. This second video is in relation to the results reported in Figure 9. The short answer is that the grid fields for the non-spatial cell are based on the average across several view-dependent memories (i.e., across several place cells that have head direction sensitivity) and the average is reliable even if the place cells are unstable. The text of this explanation now reads:

      “Why was the grid immediately apparent for the non-spatial attribute cell whereas the grid took considerable prior experience for the head direction cells? The answer relates to memory consolidation and the shifting nature of the hippocampal place cells. Head direction cells only produced a reliable grid once the hippocampal place cells (aka, memory cells) assumed stable locations. During the first few sessions, the hippocampal place cells were shifting their positions owing to pattern separation and consolidation. But once the place cells stabilized, they provided reliable top-down memory feedback to the head direction cells in some places but not others, thus producing a reliable grid arrangement to the firing maps of the head direction cells. In other words, for the head direction cells, the grid only appeared once the place cells stabilized. This slow stabilization of place fields is a known property (Bostock et al., 1991; Frank et al., 2004).

      In the simulation, the place cells did not stabilize until a sufficient number of place cells were created (Figure 9C). Specifically, these additional memories were located immediately outside the enclosure, around all borders (Figure 9D). These “outside the box” memories served to constrain the interior place cells, locking them in position despite ongoing consolidation. This dynamic can be seen in a movie showing a representative simulation. The movie shows the positions of the head direction sensitive place cells during initial learning, and then during additional sessions of prior experience as the movie speeds up (see link in Figure 9 caption).

      Why did the non-spatial grid cell (k) produce a grid immediately, before the place cells stabilized? As discussed in relation to Figure 8, the non-spatial grid cell is the projection through the 3D volume of real-world coordinates that includes X, Y, and head direction. Each grid field of a non-spatial grid cell reflects feedback from several place cells that each have a different head direction sensitivity (see for instance the allocentric pairs of memories illustrated in Figure 8C and 8D). Thus, each grid field is the average across several memories that entail different viewpoints and this averaging across memories provides stability even if the individual memories are not yet stable. This average of unstable memories produces a blurry sort of grid pattern without any prior experience.

      A final piece of the puzzle relies on the same mechanism that caused the grid pattern to align with the borders as reported in the results of Figures 6 and 7. Specifically, there are some “sticky” locations with ongoing consolidation because the connection weights are bounded. Because weights cannot go below their minimum or above their maximum, it is slightly more difficult for consolidation to push or pull connection weights over the peak value or under the minimum value of the tuning curve. Thus, the place cells tend to linger in locations that correspond to the peak or trough of a border cell. There are multiple peak and trough locations but for the parameter values in this simulation, the grid pattern seen in Figure 9C shows the set of peak/trough locations that satisfy the desired spacing between memories. Thus, the average across memories shows a reliable grid field at these locations even though the memories are unstable.”
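      The averaging argument in the quoted passage can be illustrated numerically. The sketch below assumes, purely for illustration, that each view-dependent place cell drifts independently from session to session; the number of memories per grid field and the drift magnitude are hypothetical values, and the actual consolidation dynamics in the simulation need not be independent.

```python
import random
import statistics

random.seed(1)
n_sessions = 500
n_memories = 6    # hypothetical number of view-dependent place cells per grid field
jitter_sd = 1.0   # hypothetical session-to-session drift (arbitrary units)

single, averaged = [], []
for _ in range(n_sessions):
    # Position of each memory in this session, relative to a common center.
    positions = [random.gauss(0.0, jitter_sd) for _ in range(n_memories)]
    single.append(positions[0])                   # one unstable place cell
    averaged.append(statistics.mean(positions))   # grid field = average of memories

sd_single = statistics.stdev(single)
sd_avg = statistics.stdev(averaged)
print(sd_single, sd_avg)
```

      Under these assumptions the averaged position varies less across sessions than any single memory, consistent with a grid field that appears stable while its constituent place cells are still shifting.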

      (6) Other predictions. Clearly, the model makes many interesting (and quite specific!) predictions. But does it make some known simple predictions? 

      • More place cells at rewarded (or more visited) locations. Some empirical researchers seem to think this is not as obvious as it seems (e.g., Duvelle et al., 2019; JoN; Nyberg et al., 2021, Neuron Review).  

      • Grid cell field moves toward reward (Butler et al., 2019; Boccara et al., 2019).  

      • Grid cells deform in trapezoid (Krupic et al., 2015) and change in environments like mazes (Derdikman et al., 2014).  

      Thank you for these suggestions and I have added the following paragraph to the discussion:

      “In terms of the animal’s internal state, all locations in the enclosure may be viewed as equally aversive and unrewarding, which is a memorable characteristic of the enclosure. Reward, or lack thereof, is arguably one of the most important nonspatial characteristics and application of this model to reward might explain the existence of goal-related activity in place cells (Hok et al., 2007; although see Duvelle et al., 2019), reflecting the need to remember rewarding locations for goal directed behavior. Furthermore, if place cell memories for a rewarding location activate entorhinal grid cells, this may explain the finding that grid cells remap in an enclosure with a rewarded location such that firing fields are attracted to that location (Boccara et al., 2019; Butler et al., 2019). Studies that introduce reward into the enclosure are an important first step in terms of examining what happens to grid cells when the animal is placed in a more varied environment.”

      Regarding the changes in shape of the environment, this was discussed in the section of the paper that reads “As seen in Figure 12, because all but one of the place cells was exterior when the simulated animal was constrained to a narrow passage, the hippocampal place cell memories were no longer arranged in a hexagonal grid. This disruption of the grid array for narrow passages might explain the finding that the grid pattern (of grid cells) is disrupted in the thin corner of a trapezoid (Krupic et al., 2015) and disrupted when a previously open enclosure is converted to a hairpin maze by insertion of additional walls within the enclosure (Derdikman et al., 2009).” This particular section of the paper now appears in the Appendix and Figure 12 is now Appendix Figure 2.

      Reviewer #2 (Public Review): 

      The manuscript describes a new framework for thinking about the place and grid cell system in the hippocampus and entorhinal cortex in which these cells are fundamentally involved in supporting non-spatial information coding. If this framework were shown to be correct, it could have high impact because it would suggest a completely new way of thinking about the mammalian memory system in which this system is non-spatial. Although this idea is intriguing and thought-provoking, a very significant caveat is that the paper does not provide evidence that specifically supports its framework and rules out the alternate interpretations. Thus, although the work provides interesting new ideas, it leaves the reader with more questions than answers because it does not rule out any earlier ideas. 

      Basically, the strongest claim in the paper, that grid cells are inherently non-spatial, cannot be specifically evaluated versus existing frameworks on the basis of the evidence that is shown here. If, for example, the author had provided behavioral experiments showing that human memory encoding/retrieval performance shifts in relation to the predictions of the model following changes in the environment, it would have been potentially exciting because it could potentially support the author's reconceptualization of this system. But in its current form, the paper merely shows that a new type of model is capable of explaining the existing findings. There is not adequate data or results to show that the new model is a significantly better fit to the data compared to earlier models, which limits the impact of the work. In fact, there are some key data points in which the earlier models seem to better fit the data.  

      Overall, I would be more convinced that the findings from the paper are impactful if the author showed specific animal memory behavioral results that were only supported by their memory model but not by a purely spatial model. Perhaps the author could run new experiments to show that there are specific patterns of human or animal behavior that are only explained by their memory model and not by earlier models. But in its current form, I cannot rule out the existing frameworks and I believe some of the claims in this regard are overstated. 

      As previously detailed in Box 1 and as explained in the text in several places, the model provides an explanation of several findings that remain unexplained by other theories (see “Results Uniquely Explained by the Memory Model”). But more generally this is a good point, and the initial draft failed to fully articulate why a researcher might choose this model to guide future empirical investigations. A new section in the introduction, titled ‘Why Model the Rodent Navigation Literature with a Memory Model?’, deals with these issues. That section reads:

      “Spatial navigation is inherently a memory problem – learning the spatial arrangement of a new enclosure requires memory for the conjunction of what and where. This has long been realized and in the introduction to ‘Hippocampus as a Cognitive Map’, O’Keefe and Nadel (1978) wrote “We shall argue that the hippocampus is the core of a neural memory system providing an objective spatial framework within which the items and events of an organism's experience are located and interrelated” (emphasis added). Furthermore, in the last chapter of their book, they extended cognitive map theory to human memory for non-spatial characteristics. However, in the decades since the development of cognitive map theory, the rodent spatial navigation and human memory literatures have progressed somewhat independently.

      The ideas proposed in this model are an attempt to reunify these literatures by returning to the original claim that spatial navigation is inherently a memory problem. The goal of the current study is to explain the rodent spatial navigation literature using a memory model that has the potential to also explain the human memory literature. In contrast, most grid cell models (Bellmund et al., 2016; Bush et al., 2015; Castro & Aguiar, 2014; Hasselmo, 2009; Mhatre et al., 2012; Solstad et al., 2006; Sorscher et al., 2023; Stepanyuk, 2015; Widloski & Fiete, 2014) are domain specific models of spatial navigation and as such, they do not lend themselves to explanations of human memory. Thus, the reason to prefer this model is parsimony. Rather than needing to develop a theory of memory that is separate from a theory of spatial navigation, it might be possible to address both literatures with a unified account.

      This study does not attempt to falsify other theories of grid cells. Instead, this model reaches a radically different interpretation regarding the function of grid cells; an interpretation that emerges from viewing spatial navigation as a memory problem. All other grid cell models assume that an entorhinal grid cell displaying a spatially arranged grid of firing fields serves the function of spatial coding (i.e., spatial grid cells exist to support a spatial metric). In contrast, the proposed memory model of grid cells assumes that the hexagonal tiling reflects the need to keep memories separate from each other to minimize confusion and confabulation – the grid pattern is the byproduct of pattern separation between memories rather than the basis of a spatial code. 

      It is now understood that grid-like firing fields can occur for non-spatial two-dimensional spaces. For instance, human entorhinal cortex exhibits grid-like responses to video morph trajectories in a two-dimensional bird neck-length versus bird leg-length space (Constantinescu et al., 2016). As a general theory of learning and memory, the proposed memory model of grid cells is easily extended to explain these results (e.g., relabeling the border cell inputs in the model as neck-length and leg-length inputs). However, there are other grid cell models that can explain both spatial grid cells as well as non-spatial grid-like responses (Mok & Love, 2019; Rodríguez-Domínguez & Caplan, 2019; Stachenfeld et al., 2017; Wei et al., 2015). Similar to this memory model of grid cells, these models are also positioned to explain both the rodent spatial navigation and human memory literatures. Nevertheless, there is a key difference between this model and other grid cell models that generalize to non-spatial representations. Specifically, these other models assume that grid cells exhibiting spatial receptive fields serve the function of identifying positions in the environment (i.e., their function is spatial). As such, these models do not explain why most of the input to rodent hippocampus appears to be spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). This memory model of grid cells provides an answer to the apparent paucity of nonspatial cell types in rodent MTL by proposing that grid cells with spatial receptive fields have been misclassified as spatial (they are what cells rather than where cells) and that place cells are fundamentally memory cells that conjoin what and where.”

      - The paper does not fully take into account all the findings regarding grid cells, some of which very clearly show spatial processing in this system. For example, findings on grid-by-direction cells (e.g., Sargolini et al. 2006) would seem to suggest that the entorhinal grid system is very specifically spatial and related to path integration. Why would grid-by-direction cells be present and intertwined with grid cells in the author's memory-related reconceptualization? It seems to me that the existence of grid-by-direction cells is strong evidence that at least part of this network is specifically spatial.

      Grid-by-direction cells (i.e., head direction conjunctive grid cells) were a key part of the reported results. These grid cells naturally arise in the model as the animal forms memories (aka, hippocampal place cells) that conjoin location (as defined by border cells), head direction at the time of memory formation, and one or more non-spatial properties found at that location. In this revision, I have attempted to better explain how including head direction in hippocampal memories naturally gives rise to these cell types. The introduction to the head direction module simulations now reads:

      “According to this memory model of spatial navigation, place cells are the conjunction of location, as defined by border cells, and one or more properties that are remembered to exist at that location. Such memories could, for instance, allow an animal to remember the location of a food cache (Payne et al., 2021). The next set of simulations investigates behavior of the model when one of the to-be-remembered properties is head direction at the time when the memory was formed (e.g., the direction of a pathway leading to a food cache). Indicating that head direction is an important part of place cell representations, early work on place cells in mazes found strong sensitivity to head direction, such that the place field is found in one direction of travel but not the other (McNaughton et al., 1983; Muller et al., 1994). Place cells can exhibit a less extreme version of head direction sensitivity in open field recordings (Rubin et al., 2014), but the nature of the sensitivity is more complicated, depending on location of the animal relative to the place field center (Jercog et al., 2019).

      It is possible that some place cell memories do not receive head direction input, as was the case for the simulations reported in Figures 6/7 – in those simulations, place cells were entirely insensitive to head direction, owing to a lack of input from head direction cells. However, removal of head direction input to hippocampus affects place cell responses (Calton et al., 2003) and grid cell responses (Winter et al., 2015), suggesting that head direction is a key component of the circuit. Furthermore, if place cells represent episodic memories, it seems natural that they should include head direction (i.e., viewpoint at the time of memory formation).

      In the simulations reported next, head direction is simply another property that is conjoined in a hippocampal place cell memory. In this case, a head direction cell should become a head direction conjunctive grid cell (i.e., a grid cell, but only when the animal is heading in a particular direction), owing to memory feedback from the hexagonal array of hippocampal place cell memories. When including head direction, the real-world dimensions of variation are across three dimensions (X, Y, and head direction) rather than two, and consolidation will cause the place cells to arrange in a three-dimensional volume. The simulation reported below demonstrates that this situation provides a “grid module”.”

      - I am also concerned that the paper does not do enough to address findings regarding how the elliptical shape of grid fields shifts when boundaries of an environment compress in one direction or change shape/angles (Lever et al., & Krupic et al). Those studies show compression in grid fields based on boundary position, and I don't see how the authors' model would explain these findings.  

      This finding was covered in the original submission: “For instance, perhaps one egocentric/allocentric pair of mEC grid modules is based on head direction (viewpoint) in remembered positions relative to the enclosure borders whereas a different egocentric/allocentric pair is based on head direction in remembered positions relative to landmarks exterior to the enclosure. This might explain why a deformation of the enclosure (moving in one of the walls to form a rectangle rather than a square) caused some of the grid modules but not others to undergo a deformation of the grid pattern in response to the deformation of the enclosure wall (see also Barry et al., 2007). More specifically, if there is one set of non-orthogonal dimensions for enclosure borders and the movement of one wall is too modest as to cause avoid global remapping, this would deform the grid modules based the enclosure border cells. At the same time, if other grid modules are based on exterior properties (e.g., perhaps border cells in relation to the experimental room rather than the enclosure), then those grid modules would be unperturbed by moving the enclosure wall.”

      I apologize for being unclear in describing how the model might explain this result. The paragraph has been rewritten and now reads:

      “Consider the possibility that one mEC grid module is based on head direction (viewpoint) in remembered positions relative to the enclosure borders (e.g., learning the properties of the enclosure, such as the metal surface) while a different grid module is based on head direction in remembered positions relative to landmarks exterior to the enclosure (e.g., learning the properties of the experimental room, such as the sound of electronics that the animal is subject to at all locations). This might explain why a deformation of the enclosure (moving one of the walls to form a rectangle rather than a square) caused some of the grid modules but not others to undergo a deformation of the grid pattern in response to the deformation of the enclosure wall (see also Barry et al., 2007). More specifically, suppose that the movement of one wall is modest and after moving the wall, the animal views the enclosure as being the same enclosure, albeit slightly modified (e.g., when a home is partially renovated, it is still considered the same home). In this case, the set of non-orthogonal dimensions associated with enclosure borders would still be associated with the now-changed borders and any memories in reference to this border-determined space would adjust their positions accordingly in real-world coordinates (i.e., the place cells would subtly shift their positions owing to this deformation of the borders, producing a corresponding deformation of the grid). At the same time, there may be other sets of memories that are in relation to dimensions exterior to the enclosure. Because these exterior properties are unchanged, any place cells and grid cells associated with the exterior-oriented memories would be unchanged by moving the enclosure wall.”

      - Are findings regarding speed modulation of grid cells problematic for the paper's memory results? 

      - A further issue is that the paper does not seem to adequately address developmental findings related to the timecourses of the emergence of different cell types. In their simulation, researchers demonstrate the immediate emergence of grid fields in a novel environment, while noting that the stabilization of place cell positions takes time. However, these simulation findings contradict previous empirical developmental studies (Langston et al., 2010). Those studies showed that head direction cells show the earliest development of spatial response, followed by the appearance of place cells at a similar developmental stage. In contrast, grid cells emerge later in this developmental sequence. The gradual improvement in spatial stability in firing patterns likely plays a crucial role in the developmental trajectory of grid cells. Contrary to the model simulation, grid cells emerge later than place cells and head direction cells, yet they also hold significance in spatial mapping. 

      - The model simulations suggest that certain grid patterns are acquired more gradually than others. For instance, egocentric grid cells require the stabilization of place cell memories amidst ongoing consolidation, while allocentric grid cells tend to reflect average place field positions. However, these findings seemingly conflict with empirical studies, particularly those on the conjunctive representation of distance and direction in the earliest grid cells. Previous studies found no significant differences, relative to adults, in grid cells or in grid cells with directional correlates across these age groups (Wills et al., 2012). This indicates that the combined representation of distance and direction in single mEC cells is present from the earliest ages at which grid cells emerge. 

      These are good points and they have been addressed in a new section of the introduction titled ‘The Scope of the Proposed Model’. That section reads:

      “The reported simulations explain why most mEC cell types in the rodent literature appear to be spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). Assuming that rodents can form non-spatial memories, rodent hippocampus must receive non-spatial input from entorhinal cortex. These simulations suggest that characterization of the rodent mEC as primarily spatial might be incorrect if most grid cells (except perhaps head direction conjunctive grid cells) have been mischaracterized as spatial. Other literatures with other species find non-spatial representations in MTL (Gulli et al., 2020; Quiroga et al., 2005; Wixted et al., 2014) and non-spatial hippocampal memory encoding has been found in rodents (Liu et al., 2012; McEchron & Disterhoft, 1999). The proposed memory model is compatible with these results – the ideas contained in this model could be applied to non-spatial memory representations. However, surveys of cell types in rodent entorhinal cortex seem to indicate that most cells are spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). How can the rodent hippocampus encode non-spatial memories if most of its input is spatial? The goal of the reported simulations is to explain the apparent paucity of non-spatial cells in rodent entorhinal cortex by proposing that grid cells have been misclassified as spatial (see also Luo et al., 2024).

      Given the simplicity of the proposed model, there are important findings that the model cannot address -- it is not that the model makes the wrong predictions but rather that it makes no predictions. The role of running speed (Kraus et al., 2015) is one such variable for which the model makes no predictions. Similarly, because the model is a rate-coded model rather than a model of oscillating spiking neurons, it makes no predictions regarding theta oscillations (Buzsáki & Moser, 2013). The model is an account of learning and memory for an adult animal, and it makes no predictions regarding the developmental (Langston et al., 2010; Muessig et al., 2015; Wills et al., 2012) or evolutionary (Rodríguez et al., 2002) time course of different cell types. This model contains several purely spatial representations such as border cells, head direction cells, and head direction conjunctive grid cells and it may be that these purely spatial cell types emerged first, followed by the evolution and/or development of non-spatial cell types. However, this does not invalidate the model. Instead, this is a model for an adult animal that has both episodic memory capabilities and spatial navigation capabilities, irrespective of the order in which these capabilities emerged.

      This model has the potential to explain context effects in memory (Godden & Baddeley, 1975; Gulli et al., 2020; Howard et al., 2005). According to this model, different grid cells represent different non-spatial characteristics and place cells represent the combination of these “context” factors and location. In the simulation, just one grid cell is simulated but the same results would emerge when simulating hundreds of different non-spatial inputs provided that all of the simulated non-spatial inputs exist throughout the recording session. However, there is evidence that hippocampus can explicitly represent the passage of time (Eichenbaum, 2014), and time is assuredly an important factor in defining episodic memory (Bright et al., 2020). Thus, although the current model addresses unique combinations of what and where, it is left to future work to incorporate representations of when in the memory model.”

      Reviewer #3 (Public Review): 

      A crucial assumption of the model is that the content of experience must be constant in space. It's difficult to imagine a real-world example that satisfies this assumption. Odors and sounds are used as examples. While they are often more spatially diffuse than objects on the ground, odors and sounds have sources that are readily detectable. Animals can easily navigate to a food source or to a vocalizing conspecific. This assumption is especially problematic because it predicts that all grid cells should become silent when their preferred non-spatial attribute (e.g. a specific odor) is missing. I'm not aware of any experimental data showing that grid cells become silent. On the contrary, grid cells are known to remain active across all contexts that have been tested, including across sleep/wake states. Unlike place cells, grid cells do not seem to turn off. Since grid cells are active in all contexts, their preferred attribute must also be present in all contexts, and therefore they would not convey any information about the specific content of an experience.  

      These are good points and in this revision I have attempted to explain that there is a great deal of contextual similarity across all recording sessions. One paragraph in the discussion now reads:

      “In a typical rodent spatial navigation study, the non-spatial attributes are well-controlled, existing at all locations regardless of the enclosure used during testing (hence, a grid cell in one enclosure will be a grid cell in a different enclosure). Because labs adopt standard procedures, the surfaces, odors (e.g., from cleaning), external lighting, time of day, human handler, electronic apparatus, hunger/thirst state, etc. might be the same for all recording sessions. Additionally, the animal is not allowed to interact with other animals during recording and this isolation may be an unusual and highly salient property of all recording sessions. Notably, the animal is always attached to wires during recording. The internal state of the animal (fear, aloneness, the noise of electronics, etc.) is likely similar across all recording situations and attributes of this internal state are likely represented in the hippocampus and entorhinal input to hippocampus. According to this model, hippocampal place cells are “marking” all locations in the enclosure as places where these things tend to happen.”

      The proposed novelty of this theory is that other models all assume that grid cells encode space. This isn't quite true of models based on continuous attractor networks, the discussion of which is notably absent. More specifically, these models focus on the importance of intrinsic dynamics within the entorhinal cortex in generating the grid pattern. While this firing pattern is aligned to space during navigation and therefore can be used as a representation of that space, the neural dynamics are preserved even during sleep. Similarly, it is because the grid pattern does not strictly encode physical space that grid-like signals are also observed in relation to other two-dimensional continuous variables.

      These models were briefly discussed in the general discussion section and in this revision they are further discussed in the introduction in a new section, titled ‘Why Model the Rodent Navigation Literature with a Memory Model?’ That section reads:

      “Spatial navigation is inherently a memory problem – learning the spatial arrangement of a new enclosure requires memory for the conjunction of what and where. This has long been realized and in the introduction to ‘Hippocampus as a Cognitive Map’, O’Keefe and Nadel (1978) wrote “We shall argue that the hippocampus is the core of a neural memory system providing an objective spatial framework within which the items and events of an organism's experience are located and interrelated” (emphasis added). Furthermore, in the last chapter of their book, they extended cognitive map theory to human memory for non-spatial characteristics. However, in the decades since the development of cognitive map theory, the rodent spatial navigation and human memory literatures have progressed somewhat independently.

      The ideas proposed in this model are an attempt to reunify these literatures by returning to the original claim that spatial navigation is inherently a memory problem. The goal of the current study is to explain the rodent spatial navigation literature using a memory model that has the potential to also explain the human memory literature. In contrast, most grid cell models (Bellmund et al., 2016; Bush et al., 2015; Castro & Aguiar, 2014; Hasselmo, 2009; Mhatre et al., 2012; Solstad et al., 2006; Sorscher et al., 2023; Stepanyuk, 2015; Widloski & Fiete, 2014) are domain specific models of spatial navigation and as such, they do not lend themselves to explanations of human memory. Thus, the reason to prefer this model is parsimony. Rather than needing to develop a theory of memory that is separate from a theory of spatial navigation, it might be possible to address both literatures with a unified account.

      This study does not attempt to falsify other theories of grid cells. Instead, this model reaches a radically different interpretation regarding the function of grid cells; an interpretation that emerges from viewing spatial navigation as a memory problem. All other grid cell models assume that an entorhinal grid cell displaying a spatially arranged grid of firing fields serves the function of spatial coding (i.e., spatial grid cells exist to support a spatial metric). In contrast, the proposed memory model of grid cells assumes that the hexagonal tiling reflects the need to keep memories separate from each other to minimize confusion and confabulation – the grid pattern is the byproduct of pattern separation between memories rather than the basis of a spatial code. 

      It is now understood that grid-like firing fields can occur for non-spatial two-dimensional spaces. For instance, human entorhinal cortex exhibits grid-like responses to video morph trajectories in a two-dimensional bird neck-length versus bird leg-length space (Constantinescu et al., 2016). As a general theory of learning and memory, the proposed memory model of grid cells is easily extended to explain these results (e.g., relabeling the border cell inputs in the model as neck-length and leg-length inputs). However, there are other grid cell models that can explain both spatial grid cells as well as non-spatial grid-like responses (Mok & Love, 2019; Rodríguez-Domínguez & Caplan, 2019; Stachenfeld et al., 2017; Wei et al., 2015). Similar to this memory model of grid cells, these models are also positioned to explain both the rodent spatial navigation and human memory literatures. Nevertheless, there is a key difference between this model and other grid cell models that generalize to non-spatial representations. Specifically, these other models assume that grid cells exhibiting spatial receptive fields serve the function of identifying positions in the environment (i.e., their function is spatial). As such, these models do not explain why most of the input to rodent hippocampus appears to be spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). This memory model of grid cells provides an answer to the apparent paucity of non-spatial cell types in rodent MTL by proposing that grid cells with spatial receptive fields have been misclassified as spatial (they are what cells rather than where cells) and that place cells are fundamentally memory cells that conjoin what and where.”

      The use of border cells or boundary vector cells as the main (or only) source of spatial information in the hippocampus is not well supported by experimental data. Border cells in the entorhinal cortex are not active in the center of an environment. Boundary-vector cells can fire farther away from the walls but are not found in the entorhinal cortex. They are located in the subiculum, a major output of the hippocampus. While the entorhinal-hippocampal circuit is a loop, the route from boundary-vector cells to place cells is much less clear than from grid cells. Moreover, both border cells and boundary-vector cells (which are conflated in this paper) comprise a small population of neurons compared to grid cells.

      AUTHOR RESPONSE: The model can be built without assuming between-border cells (early simulations with the model did not make this assumption). Regarding this issue, the text reads “Unlike the BVC model, the boundary cell representation is sparsely populated using a basis set of three cells for each of the three dimensions (i.e., 9 cells in total), such that for each of the three non-orthogonal orientations, one cell captures one border, another the opposite border, and the third cell captures positions between the opposing borders (Solstad et al., 2008). However, this is not a core assumption, and it is possible to configure the model with border cell configurations that contain two opponent border cells per dimension, without needing to assume that any cells prefer positions between the borders (with the current parameters, the model predicts there will be two border cells for each between-border cell). Similarly, it is possible to configure the model with more than 3 cells for each dimension (i.e., multiple cells representing positions between the borders).” The Solstad paper found a few cells that responded in positions between borders, but perhaps not as many as 1 out of 3 cells, such as this particular model simulation predicts. If the paucity of between-border cells is a crucial data point, the model can be reconfigured with opponent-border cells without any between-border cells. The reason that 3 border cells were used rather than 2 opponent border cells was for simplicity. Because 3 head direction cells were used to capture the face-centered cubic packing of memories, the simulation also used 3 border cells per dimension to allow a common linear sum metric when conjoining dimensions to form memories. If the border dimensions used 2 cells while head direction used 3 cells, a dimensional weighting scheme would be needed to allow this mixing of “apples and oranges” in terms of distances in the 3D space that includes head direction.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Specific questions/clarifications:  

      (1) Assumption of population-based vs single unit link to biological cells: At the start, the author assumes that each unit here can be associated with a population: "the simulated activation values can be thought of as proportional to the average firing rate of an ensemble of neurons with similar inputs and outputs (O'Reilly & Munakata, 2000)." But is a 'grid cell' found here a single cell or an average of many cells? Does this mean the model assumes many cells that have different fields that are averaged, which become a grid-like unit in the model? But in biology, these are single cells? Or does it mean a grid response is an average of the place cell inputs? 

      I apologize for being unclear about this. The grid cells in the model are equivalent to real single cells except that the simulation uses a rate-coded cell rather than a spiking cell. The averaging that was mentioned in the paper is across identically behaving spiking cells rather than across cells with different grid field arrangements. To better explain this, I have added the following text:

      “For instance, consider a set of several thousand spiking grid cells that are identical in terms of their firing fields. At any moment, some of these identically-behaving cells will produce an action potential while others do not (i.e., the cells are not perfectly synchronized), but a snapshot of their behavior can be extracted by calculating average firing rate across the ensemble. The simulated cells in the model represent this average firing rate of identically-behaving ensembles of spiking neurons.” 

      This is a mathematical short-cut to avoid simulating many spiking neurons. Because this model was compared to real spike rate maps, this real-valued average firing rate is down-sampled to produce spikes by finding the locations that produced the top 5% of real-valued activation values across the simulation.
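
      The down-sampling step described above amounts to thresholding the real-valued activations at their 95th percentile. The following is a minimal sketch of that idea, not the code used for the reported simulations; the function name `top_fraction_mask`, the array `rate_map`, and the random placeholder values are all hypothetical and stand in for the model's activation values.

```python
import numpy as np

def top_fraction_mask(activation, fraction=0.05):
    """Mark the entries whose activation lies in the top `fraction` of all values."""
    activation = np.asarray(activation, dtype=float)
    # The (1 - fraction) quantile is the cutoff above which "spikes" are assigned.
    threshold = np.quantile(activation, 1.0 - fraction)
    return activation >= threshold

# Hypothetical real-valued activations, one value per visited location.
rng = np.random.default_rng(0)
rate_map = rng.random((100, 100))

spike_mask = top_fraction_mask(rate_map, fraction=0.05)
print(spike_mask.mean())  # fraction of locations labeled as spiking
```

      The resulting boolean mask can be plotted in the same way as an empirical spike map, which is what makes the comparison with recorded data possible.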

      (2) It is not clear to me why they are circular border cells/basis sets.  

      In the initial submission, there was a brief paragraph describing this assumption. In this revision, that paragraph has been expanded and modified for greater clarity. It now reads:

      “Because head direction is necessarily a circular dimension, it was assumed that all dimensions are circular (a circular dimension is approximately linear for nearby locations). This assumption of circular dimensions was made to keep the model relatively simple, making it easier to combine dimensions and allowing application of the same processes for all dimensions. For instance, the model requires a weight normalization process to ensure that the pattern of weights for each dimension corresponds to a possible input value along that dimension. However, the normalization for a linear dimension is necessarily different than for a circular dimension. Because the neural tuning functions were assumed to be sine waves, normalization requires that the sum of squared weights add up to a constant value. For a linear dimension, this sum of squares rule only applies to the subset of cells that are relevant to a particular value along the dimension whereas for a circular dimension, this sum of squares rule is over the entire set of cells that represent the dimension (i.e., weight normalization is easier to implement with circular dimensions). Although all dimensions were assumed to be circular for reasons of mathematical convenience and parsimony, circular dimensions may relate to the finding that human observers have difficulty re-orienting themselves in a room depending on the degree of rotational symmetry of the room (Kelly et al., 2008). In addition, this simplifying assumption allows the model to capture the finding that the population of grid cells lies on a torus (Gardner et al., 2022), although I note that the model was developed before this result was known.”
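
      The sum-of-squares rule for a circular dimension can be verified in a few lines. The derivation below is an illustration under stated assumptions rather than a formula taken from the manuscript: it assumes three cells with pure cosine tuning whose preferred values are spaced one third of a cycle apart, and the symbols $\theta$, $\phi_k$, and $a_k$ are introduced here for that purpose.

```latex
a_k(\theta) = \cos(\theta - \phi_k), \qquad \phi_k = \frac{2\pi k}{3}, \quad k = 0, 1, 2

\sum_{k=0}^{2} a_k(\theta)^2
  = \sum_{k=0}^{2} \frac{1 + \cos(2\theta - 2\phi_k)}{2}
  = \frac{3}{2} + \frac{1}{2} \sum_{k=0}^{2} \cos\!\left(2\theta - \frac{4\pi k}{3}\right)
  = \frac{3}{2}
```

      The final sum vanishes because its three terms are cosines at equally spaced phases. Under these assumptions the sum of squares is the same for every value of $\theta$, so a single normalization constant applies to the entire set of cells representing the dimension.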

      (3) Why is it 3 components? I realise that the number doesn't matter too much, but I believe more is better, so is it just for simplicity? 

      In this revision, additional text has been added to explain this assumption: “To keep the model simple, the same number of cells was assumed for all dimensions and all dimensions were assumed to be circular (head direction is necessarily circular and because one dimension needed to be circular, all dimensions were assumed to be circular). Three cells per dimension was chosen because this provides a sparse population code of each dimension, with few border cells responding between borders, while allowing three separate phases of grid cells within a grid cell module (in the model, a grid cell module arises from combination of a third dimension, such as head direction, with the real-world X/Y dimensions defined by border cells).”

      As a reminder, the text explaining the sparse coding of border cells reads: “However, this is not a core assumption, and it is possible to configure the model with border cell configurations that contain two opponent border cells per dimension, without needing to assume that any cells prefer positions between the borders (with the current parameters, the model predicts there will be two border cells for each between-border cell). Similarly, it is possible to configure the model with more than 3 cells for each dimension (i.e., multiple cells representing positions between the borders).”

      The model can work with just two opponent cells or with more than three cells per basis set. In different simulations, I have explored these possibilities. Three was chosen because it is a convenient way to highlight the face-centered cubic packing of memories that tends to occur (FCP produces 3 alternating layers of hexagonally arranged firing fields). Thus, each of the three head direction cells captures a different layer of the FCP arrangement. A more realistic simulation might combine 6 different head direction cells tiling the head direction dimension with opponent border cells (just 2 cells for each border dimension). Such a combination would produce responses at borders, but no responses between borders and, at the same time, the head direction cells would still reveal the FCP arrangement. However, it is not easy to find the right parameters for such a mix-and-match simulation in which different dimensions have different numbers of tuning functions (e.g., some dimensions having 2 cells while others have 3 or 6 and some dimensions being linear while others are circular). When all of the dimensions are of the same type, the simple sum that arises from multiplying the input by the weight values gives rise to Euclidean distance (see Figure 3B). With a mix-and-match model of different dimension-types, it should be possible to adjust the sum to nevertheless produce a monotonic function with Euclidean distance although I leave this to future work. To keep things simple, I assumed that all dimensions are of the same type (circular, with 3 cells per dimension).

      (4) Confusion due to the border cells/box was unclear to me. "If the period of the circular border cells was the same as the width of the box, then a memory pushed outside the box on one side would appear on the opposite side of the box, in which case the partial grid field on one side should match up with its remainder on the other side. This would entail complete confusion between opposite sides of the box, and the representation of the box would be a torus (donut-shaped) rather than a flat two-dimensional surface. To reduce confusion ..." Is this confusion of the model? Of the animal?  

      This would be confusion of the animal (e.g., a memory field overlapping with one border would also appear at the opposite border in the corresponding location). At one point in model development, I made the assumption that one side of the box wraps to the other side, and I asked Trygve Solstad to run some analyses of real data to see if cells actually wrap around in this manner. He did not find any evidence of this, and so I decided to include outside-the-box representational area which, as it turned out, allowed the model to capture other behaviors as detailed in the paper.

      This section of the paper now reads:

      “The cosine tuning curves of the simulated border cells represent distance from the border on both sides of the border (i.e., firing rate increases as the animal approaches the border from either the inside or the outside of the enclosure). Experimental procedures do not allow the animal to experience locations immediately outside the enclosure, but these locations remain an important part of the hypothetical representation, particularly when considering the modification of memories through consolidation (i.e., a memory created inside the enclosure might be moved to a location outside the enclosure). This symmetry about the border cell’s preferred location is needed to maintain an unbiased representation, with a constant sum of squares for the border cell inputs (see methods section). Rather than using linear dimensions, all dimensions were assumed to be circular to keep the model relatively simple. This assumption was made because head direction is necessarily a circular dimension and by having all dimensions be circular, it is easy to combine dimensions in a consistent manner to produce multidimensional hippocampal place cell memories. Thus, the border cells define a torus (or more accurately a three-torus) of possible locations. This provides a hypothetical space of locations that could be represented.

      In light of the assumption to represent border cells with a circular dimension, when a memory is pushed outside the East wall of the enclosure, it would necessarily be moved to the West wall of the enclosure if the period of the circular dimension was equal to the width of the enclosure. If this were true, then the partial grid field on one side of the enclosure would match up with its remainder on the other side. Such a situation would cause the animal to become completely confused regarding opposite sides of the enclosure (a location on the West wall would be indistinguishable from the corresponding location on the East wall). To reduce confusion between opposite sides of the enclosure, the width of the enclosure in which the animal navigated (Figure 5) was assumed to be half as wide as the full period of the border cells. In other words, although the space of possible representations was a three-torus, it was assumed that the real-world two-dimensional enclosure encompassed a section of the torus (e.g., a square piece of tape stuck onto the surface of a donut). The torus is better thought of as a “playing field” in which different sizes and shapes of enclosure can be represented (i.e., different sizes and shapes of tape placed on the donut). Furthermore, this assumption provides representational space that is outside the box without such locations wrapping around to the opposite side of the box.”
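
      The wrap-around argument above can be illustrated with a small numerical check. This is a sketch under assumptions, not the simulation code: the function `border_inputs`, the three equally spaced cosine-tuned cells, the period value, and the placement of the West wall at position 0 are all hypothetical choices made for the example.

```python
import numpy as np

def border_inputs(x, period, n_cells=3):
    """Cosine-tuned inputs of one circular dimension at position x (hypothetical form)."""
    phases = 2 * np.pi * np.arange(n_cells) / n_cells
    return np.cos(2 * np.pi * x / period - phases)

period = 2.0  # full period of the circular border dimension (arbitrary units)

# Case 1: enclosure as wide as the full period.
# The West wall (x = 0) and East wall (x = period) produce identical inputs.
west = border_inputs(0.0, period)
east_full = border_inputs(period, period)
print(np.allclose(west, east_full))  # prints True: the two walls are confusable

# Case 2: enclosure half as wide as the period.
# The East wall (x = period / 2) now produces a distinct input pattern.
east_half = border_inputs(period / 2, period)
print(np.allclose(west, east_half))  # prints False: the two walls are distinguishable
```

      In the half-period case, positions just beyond either wall still map to input patterns that differ from every position inside the enclosure, which corresponds to the outside-the-box representational space described in the text.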

      (5) Figure 3 - This result seems to be related to whether you use Euclidean or city-block distance. If you use Euclidean distances in two dimensions wouldn't this work out fine?  

      Euclidean distance was the metric used in the analysis of the two-dimensional simulation, but this did not work out. To make this clear, I have changed the label on the x-axes to read “Euclidean distance” for both the two- and three-dimensional simulations. The two-dimensional simulation produced city block behavior rather than Euclidean behavior because memory retrieval is the sum of the two dimensions, as is standard in neural networks, rather than the Euclidean distance formula, which would require that memory retrieval be the square root of the sum of squares of the two dimensions. One way to address this problem with the two-dimensional simulation would be to use a specific Euclidean-mimicking activation function rather than a simple sum of dimensions. The very first model I developed used such an activation function as applied to opponent border cells with just two dimensions (so 4 cells in total – left/right and top/down). This produced Euclidean behavior, but the activation function was implausible and did not generalize to simulations that also included head direction. In contrast, with three non-orthogonal dimensions, the simple sum of dimensions is approximately Euclidean.
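
      A toy calculation gives some intuition for why a simple sum over three non-orthogonal dimensions can track Euclidean distance more closely than a sum over two orthogonal ones. The sketch below is not the model from the paper: it assumes that each dimension contributes a cosine of the displacement projected onto that dimension's axis, and it places the three axes 60 degrees apart in the plane. The function names and the radius are hypothetical. It measures how much the summed response varies with direction at a fixed Euclidean distance from a stored memory.

```python
import numpy as np

def summed_response(r, theta, axis_angles):
    """Sum of cosine-tuned dimensions at displacement (r, theta) from a memory at the origin."""
    # Project the displacement onto each axis, then sum the per-dimension cosines.
    projections = r * np.cos(theta[:, None] - axis_angles[None, :])
    return np.cos(projections).sum(axis=1)

def anisotropy(r, axis_angles, n_dirs=360):
    """Range of the summed response over all directions at a fixed Euclidean distance r."""
    theta = np.linspace(0.0, 2 * np.pi, n_dirs, endpoint=False)
    response = summed_response(r, theta, axis_angles)
    return response.max() - response.min()

two_orthogonal = np.deg2rad([0.0, 90.0])      # assumed layout for the two-dimensional case
three_at_60 = np.deg2rad([0.0, 60.0, 120.0])  # assumed layout for three non-orthogonal axes

r = 1.0  # fixed Euclidean distance, in radians of tuning phase
a2 = anisotropy(r, two_orthogonal)
a3 = anisotropy(r, three_at_60)
print(a2, a3)  # direction dependence is much smaller with three axes
```

      If the summed response depended only on Euclidean distance, both values would be zero. In this toy setting the three-axis sum comes far closer to that ideal, which is in the spirit of the statement that the simple sum is approximately Euclidean with three non-orthogonal dimensions.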

      (6) Final sentence of the Discussion: "However, unlike the present model, these models still assume that entorhinal grid cells represent space rather than a non-spatial attribute." I am not sure if the authors of the cited papers will agree with this. They consider the spatial cases, but most argue they can treat non-spatial features as well. What the author might mean is that they assume non-spatial features are in some metric space that, in a way, is spatial. However, I am not sure if the author would argue that non-spatial features cannot be encoded metrically (e.g., Euclidean distance based on the similarity of odours). 

      In this section, when referring to “entorhinal grid cells” I was specifically referring to traditional grid cells in a rodent spatial navigation experiment. I did not mean to imply that these other theories cannot explain non-spatial grid fields, such as in the two-dimensional bird space grid cells found with humans. The way in which the proposed memory model and these other models differ is in terms of what they assume regarding the function of grid cells that exhibit spatial grid fields. In this revision, I have changed this text to read:

      “These models can capture some of the grid cell results presented in the current simulations, including extension to non-spatial grid-like responses (e.g., grid fields that cover a two-dimensional neck/leg length bird space). Furthermore, these models may be able to explain memory phenomena similar to the model proposed in this study. However, unlike the proposed model, these models assume that the function of entorhinal grid cells that exhibit spatial X/Y grid fields during navigation is to represent space. In contrast, the memory model proposed in this study assumes that the function of spatial X/Y grid cells is to represent a non-spatial attribute; the only reason they exhibit a spatial X/Y grid is because memories of that non-spatial attribute are arranged in a hexagonal grid owing to the uncluttered/unvarying nature of the enclosure. Thus, these models do not explain why most of the input to rodent hippocampus appears to be spatial (Boccara et al., 2010b; Diehl et al., 2017; Grieves & Jeffery, 2017) whereas the proposed model can explain this situation as reflecting the misclassification of grid cells with a spatial arrangement as providing spatial input to hippocampus.”

      (7) It would be interesting to see videos/gifs of the model learning, and an idea of how many steps of trials it takes (is it capturing real-time rodent cell firing whilst foraging, or is it more abstracted, taking more trials). 

      The short answer is “yes”, the model is capturing real-time rodent cell firing while foraging. This is particularly true when simulating place cell memories in the absence of head direction information, as was shown in a video provided in the initial submission in relation to Figure 4. In this revision, I have provided a second video of learning when simulating place cell memories that include head direction. This second video is in relation to the results reported in Figure 9. This shows that even when learning a three-dimensional real-world space (X, Y, and head direction), the model rapidly produces an on-average hexagonal arrangement of place cell memories owing to the slight tendency of the place cell memories to linger in some locations as compared to others during consolidation. More specifically, they are more likely to linger in the locations that are the intersections of the peaks and/or troughs of the border cells and it is this tendency that supports the immediate appearance of grid cells. However, because the place cell memories are still shifting, head direction conjunctive grid cells are slower to emerge (the head direction conjunctive grid cells require stabilization of the place cells). The video then speeds up the learning process to show how place cells eventually stabilize after sufficient learning of the borders of the enclosure from different head/view directions.

      (8) One question is whether all the results have to be presented in the main text. It was difficult to see which key predictions fit the data and do so better than a spatial/navigation account. 

      Thank you for this suggestion. To make the paper more readable and easier for different readers with different interests to choose different aspects of the results to read, the second half of the results have been put in an appendix. More specifically, the second half of the results concerned place cells rather than grid cells. Thus, in this revision, the main text concerns grid cell results and the appendix concerns place cell results.

      Reviewer #3 (Recommendations For The Authors):  

      The title could usefully be shortened to focus on the main argument that observed firing patterns could be consistent with mapping memories instead of space. It's a stretch to argue that memory is the primary role when no such data is presented (i.e., there is no comparison of competing models). 

      This is a good point (I do not present evidence that conclusively indicates the function of MTL). The original title was chosen to make clear how this account is a radical departure from other accounts of grid cells. The revised title highlights that: 1) a memory model can also explain rodent single cell recording data during navigation; and 2) grid cells may be non-spatial. The revised title is: “A Memory Model of Rodent Spatial Navigation: Place Cells are Memories Arranged in a Grid and Grid Cells are Non-spatial”

      When arguing that the main role of the hippocampus is memory, I strongly suggest engaging with the work of people like Howard Eichenbaum who spent the better part of their career arguing the same (e.g. DOI:10.1152/jn.00005.2017.)  

      Thank you for pointing out this important oversight. Early in the introduction, I now write: “The proposal that hippocampus represents the multimodal conjunctions that define an episode is not new (Marr et al., 1991; Sutherland & Rudy, 1989) and neither is the proposal that hippocampal memory supports spatial/navigation ability (Eichenbaum, 2017). This view of the hippocampus is consistent with “feature in place” results (O’Keefe & Krupic, 2021) in which hippocampal cells respond to the conjunction of a non-spatial attribute affixed to a specific location, rather than responding more generically to any instance of a non-spatial attribute. In other words, the what/where conjunction is unique. Furthermore, the uniqueness of the what/where conjunction may be the fundamental building block of spatial memory and navigation. In reviewing the hippocampal literature, Howard Eichenbaum (2017) concludes that ‘the hippocampal system is not dedicated to spatial cognition and navigation, but organizes experiences in memory, for which spatial mapping and navigation are both a metaphor for and a prominent application of relational memory organization.’”

      With a focus on episodic memory, there should be a mention of the temporal component of memory. While it may rightfully be beyond the scope of this model, it's confusing to omit time completely from the discussion. 

      This issue and several others are now addressed in a new section in the introduction titled ‘The Scope of the Proposed Model’. That section reads:

      “The reported simulations explain why most mEC cell types in the rodent literature appear to be spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). Assuming that rodents can form non-spatial memories, rodent hippocampus must receive non-spatial input from entorhinal cortex. These simulations suggest that characterization of the rodent mEC as primarily spatial might be incorrect if most grid cells (except perhaps head direction conjunctive grid cells) have been mischaracterized as spatial. Other literatures with other species find non-spatial representations in MTL (Gulli et al., 2020; Quiroga et al., 2005; Wixted et al., 2014) and non-spatial hippocampal memory encoding has been found in rodents (Liu et al., 2012; McEchron & Disterhoft, 1999). The proposed memory model is compatible with these results – the ideas contained in this model could be applied to non-spatial memory representations. However, surveys of cell types in rodent entorhinal cortex seem to indicate that most cells are spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). How can the rodent hippocampus encode non-spatial memories if most of its input is spatial? The goal of the reported simulations is to explain the apparent paucity of non-spatial cells in rodent entorhinal cortex by proposing that grid cells have been misclassified as spatial (see also Luo et al., 2024).

      Given the simplicity of the proposed model, there are important findings that the model cannot address -- it is not that the model makes the wrong predictions but rather that it makes no predictions. The role of running speed (Kraus et al., 2015) is one such variable for which the model makes no predictions. Similarly, because the model is a rate-coded model rather than a model of oscillating spiking neurons, it makes no predictions regarding theta oscillations (Buzsáki & Moser, 2013). The model is an account of learning and memory for an adult animal, and it makes no predictions regarding the developmental (Langston et al., 2010; Muessig et al., 2015; Wills et al., 2012) or evolutionary (Rodríguez et al., 2002) time course of different cell types. This model contains several purely spatial representations such as border cells, head direction cells, and head direction conjunctive grid cells and it may be that these purely spatial cell types emerged first, followed by the evolution and/or development of non-spatial cell types. However, this does not invalidate the model. Instead, this is a model for an adult animal that has both episodic memory capabilities and spatial navigation capabilities, irrespective of the order in which these capabilities emerged.

      This model has the potential to explain context effects in memory (Godden & Baddeley, 1975; Gulli et al., 2020; Howard et al., 2005). According to this model, different grid cells represent different non-spatial characteristics and place cells represent the combination of these “context” factors and location. In the simulation, just one grid cell is simulated but the same results would emerge when simulating hundreds of different non-spatial inputs provided that all of the simulated non-spatial inputs exist throughout the recording session. However, there is evidence that hippocampus can explicitly represent the passage of time (Eichenbaum, 2014), and time is assuredly an important factor in defining episodic memory (Bright et al., 2020). Thus, although the current model addresses unique combinations of what and where, it is left to future work to incorporate representations of when in the memory model.”

      I recommend explaining the motivation of the theory in more detail in the introduction. It reads as "what if it's like this?" It would be helpful to instead highlight the limitations of current theories and argue why this theory is either a better fit for the data or is logically simpler. 

      This issue and several others are now addressed in the new section in the introduction titled ‘Why Model the Rodent Navigation Literature with a Memory Model?’, which I quoted above in response to the public reviews.

      It's worth considering shortening the results section to include only those that most convincingly support the main claim. The manuscript is quite long and appears to lack focus at times. 

      Thank you for this suggestion. To make the paper more readable and easier for different readers with different interests to choose different aspects of the results to read, the second half of the results have been put in an appendix. More specifically, the second half of the results concerned place cells rather than grid cells. Thus, in this revision, the main text concerns grid cell results and the appendix concerns place cell results.

      The discussion of path dependence on the formation of the grid pattern is important but only briefly discussed. It may be useful to add simulations testing whether different paths (not random walks) produce distorted grid patterns. 

      The short answer is that the path doesn’t affect things in general. The consolidation rule ensures equally spaced memories even if, for instance, one side of the enclosure is explored much more than the other side. As just one example, I have run simulations with a radial arm maze, and even though the animal is constrained to run only on the maze arms, the memories still arrange hexagonally as memories become pushed outside the arms. Rather than adding additional simulations to study, I now briefly describe this in the model methods:

      “Of note, the ability of the model to produce grid cell responses does not depend on this decision to simulate an animal taking a random walk – the same results emerge if the animal is more systematic in its path. All that matters for producing grid cell responses is that the animal visits all locations and that the animal takes on different head directions for the same location in the case of simulations that also include head direction as an input to hippocampal place cells.”

      I struggle to understand in Figure 3 why retrieval strength ought to scale monotonically with Euclidean distance, and why that justifies a more complex model (three non-orthogonal dimensions). 

      The introduction to this section now reads: “Animals can plan novel straight line paths to reach a known position and evidence suggests they do so by learning Euclidean representations of space (Cheng & Gallistel, 2014; Normand & Boesch, 2009; Wilkie, 1989). Thus, it was assumed that hippocampal place cells represent positions in Euclidean space (as opposed to non-Euclidean space, such as occurs with a city-block metric).”

      p.17 "although the representational space is a torus (or more specifically a three-torus), it is assumed that the real-world two-dimensional surface is only a section of the torus (e.g., a square piece of tape stuck onto the surface of a donut)." I fail to understand how the real-world surface is only a part of the torus. In the existing theoretical and experimental work on toroidal topology of grid cell activity, the torus represents a very small fraction of the real world, and repeating activity on the toroidal manifold is a crucial feature of how it maps 2D space in a regular manner. Why then here do you want the torus to be larger than the real world?

      This section has been rewritten to better explain these assumptions. The relevant paragraphs now read:

      “The cosine tuning curves of the simulated border cells represent distance from the border on both sides of the border (i.e., firing rate increases as the animal approaches the border from either the inside or the outside of the enclosure). Experimental procedures do not allow the animal to experience locations immediately outside the enclosure, but these locations remain an important part of the hypothetical representation, particularly when considering the modification of memories through consolidation (i.e., a memory created inside the enclosure might be moved to a location outside the enclosure). This symmetry about the border cell’s preferred location is needed to maintain an unbiased representation, with a constant sum of squares for the border cell inputs (see methods section). Rather than using linear dimensions, all dimensions were assumed to be circular to keep the model relatively simple. This assumption was made because head direction is necessarily a circular dimension and by having all dimensions be circular, it is easy to combine dimensions in a consistent manner to produce multidimensional hippocampal place cell memories. Thus, the border cells define a torus (or more accurately a three-torus) of possible locations. This provides a hypothetical space of locations that could be represented.

      In light of the assumption to represent border cells with a circular dimension, when a memory is pushed outside the East wall of the enclosure, it would necessarily be moved to the West wall of the enclosure if the period of the circular dimension was equal to the width of the enclosure. If this were true, then the partial grid field on one side of the enclosure would match up with its remainder on the other side. Such a situation would cause the animal to become completely confused regarding opposite sides of the enclosure (a location on the West wall would be indistinguishable from the corresponding location on the East wall). To reduce confusion between opposite sides of the enclosure, the width of the enclosure in which the animal navigated (Figure 5) was assumed to be half as wide as the full period of the border cells. In other words, although the space of possible representations was a three-torus, it was assumed that the real-world two-dimensional enclosure encompassed a section of the torus (e.g., a square piece of tape stuck onto the surface of a donut). The torus is better thought of as a “playing field” in which different sizes and shapes of enclosure can be represented (i.e., different sizes and shapes of tape placed on the donut). Furthermore, this assumption provides representational space that is outside the box without such locations wrapping around to the opposite side of the box.”
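
      The wrap-around argument in the quoted paragraphs can be illustrated with a minimal sketch. This is not the simulation code from the manuscript: the function names (`wrap`, `border_rate`, `inside_enclosure`), the reduction to a single circular dimension, the unit enclosure width, and the 0.1 displacement are all assumptions chosen for illustration. The sketch shows that a memory pushed just past the East wall lands back inside the enclosure when the period equals the enclosure width, but stays outside the enclosure when the period is twice the width, and that a cosine tuning curve responds symmetrically on either side of its preferred location.

```python
import math

WIDTH = 1.0  # enclosure width in arbitrary units (hypothetical value)


def wrap(x, period):
    """Map a position onto a circular dimension with the given period."""
    return x % period


def border_rate(x, preferred, period):
    """Cosine tuning around a preferred location on a circular dimension."""
    return math.cos(2.0 * math.pi * (x - preferred) / period)


def inside_enclosure(x):
    """True if a (wrapped) position falls within the enclosure [0, WIDTH]."""
    return 0.0 <= x <= WIDTH


east_wall = WIDTH
pushed = east_wall + 0.1  # a memory moved 0.1 beyond the East wall

# Period equal to the enclosure width: the memory reappears near the West wall.
same_period = wrap(pushed, WIDTH)

# Period twice the enclosure width: the memory remains outside the enclosure.
double_period = wrap(pushed, 2.0 * WIDTH)

# Cosine tuning is symmetric about the preferred location (here, the East wall).
rate_inside = border_rate(east_wall - 0.1, east_wall, 2.0 * WIDTH)
rate_outside = border_rate(east_wall + 0.1, east_wall, 2.0 * WIDTH)

# With the doubled period, the East and West walls produce different rates.
rate_west = border_rate(0.0, east_wall, 2.0 * WIDTH)
rate_east = border_rate(east_wall, east_wall, 2.0 * WIDTH)
```

      In this toy setting, doubling the period is what keeps the two walls distinguishable; the actual model applies the same idea to the three border-cell dimensions that define the three-torus.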

      p.28 "More specifically, egocentric grid cells (e.g., head direction conjunctive grid cells) require stabilization of the place cell memories in the face of ongoing consolidation whereas allocentric grid cells reflect on-average place field positions." and p.32 "if place cells represent episodic memories, it seems natural that they should include head direction (an egocentric viewpoint)." But the head direction signal is not egocentric, it is allocentric. I'm unsure whether this is a typo or a potentially more serious conceptual misunderstanding. 

      Any reference to egocentric has been removed in this revision. In the initial submission, when I used egocentric, I was referring to memories that depended on the head direction of the animal at the time of memory formation. I was using “egocentric” in relation to whether the memory was related to the animal’s personal bodily experience at the time of memory formation. But I concede that this is confusing since the ego/allo distinction is typically used to differentiate angular directions that are relative to the person (left/right) versus earth (East/West). Instead, throughout the manuscript I now refer to these as view-dependent memories since head direction would entail having a different view of the environment at the time of memory formation. However, I still refer to the stacking of multiple view-dependent memories on the same X/Y location as the development of an allocentric representation, since this can be thought of as one way to learn a cognitive map of the enclosure that is view independent.

      p.37 "But if the border cells had changed their alignment with the new enclosure (e.g., if the E border dimension aligned with the North-South borders), then the place cells would have appeared to undergo global remapping as their positions rotated by 90 degrees and the grid pattern would have also rotated." But this would not be interpreted as global remapping by standard analyses of place and grid cell responses. A coherent rotation of firing patterns is not interpreted as remapping. 

      This sentence now reads: “But if the border cells had changed their alignment with the new enclosure (e.g., if the E border dimension aligned with the North-South borders), then the place cells would remain in their same positions relative to the now-rotated borders (i.e., no remapping relative to the enclosure) and the corresponding grid cells would also retain their same alignment relative to the enclosure.”

      p.37 "this is more accurately described as partial remapping (nearly all place fields were unaffected)." If nearly all place fields were unaffected, this should be interpreted as a stable map. Partial remapping is a mix of stability, rate remapping, and global remapping within a population of place cells. 

      This sentence has been removed.

      p.40 "The dependence of grid cell responses on memory may help explain why grid cells have been found for bats crawling on a two-dimensional surface (Yartsev et al., 2011), but three-dimensional grid cells have never been observed for flying bats." This is not true. Ginosar et al. (2021) observed 3D grid cells in flying bats.  

      Thank you for highlighting this issue. In the initial submission I was using “grid cell” to mean a cell that produced a precise hexagonal grid, which is not the case for the 3D grid cells in bats. In this revision, I now discuss grid cells that produce irregular grid fields, writing:

      “According to this model, hexagonally arranged grid cells should be the exception rather than the rule when considering more naturalistic environments. In a more ecologically valid situation, such as with landmarks, varied sounds, food sources, threats, and interactions with conspecifics, there may still be remembered locations where events occurred or remembered properties can be found, but because the non-spatial properties are non-uniform in the environment, the arrangement of memory feedback will be irregular, reflecting the varied nature of the environment. This may explain the finding that even in a situation where there are regular hexagonal grid cells, there are often irregular non-grid cells that have a reliable multi-location firing field, but the arrangement of the firing fields is irregular (Diehl et al., 2017). For instance, even when navigating in an enclosure that has uniform properties as dictated by experimental procedures, there may be other properties that were not well-controlled (e.g., a view of exterior lighting in some locations but not others), and these uncontrolled properties may produce an irregular grid (i.e., because the uncontrolled properties are reliably associated with some locations but not others, hippocampal memory feedback triggers retrieval of those properties in the associated locations).

      In this memory model, there are other situations in which an irregular but reliable multi-location grid may occur, even when everything is well controlled. In the reported simulations, when the hippocampal place cells were based on variation in X/Y (as defined by Border cells), nothing else changed as a function of location, and the model rapidly produced a precise hexagonal arrangement of hippocampal place cell memories. When head direction was included (i.e., real-world variation in X, Y, and head direction), the model still produced a hexagonal arrangement as per face centered cubic packing of memories, but this precise arrangement was slower to emerge, with place cells continuing to shift their positions until the borders of the enclosure were sufficiently well learned from multiple viewpoints. If there is real-world variation in four or more dimensions, as is likely the case in a more ecologically valid situation, it will be even harder for place cell memories to settle on a precise regular lattice. Furthermore, in the case of four dimensions, mathematicians studying the “sphere packing problem” recently concluded that densest packing is irregular (Campos et al., 2023). This may explain why the multifield grid cells for freely flying bats have a systematic minimum distance between firing fields, but their arrangement is globally irregular (Ginosar et al., 2021). Assuming that the memories encoded by a bat include not just the three real-world dimensions of variation, but also head direction, the grid will likely be irregular even under optimal conditions of laboratory control.”

      Multiple typos are found on page 25, end of paragraph 3: "More specifically, if there is one set of non-orthogonal dimensions for enclosure borders and the movement of one wall is too modest as to cause avoid global remapping, this would deform the grid modules based the enclosure border cells."

      As detailed above in the response to the public reviews, this paragraph has been rewritten.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      The study by Gupta et al. investigates the role of mast cells (MCs) in tuberculosis (TB) by examining their accumulation in the lungs of M. tuberculosis-infected individuals, non-human primates, and mice. The authors suggest that MCs expressing chymase and tryptase contribute to the pathology of TB and influence bacterial burden, with MC-deficient mice showing reduced lung bacterial load and pathology.

      Strengths:

      (1) The study addresses an important and novel topic, exploring the potential role of mast cells in TB pathology.

      (2) It incorporates data from multiple models, including human, non-human primates, and mice, providing a broad perspective on MC involvement in TB.

      (3) The finding that MC-deficient mice exhibit reduced lung bacterial burden is an interesting and potentially significant observation.

      Weaknesses:

      (1) The evidence is inconsistent across models, leading to divergent conclusions that weaken the overall impact of the study.

      The strength of the study is the use of multiple models including mouse, non-human primate as well as human samples. The conclusions have now been refined to reflect the complexity of the disease and the use of multiple models.

      (2) Key claims, such as MC-mediated cytokine responses and conversion of MC subtypes in granulomas, are not well-supported by the data presented.

      To address the reviewer’s comments, we will carry out further experimentation to strengthen the link between MC subtypes and cytokine responses.

      (3) Several figures are either contradictory or lack clarity, and important discrepancies, such as the differences between mouse and human data, are not adequately discussed.

      We will further clarify the figures and streamline the discussions between the different models used in the study.

      (4) Certain data and conclusions require further clarification or supporting evidence to be fully convincing.

      We will either provide clarification or supporting evidence for some of the key conclusions in the paper.

      Reviewer #2 (Public review):

      Summary:

      The submitted manuscript aims to characterize the role of mast cells in TB granuloma. The manuscript reports heterogeneity in mast cell populations present within the granulomas of tuberculosis patients. With the help of previously published scRNAseq data, the authors identify transcriptional signatures associated with distinct subpopulations.

      Strengths:

      (1) The authors have carried out a sufficient literature review to establish the background and significance of their study.

      (2) The manuscript utilizes a mast cell-deficient mouse model, which demonstrates improved lung pathology during Mtb infection, suggesting mast cells as a potential novel target for developing host-directed therapies (HDT) against tuberculosis.

      Weaknesses:

      (1) The manuscript requires significant improvement, particularly in the clarity of the experimental design, as well as in the interpretation and discussion of the results. Enhanced focus on these areas will provide better coherence and understanding for the readers.

      The strength of the study is the use of multiple models including mouse, non-human primate as well as human samples. The conclusions have now been refined to reflect the complexity of the disease and the use of multiple models.

      (2) Throughout the manuscript, the authors have mislabelled the legends for WT B6 mice and mast cell-deficient mice. As a result, the discussion and claims made in relation to the data do not align with the corresponding graphs (Figure 1B, 3, 4, and S2). This discrepancy undermines the accuracy of the conclusions drawn from the results.

      We apologize for the discrepancy, which will be corrected in the revised manuscript.

      (3) The results discussed in the paper do not add a significant novel aspect to the field of tuberculosis, as the majority of the results discussed in Figure 1-2 are already known and are a re-validation of previous literature.

      This is the first study which has used mouse, NHP and human TB samples from Mtb infection to characterize and validate the role of MC in TB. We believe the current study provides significant novel insights into the role of MC in TB.

      (4) The claims made in the manuscript are only partially supported by the presented data. Additional extensive experiments are necessary to strengthen the findings and enhance the overall scientific contribution of the work.

      We will either provide clarification or supporting evidence for some of the key conclusions in the paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, BOUTRY et al examined a cnidarian Hydra model system where spontaneous tumors manifest in laboratory settings, and lineages featuring vertically transmitted neoplastic cells (via host budding) have been sustained for over 15 years. They observed that hydras harboring long-term transmissible tumors exhibit an unexpected augmentation in tentacle count. In addition, the presence of extra tentacles, enhancing the host's foraging efficiency, correlated with an elevated budding rate, thereby promoting tumor transmission vertically. This study provided evidence that tumors, akin to parasitic entities, can also exert control over their hosts.

      Strengths:

      The manuscript is well-written, and the phenotype is intriguing.

      Weaknesses:

      The quality of this manuscript could be improved if more evidence were to be provided regarding the beneficial versus detrimental effects of the tumors.

      We thank the reviewer for taking the time to examine our work carefully and for their highly relevant comments and precise suggestions. We have incorporated these suggestions, which greatly improved the clarity of our manuscript concerning the beneficial and detrimental effects of tumors. Specifically, we have added a new analysis and rephrased the results section, as well as the corresponding sentences in the discussion, to enhance clarity.

      Additionally, regarding the impact of tumor size on the development of supernumerary tentacles, we have included as suggested a new analysis that was previously only available in the supplementary materials of the earlier version. This addresses the reviewer's question and significantly enhances the quality of our paper.

      We have thanked the two referees in the Acknowledgements section of our article.

      Reviewer #2 (Public Review):

      Background and Summary:

      This study addresses the intriguing question of whether and how tumors can develop in the freshwater polyp hydra and how they influence the fitness of the animals. Hydra is notable for its significant morphogenetic plasticity and nearly unlimited capacity for regeneration. While its growth through asexual reproduction (budding) and the associated processes of pattern formation have been extensively studied at the cellular level, the occurrence of tumors was only recently described in two strains of Hydra oligactis (Domazet-Lošo et al, 2014). In that research, an arrest in the differentiation of female germ cells led to an accumulation of germline cells that failed to develop into eggs. In hydra, fertile egg cells typically incorporate nurse cells, which originate from large interstitial stem cells (ISCs) restricted to the germline, through apoptosis. However, this increase in apoptosis activity is absent in "germline tumors," and germline ISCs instead form slowly growing patches that do not compromise tissue integrity. Despite the upregulation of certain genes associated with mammalian neoplasms (such as tpt1 and p23) in this tissue, determining whether this differentiation arrest and the resulting egg patches truly constitute neoplasms remains a challenge.

      The authors have recently published two papers on the ecological and evolutionary aspects of hydra tumor formation (Boutry et al 2022, 2023), which is also the focus of this manuscript. They transplanted tissues derived from animals with germline tumors to wildtype animals and analyzed their growth patterns, specifically the number of tentacles in the host tissue. They observed that such tissues induced the growth of additional tentacles compared to tissues without germline tumors. The authors conclude that this growth pattern (increased number of tentacles) is correlated with "reducing the burden on the host by (over-)compensating for the reproductive costs of tumors" and claim that "transmissible tumors in hydra have evolved strategies to manipulate the phenotype of their host". While it might be stimulating to add a fresh view from other disciplines (here, ecological and evolutionary aspects), the authors completely ignore the current knowledge of the underlying cell biology of the processes they analyze.

      Strengths:

      The study focuses on intriguing questions: whether and how tumors can develop in the freshwater polyp hydra, and how they influence the fitness of the animals.

      Weaknesses:

      Concept of germline tumors.

      The conceptual foundation of their experiments on germline tumors was the study of Domazet-Lošo et al (2014) introducing the concept of germline tumors in hydra (see above). While this is an intriguing hypothesis, there has been little advancement in comprehending the molecular mechanisms underlying tumor formation in hydra beyond this initial investigation. Germline tumors in hydra do not fully meet the typical criteria for neoplasms observed in mammalian tissues. More importantly, a similar phenotype was already reported by the work of Paul Brien and described as "crise gametique" (Brien, 1966, Biologie de la reproduction animale - Blastogenèse, Gamétogenèse, Sexualisation, ed. Masson & Cie, Paris). This phenomenon of gametic crisis is unique to Hydra oligactis, a stenotherm, cold-adapted cosmopolitan species. In this species, gametogenesis severely impacts the vitality of the polyps, often leading to complete exhaustion and death (Tardent, 1974). Animals can only be rescued during the initial phase of the cold-induced sexual period (see also the research of Littlefield (1984, 1985, 1986, 1991)). The observed differentiation arrest in germline tumors might represent an epigenetically established consequence of surviving gametogenesis. Regrettably, this important work was not mentioned by the authors or by Domazet-Lošo et al. (2014), highlighting a notable gap in the recognition of basic research in this area that might challenge the hydra tumor hypothesis.

      "Supernumerary" tentacles in graft experiments.

      The authors describe that after grafting tissue from animals with germline tumors to wild-type animals, the number of tentacles in the host tissue increased when the donor tissue had germline tumors. A maximum effect of four additional tentacles was found with donor strain H. oligactis robusta and three additional tentacles with donor strain H. oligactis St Petersburg. In general, H. oligactis wild-type host strains had fewer tentacles than H. oligactis St Petersburg strains. This is consistent with the results of Domazet-Lošo et al (2014) who showed that the number of tentacles increased in the strains with germline tumors. What conclusions can be drawn from these experiments?

      The authors might want to conclude that transmissible tumors in Hydra have developed strategies to manipulate the phenotype of their host. But there is no evidence for this, as essential controls are missing. It is known that the size of hydra polyps is proportion-regulated, i.e. the number of tentacles varies with the size and number of (epithelial) cells. Such controls are missing in the experiments. There is also a lack of controls from wild-type animals in gametogenesis: it is very likely that grafts with wild-type animals with egg spots of comparable size as the germline tumors (see above) will result in similar numbers of tentacles in host tissue.

      We thank the reviewer for their thoughtful comments. While we appreciate the concerns raised, we maintain that the evidence provided by Domazet-Lošo et al. (2014, Nature Communications) supports the relevance of this model, including the suggested comparisons with the expression profiles of individuals undergoing induced sexual reproduction. Our study focuses primarily on the impact of these tumors on the host phenotype rather than their origin. Tumors are defined as accumulations of abnormally proliferating cells. This includes the definition provided by the referee, which describes “apoptosis activity as absent in 'germline tumors,' with germline ISCs forming slowly growing patches.” Compromise of tissue integrity is not a criterion for defining neoplasms, and many benign neoplasms do not meet this criterion. We are interested in continuing this discussion with the referee to better understand the expected evidence and agree that histological nomenclature could be improved. While further investigation into the cell biology of these tumors would be valuable, this is currently beyond the scope of our article but is being pursued in separate research.

      We also appreciate the points raised regarding the definition of germline tumors and the reference to the pioneering work of Paul Brien. However, in that publication, the concept of gametic crisis in H. oligactis describes reproductive exhaustion leading to death, rather than abnormal cell proliferation indicative of a tumor-like phenotype. This distinction likely explains why this specific paper was not cited previously.

      Our study builds on prior research using the same model (e.g., Domazet-Lošo et al. 2014; Boutry et al. 2023) and describes observations across different hydra strains from various locations worldwide (not just two), all conducted under stable warm temperatures that are not conducive to sexual development. These investigations reveal a phenomenon distinct from the senescence observed post-reproduction in H. oligactis. The phenotype we describe, characterized by an accumulation of cells in the ectoderm, aligns with studies referenced by the reviewer from leading groups in hydra research, known for their expertise in hydra cellular biology. We have relied on these studies after carefully reviewing their results and receiving training from these experts. Furthermore, our team is focused on eco-evolutionary topics and does not aim to specialize in cellular biology, as other teams are already dedicated to that field.

      We also thank the reviewer for their comments on the relevance of our findings and the missing controls. However, we have noted that the reviewer may have misunderstood our experimental design and results.

      Firstly, it appears that the reviewer based their critique mainly on the initial sentences of our Results section (illustrated in Figure 2), which outline the donor groups used in our study rather than presenting the results of the grafting experiments. This description alone is insufficient for drawing conclusions, which is why we conducted further analyses using these donor groups grafted onto different recipients. The maximum effects mentioned by the reviewer (+10 tentacles with St. Petersburg tumoral tissue and +8 tentacles with Robusta tumoral tissue, Results Section 2) represent only a part of our study. We encourage the reviewer to focus on the model analyses presented in Results Section 2, which directly relate to the grafting experiments and provide a more comprehensive evaluation of our results and conclusions. These analyses include comparisons between transmissible tumors and spontaneous tumors, offering deeper insights into their effects on tentacle development.

      In our methods (as depicted in Figure 3), we explicitly compared different types of tumorous tissue from various donors, distinguishing between spontaneous and transmissible tumors. Although we avoid labeling spontaneous tumors as "controls" to prevent confusion with healthy tissue controls, they serve as controls to the “treatment” that involves transmissible tumors, and thus are appropriate comparisons for assessing the size effect suggested by the reviewer. Spontaneous and transmissible tumors share similar size and cellular characteristics but differ significantly in the number of tentacles their hosts possess. Furthermore, we refer the reviewer to a relevant study (Ngo et al. 2021) that found no increase in tentacle numbers with larger polyps of healthy tissue. This reference has been included in the revised discussion (lines 309 to 312), which now also addresses the potential effect of body size with additional explanations.

      Regarding the suggestion to include controls from animals undergoing gametogenesis, we did not find evidence in the literature indicating an increase in tentacle numbers during this process in hydra. If such studies exist, we kindly request the complete references so we can include them in our discussion. Additionally, as noted in Brien's work, Hydra oligactis undergoing gametogenesis are known to either die or experience significant degeneration afterward. Transplanting tissue from dead or dying (and reproducing) hydras poses technical challenges and raises questions about whether any observed effects result from incomplete gametogenesis, the onset of senescence, or both. While these questions are intriguing, they fall outside the scope of our article.

      In conclusion, we appreciate the opportunity to address these points and reaffirm that our study offers valuable insights into the evolutionary dynamics of interactions between transmissible tumor tissues and host phenotypes in hydra. We remain open to further discussion and welcome any additional feedback to enhance the clarity and robustness of our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) If the fitness of hydra is altered in those with spontaneous tumors is the increased number of tentacles associated with those with transmitted tumors able to rescue this phenotype?

      We thank the reviewer for reformulating our results. Indeed, fitness can be restored and even improved in tumorous polyps harboring supernumerary tentacles. This phenomenon, which we referred to as compensation and over-compensation in Section 3 and Figure 4, was initially discussed only in the discussion section. To improve the clarity of our manuscript, we have now specified this in the Conclusion (lines 345 to 347, with some minor rewording in the same paragraph) and in the Results section (lines 284 to 286).

      (2) Does the size of the tumor predict the number of tentacles formed?

      We agree that this would be a valuable complementary analysis. We have conducted an analysis considering the qualitative size of the tumors (based on visual categories) and the number of tentacles, which is now included in our paper (lines 160-161; lines 193 to 198; lines 253 to 259; lines 314 - 322).

      (3) Considering the mentioned association of body size with tentacle numbers for hydra, is a change in size a phenotype associated with transmitted tumors, and is such a phenotype transmittable?

      All tumorous individuals, regardless of their tumor type, exhibit a swollen body. We have added a sentence in the introduction to clarify this point (line 62).

      (4) Is there anything unique about the Rob population that would explain their mass mortality following transplantation? For instance, their resistance to spontaneous tumor formation? Similarly, is there a difference in transplantation success based on the type of tissue transplanted? The authors could address this point in the discussion.

      It is a very old lineage described nearly 80 years ago. It is unknown whether natural populations of Robusta exist, and no reports of any male individuals have been documented. We have added a sentence in the Materials and Methods section to clarify this information (lines 98 to 102).

      (5) What downsides are known about the transmittable tumors in hydra and how present are they in the grafted individuals? Are other physiological aspects such as mobility, regeneration, or sexual reproduction hindered?

      Transmissible tumors have been associated with increased vulnerability to predation and alterations in life history traits, including a higher budding rate and decreased sexual reproduction. While we were unable to measure behavioral traits of our grafted individuals in this study, this is an intriguing avenue for further research. We have included this perspective in the discussion section as a concluding remark (lines 375 to 382). Thank you for suggesting this conclusion.

      (6) It is important to explore the mechanisms behind the phenotypic variation conferred by the types of tumors, whether of different lineage or transmissibility. For this purpose, RNA-Seq on the recipients seems like a good starting point.

      Thank you for this suggestion. We have reworded the sentence about this perspective in our discussion to be more precise (line 320).

      Boutry, Justine, Marie Buysse, Sophie Tissot, Chantal Cazevielle, Rodrigo Hamede, Antoine M. Dujon, Beata Ujvari, et al. 2023. "Spontaneously Occurring Tumors in Different Wild-Derived Strains of Hydra". Scientific Reports 13 (1): 7449. https://doi.org/10.1038/s41598-023-34656-0.

      Domazet-Lošo, Tomislav, Alexander Klimovich, Boris Anokhin, Friederike Anton-Erxleben, Mailin J. Hamm, Christina Lange, and Thomas C. G. Bosch. 2014. "Naturally Occurring Tumours in the Basal Metazoan Hydra". Nature Communications 5 (1): 4222. https://doi.org/10.1038/ncomms5222.

      Ngo, Kha Sach, Berta R-Almási, Zoltán Barta, and Jácint Tökölyi. 2021. "Experimental Manipulation of Body Size Alters Life History in Hydra". Ecology Letters 24 (4): 728-38. https://doi.org/10.1111/ele.13698.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      This important study provides proof of principle that C. elegans models can be used to accelerate the discovery of candidate treatments for human Mendelian diseases by detailed high-throughput phenotyping of strains harboring mutations in orthologs of human disease genes. The data are compelling and support an approach that enables the potential rapid repurposing of FDA-approved drugs to treat rare diseases for which there are currently no effective treatments. The authors should provide a clearer explanation of how the statistical analyses were performed, as well as a link to a GitHub repository to clarify how figures and tables in the manuscript were generated from the phenotypic data.

      We have amended the description of the statistical analysis in the Materials and Methods section of the manuscript. We have also updated the GitHub repository link to point to a dedicated repository for this manuscript, which contains all of the code needed to generate the figures from the phenotypic data provided. Additionally, we have updated the Zenodo repository to contain both the code and the datasets within the same folder structure.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have responded to previous review to improve the presentation of the work. The paper more than meets publication standards.

      No response required.

      Reviewer #2 (Recommendations for the authors):

      The authors have addressed all of my questions and concerns. I'm happy to see this updated paper of record.

      No response required.

      Reviewer #3 (Recommendations for the authors):

      Regarding the interactive heatmap

      The html version and the panel in Figure 2C appear not to coincide visually. Maybe the features are ordered in a different way?

      The html version of Figure 2C shows the entire feature set extracted per strain, not the condensed Tierpsy256 set shown in the panel figure. We have now remade this figure to show the reduced feature set (aligning with what is shown in Figure 2C) and included both versions of the interactive heatmap as static html files within the same repository.

      Regarding data accessibility overall

      More generally, the html file does not address my initial concern about the accessibility of the data to non-experts. Making the full dataset available was a necessary first step, but the hermetic nature of its format and the lack of a simple way to query the data remains an issue for me that limits the usefulness of this data to the broadest audience.

      We agree, but unfortunately do not currently have the resources to build a public-facing database to facilitate this.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This paper examines the role of MLCK (myosin light chain kinase) and MLCP (myosin light chain phosphatase) in axon regeneration. Using loss-of-function approaches based on small molecule inhibitors and siRNA knockdown, the authors explore axon regeneration in cell culture and in animal models. Their evidence shows that MLCK activity facilitates axon extension/regeneration, while MLCP prevents it.

      Major concern:

      A global inconsistency in the conclusions of the authors is evident when trying to understand the role of NMII in axon growth and to understand the present results in light of previous reports by the authors and many others on the role of NMII in axon extension. The discussion of the matter fails to acknowledge a vast literature on how NMII activity is regulated. The authors study enzymes responsible for the phosphorylation and dephosphorylation of NMII, referring to something that is strongly proven elsewhere, that phosphorylation activates NMII and dephosphorylation deactivates it. The authors mention their own previous evidence using inhibitors of NMII ATPase activity (blebbistatin, Bleb for short) and inhibitors of a kinase that phosphorylates NMII (ROCK), highlighting that Bleb increases axon growth. Since Bleb inhibits the ATPase activity of NMII, it follows that NMII is in itself an inhibitor of axon growth, and hence when NMII is inhibited, the inhibition on axon growth is relieved, and axonal growth takes place (REF1). It is known that NMII exists in an inactive folded state, and ser19 phosphorylation (by MLCK or ROCK) extends the protein, allowing NMII filament formation, ATPase activity, and force generation on actin filaments (REF2). From this, it is derived that if MLCK is inhibited, then there is no NMII phosphorylation, and hence no NMII activity, and, according to their previous work, this should promote axon growth. On the contrary, the authors show the opposite effect: in the lack of phospho-MLC, authors show axon growth inhibition.

      We thank the Reviewer for taking the time to review our manuscript, and we greatly appreciate the comments. We have done our best to revise the manuscript to address all the points raised by the Reviewer.

      Reporting evidence challenging previous conclusions is common business in scientific endeavors, but the problem with the current manuscript is that it fails to point to and appropriately discuss this contradiction. Instead, the authors refer to the fact that MLCK and Bleb inhibit NMII in different steps of the activation process. While this is true, this explanation does not solve the contradiction. There are many options to accommodate the information, but it is not the purpose of this revision to provide them. Since the manuscript is focused solely on phosphorylation states of MLC and axon extension, the claims are simply at odds with the current literature, and this important finding, if true, is not properly discussed.

      Thank you for these very helpful comments. As suggested by the Reviewer, we now discuss this point in more detail in the revised manuscript (lines 357-368; lines 373-374).

      What follows is a discussion of the merits and limitations of different claims of the manuscript in light of the evidence presented.

      (1) Using western blot and immunohistochemical analyses, authors first show that MLCK expression is increased in DRG sensory neurons following peripheral axotomy, concomitant to an increase in MLC phosphorylation, suggesting a causal effect (Figure 1). The authors claim that it is common that axon growth-promoting genes are upregulated. It would have been interesting at this point to study in this scenario the regulation of MLCP, which is a main subject in this work, and expect its downregulation.

      We thank the Reviewer for taking the time to review our manuscript, and we greatly appreciate the positive comments.

      (2) Using DRG cultures and sciatic nerve crush in the context of MLCK inhibition and down-regulation, authors conclude that MLCK activity is required for mammalian peripheral axon regeneration both in vitro and in vivo (Figure 2).

      The in vitro evidence is of standard methods and convincing. However, here, as well as in all other experiments using siRNAs, it is not clear what the control is about (the identity of the plasmids and sequences, if any).

      We used the pCMV–EGFP–N3 plasmid (Clontech, Inc.) as the control (lines 114-115).

      Related to this, it is not helpful to show the same exact picture as a control example in Figures 2 and 3 (panels J and E, respectively). Either because they should not have received the same control treatment, or simply because it raises concern that there are no other control examples worth showing. In these images, it is also not clear where and how the crush site is determined in the GFP channel. This is of major importance since the axonal length is measured from the presumed crush site. Apart from providing further details in the text, the authors should include convincing images.

      Thank you so much for your comments. We changed the control example in Figure 3J. For sciatic nerve regeneration experiments, the sciatic nerve was exposed at the sciatic notch by a small incision 2 days after the in vivo electroporation. The nerve was then crushed, and the crush site was marked with an 11-0 nylon epineural suture. After surgery, the wound was closed, and the mice were allowed to recover. Three days after the sciatic nerve crush, the whole sciatic nerves from the perfused animals were dissected out and postfixed overnight in 4% PFA at 4°C. Before whole-mount flattening, we confirmed that the position of the epineural suture matched the injury site, and experiments were included in the analysis only when the crush site was clearly identifiable. Using whole-mounted tissue, all identifiable EGFP-labeled axons in the sciatic nerve were manually traced from the crush site to the distal growth cone to measure the length of axon regeneration (lines 159-164).

      (3) The authors then examined the role of the phosphatase MLCP in axon growth during regeneration. The authors first use a known MLCP blocker, phorbol 12,13-dibutyrate (PDBu), to show that it is able to increase the levels of p-MLC, with a concomitant increase in the extent of axon regrowth of DRG neurons, both in permissive as well as non-permissive conditions. The authors repeat the experiments using the knockdown of MYPT1, a key component of the MLC-phosphatase, and again can observe a growth-promoting effect (Figure 3).

      The authors further show evidence for the growth-enhancing effect in vivo, in nerve crush experiments. The evidence in vivo deserves more evidence and experimental details (see comment 2). Some key weaknesses of the data were mentioned previously (unclear RNAi controls and duplication of shown images), but in this case, it is also not clear if there is a change only in the extent of growth, or also in the number of axons that are able to regenerate.

      Thank you so much for your comments. We used the same control as in the in vitro experiments (the pCMV–EGFP–N3 plasmid from Clontech, Inc.), and we also changed the control image in Figure 3J. For in vivo axon regeneration experiments, we measured the lengths of all identifiable EGFP-labelled axons in the sciatic nerve from the crush site to the distal axonal ends. The number of EGFP-labelled regenerating axons is determined by the electroporation rate of EGFP, which is similar, but not identical, in different mice. Thus, our data can only show differences in axon length among the experimental conditions. This approach has been used in many of our previously published papers (e.g. Saijilafu et al. Nature Communications, 2011, Saijilafu et al. Nature Communications, 2013) (lines 152-153).

      (4) In the next set of experiments (presented in Figure 4) authors extend the previous observations in primary cultures from the CNS. For that, they use cortical and hippocampal cultures, and pharmacological and genetic loss-of-function using the above-mentioned strategies. The expected results were obtained in both CNS neurons: inhibition or knockdown of the kinase decreases axon growth, whereas inhibition or knockdown of the phosphatase increases growth. A main weakness in this set is that it is not indicated when (at what day in vitro, DIV) the treatments are performed. This is important to correctly interpret the results, since in the first days in vitro these neurons follow well-characterized stages of development, with characteristic cellular events with relevance to what is being evaluated. Importantly, this would be of value to understand whether the treatments affect axonal specification and/or axonal extension. Although these events are correlated, they imply a different set of molecular events.

      The treatments were started at the beginning of the cell culture period, and this procedure may affect axon specification, as the Reviewer points out. However, our experiments focused mainly on axon length. For quantification of axon length, neurons with processes longer than twice the diameter of the cell body were photographed, and the longest axon of each neuron was measured. We revised the manuscript as suggested by the reviewer (lines 143-145).
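
      The inclusion criterion and measurement described above can be sketched in a few lines. This is only an illustrative sketch, not the analysis code used in the study: the function names, the (process lengths, soma diameter) input layout, and the use of micrometers are assumptions, and the longest process of a neuron is taken to be its axon.

```python
def longest_axon(process_lengths, soma_diameter):
    """Return the longest process length if the neuron qualifies, else None.

    Assumption: a neuron is included only when its longest process is
    longer than twice the soma diameter (all values in micrometers).
    """
    longest = max(process_lengths, default=0.0)
    if longest > 2 * soma_diameter:
        return longest
    return None


def mean_axon_length(neurons):
    """Average the longest-axon length over all included neurons.

    `neurons` is a hypothetical list of (process_lengths, soma_diameter) pairs.
    """
    lengths = []
    for process_lengths, soma_diameter in neurons:
        length = longest_axon(process_lengths, soma_diameter)
        if length is not None:
            lengths.append(length)
    if not lengths:
        raise ValueError("no neuron met the inclusion criterion")
    return sum(lengths) / len(lengths)
```

      With a soma diameter of 20, a neuron whose longest process measures 35 would be excluded, whereas one with a process of 120 would contribute 120 to the mean.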

      The title of this section is misleading: line 241 "MLCK/MLCP activity regulated axon growth in the embryonic CNS"... the title (and the conclusion) implies that the experiments were performed in situ, looking at axons in the developing brain. The most accurate title and conclusion should mention that the evidence was collected in CNS primary cultures derived from embryos.

      We have revised the manuscript as suggested by the reviewer (line 251).

      (5) Performing nerve crush injury in CNS nerves (optic nerve and spinal cord), and the local application of PDBu, the author shows contrasting results (Figure 5). In the ON nerve, they can see axons extending beyond the lesion site due to PDBu. On the contrary, the authors fail to observe so in the corticospinal tract present in the spinal cord. The authors fail to discuss this matter in detail. Also, they accommodate the interpretation of the evidence in light of a process known as axon retraction, and its prevention by MLCP inhibition. Since the whole paper is on axon extension, and it is known that mechanistically axon retraction is not merely the opposite of axon extension, the claim needs far more evidence.

      Thank you so much for your comments. Compared to optic nerve axons, corticospinal tract axons exhibit a reduced intrinsic axon growth capability. Consistent with this, we observed that PDBu stimulates optic nerve axon regeneration. Unfortunately, however, we did not detect any enhanced growth of corticospinal tract axons beyond the injury site after SCI following inhibition of myosin light chain phosphatase (MLCP) with PDBu.

      In panel 5F and the supplementary data, the authors mention the occurrence of retraction bulbs, but the images are too small to support the claim, and it is not clear how these numbers were normalized to the number of axons labeled in each condition.

      Thank you so much for your comments. In this study, we used a method similar to that of Ertürk et al. (2007) to quantify retraction bulbs. Both the maximum width of the enlarged distal tip of the axon and the width of the immediately adjacent axon shaft were measured, and the ratio of these two widths was calculated. An axonal tip was considered a retraction bulb if its tip/shaft ratio exceeded 4. The average number of retraction bulbs was calculated from 3 sections per mouse for each group (n=5) (lines 187-191).

      [Ref] Ertürk A, Hellal F, Enes J, and Bradke F (2007). Disorganized microtubules underlie the formation of retraction bulbs and the failure of axonal regeneration. J. Neurosci 27, 9169–9180. [PubMed:17715353].
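
      The retraction bulb criterion described above can be expressed as a short sketch. It is an illustration under stated assumptions rather than the code used in the study: the function names and the nested-list input layout (mice, then sections, then (tip width, shaft width) pairs) are hypothetical; only the tip/shaft threshold of 4 and the averaging over sections per mouse come from the text.

```python
from statistics import mean

# From the text: a tip counts as a retraction bulb if tip/shaft width exceeds 4.
RATIO_THRESHOLD = 4.0


def is_retraction_bulb(tip_width, shaft_width, threshold=RATIO_THRESHOLD):
    """Classify one axonal tip by its tip/shaft width ratio."""
    if shaft_width <= 0:
        raise ValueError("shaft width must be positive")
    return tip_width / shaft_width > threshold


def bulbs_per_section(tips):
    """Count retraction bulbs among (tip_width, shaft_width) pairs in one section."""
    return sum(is_retraction_bulb(tip, shaft) for tip, shaft in tips)


def mean_bulbs_per_mouse(sections):
    """Average the bulb count over the sections of one mouse (3 in the text)."""
    return mean(bulbs_per_section(section) for section in sections)


def group_mean(mice):
    """Average the per-mouse values over one experimental group (n=5 in the text)."""
    return mean(mean_bulbs_per_mouse(sections) for sections in mice)
```

      Note that a ratio of exactly 4 is not classified as a retraction bulb here, because the text specifies that the ratio must exceed 4.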

      (6) The author combines MLCK and MLCP inhibitors with Bleb, trying to verify if both pairs of inhibitors act on the same target/pathway (Figure 6). The rationale is wrong for at least two reasons.

      a. Because both lines of evidence point to contrasting actions of NMII on axon growth, one approach could never "rescue" the other.

      If MLCK regulates axon growth through the activation of myosin, the inhibitory effect of ML-7 (an MLCK inhibitor) on axon growth should be influenced by Bleb, an NMII inhibitor. However, our findings reveal that the combination of Bleb and ML-7 does not alter the rate of axon outgrowth compared to ML-7 alone. This suggests that the roles of ML-7 and Bleb in axon growth are independent, meaning that MLCK may regulate axon growth independently of NMII activity.

      b. Because the approaches target different steps on NMII activation, one could never "prevent" or rescue the other. For example, for Bleb to provide a phenotype, it should find any p-MLC, because it is only that form of MLC that is capable of inhibiting its ATPase site. In light of this, it is not surprising that Bleb is unable to exert any action in a situation where there is no p-MLC (ML-7, which by inhibiting the kinase drives the levels of p-MLC to zero, Figure 4A). Hence, the results are not possible to validate in the current general interpretation of the authors. (See 'major concern').

      The reported mechanism of blebbistatin is not competition with the ATP binding site of myosin. Instead, it selectively binds to the ATPase intermediate state associated with ADP and inorganic phosphate, which slows phosphate release. Importantly, blebbistatin does not impede myosin's interaction with actin or the ATP-triggered dissociation of actomyosin. Rather, it inhibits the myosin head when it forms a product complex with a reduced affinity for actin. This indicates that blebbistatin functions by stabilizing a particular myosin intermediate state that is independent of the phosphorylation status of myosin light chain (MLC).

      [Ref] Kovács M, Tóth J et al. Mechanism of blebbistatin inhibition of myosin II. J Biol Chem. 2004 Aug 20;279(34):35557-63. doi: 10.1074/jbc.M405319200.

      (7) In Figure 7, the authors argue that the scheme of replating and using ML7 before or after replating is evidence for a local cytoskeletal action of the drug. However, an alternative simpler explanation is that the drug acts acutely on its target, and that, as such, does not "survive" the replating procedure. Hence, the conclusion raised by the evidence shown is not supported.

      In our study, we carefully assessed the neuronal survival rates across the experimental groups. The findings indicate no significant variation in survival rates among the groups. This suggests that the drug treatment has no discernible influence on cell viability and primarily modulates axonal elongation.

      Author response image 1.

      (8) In Figure 8, the authors show that the inhibitory treatments on MLCK and MLCP (ML7 and PDBu) alter the morphology of growth cones. However, it is not clear how this is correlated with axon growth. The authors also mention in various parts of the text that a local change in the growth cone is evidence for a local action/activity of the drug or enzyme. However, the local change<->local action is not a logical truth. It can well be that MLCK and MLCP activity trigger molecular events that ultimately have an effect elsewhere, and by looking at "elsewhere" one observes of course a local effect, but it is not because the direct actions of MLCK or MLCP are localized. To prove true localized effects there are numerous efforts that can be made, starting from live imaging, fluorescent sensors, and compartmentalized cultures, just to mention a few.

      Regarding the relationship between growth cone size and growth rate, previously published work found that fast-growing axons tend to have small growth cones (Mason and Wang, 1997). A more recent study on Aplysia further supports this by noting that growth cones enlarge significantly when axonal elongation halts (Miller and Suter, 2018). Consistent with these findings, our data indicate that inhibiting MLCP with PDBu treatment leads to a reduction in growth cone size, which is associated with enhanced axon regeneration.

      [Ref] Mason CA, Wang LC. Growth cone form is behavior-specific and, consequently, position-specific along the retinal axon pathway. J Neurosci. 1997; 13:1086–1100. [PubMed: 8994063]

      [Ref] Miller KE, Suter DM. An Integrated Cytoskeletal Model of Neurite Outgrowth. Front Cell Neurosci. 2018 Nov 26;12:447. doi: 10.3389/fncel.2018.00447. eCollection 2018.

      References:

      (1) Eun-Mi Hur, In Hong Yang, Deok-Ho Kim, Justin Byun, Saijilafu, Wen-Lin Xu, Philip R Nicovich, Raymond Cheong, Andre Levchenko, Nitish Thakor, Feng-Quan Zhou. 2011. Engineering neuronal growth cones to promote axon regeneration over inhibitory molecules. Proc Natl Acad Sci U S A. 2011 Mar 22;108(12):5057-62. doi: 10.1073/pnas.1011258108.

      (2) Garrido-Casado M, Asensio-Juárez G, Talayero VC, Vicente-Manzanares M. 2024. Engines of change: Nonmuscle myosin II in mechanobiology. Curr Opin Cell Biol. 2024 Apr;87:102344. doi: 10.1016/j.ceb.2024.102344.

      (3) Karen A Newell-Litwa, Rick Horwitz, Marcelo L Lamers. 2015. Non-muscle myosin II in disease: mechanisms and therapeutic opportunities. Dis Model Mech. 2015 Dec;8(12):1495-515. doi: 10.1242/dmm.022103.

      Reviewer #2 (Public review):

      Summary:

      Saijilafu et al. demonstrate that MLCK/MLCP proteins promote axonal regeneration in both the central nervous system (CNS) and peripheral nervous system (PNS) using primary cultures of adult DRG neurons, hippocampal and cortical neurons, as well as in vivo experiments involving sciatic nerve injury, spinal cord injury, and optic nerve crush. The authors show that axon regrowth is possible across different contexts through genetic and pharmacological manipulation of these proteins. Additionally, they propose that MLCK/MLCP may regulate F-actin reorganization in the growth cone, which is significant as it suggests a novel strategy for promoting axonal regeneration.

      Strengths:

      This manuscript presents a comprehensive array of experimental models, addressing the biological question in a broad manner. Particularly noteworthy is the use of multiple in vivo models, which significantly strengthens the overall validity of the study.

      We thank the Reviewer for taking the time to review our manuscript, and we greatly appreciate the positive comments.

      Weaknesses:

      The following aspects apply:

      (1) The manuscript initially references prior research by the authors suggesting that NMII inhibition enhances axonal growth and that MLCK activates NMII. However, the study introduces a contradiction by demonstrating that MLCK inhibition (via ML-7 or siMLCK) inhibits axonal growth. This inconsistency is not adequately addressed or discussed in the manuscript.

      Thank you for these very helpful comments. As suggested by the Reviewer, we now discuss this point in more detail in the revised manuscript (lines 357-368; lines 373-374).

      (2) While the study proposes that MLCK/MLCP regulates F-actin redistribution in the growth cone, the mechanism is not explored in depth. The only figure showing how pharmacological manipulation affects the growth cone suggests that not only F-actin but also the microtubule cytoskeleton might be affected, indicating that the mechanism may not be specific. A deeper exploration of this relationship in DRG neurons, in addition to cortical neurons, as shown in the study, would be beneficial.

      Thank you for your insightful suggestion. However, our study primarily focuses on actin and myosin dynamics in the context of axonal elongation, as indicated by our direct observations in growing dorsal root ganglia (DRGs). Athamneh et al. (2017) elegantly demonstrated that the bulk movement of microtubules (MTs), rather than their assembly, predominantly drives MT advance during axonal elongation. Consequently, our manuscript concentrates on the actomyosin system, which is central to our findings. While the role of MTs in axonal growth is indeed significant and fascinating, the data we present is predominantly concerned with the actomyosin mechanism.

      [Ref] Athamneh, A. I. M. et al. Neurite elongation is highly correlated with bulk forward translocation of microtubules. Scientific Reports 7, (2017).

      (3) In the sciatic nerve injury experiments, it would be crucial to include additional controls that clearly demonstrate that siMYPT1 treatment increases MLCP in the L4-L5 ganglia. Additionally, although the manuscript mentions quantifying axons expressing EGFP, the Materials and Methods section only discusses siMYPT1 electroporation, which could lead to confusion.

      Thank you for your suggestion. However, due to the unavailability of a suitable commercial MLCP antibody, we were unable to directly detect MLCP expression. Instead, we assessed the phosphorylation level of myosin light chain (MLC) as a proxy to indicate that siMYPT1 transfection effectively downregulates MLCP activity in L4/5 dorsal root ganglia (DRG). This approach was taken to ensure the integrity of our findings despite the limitations in antibody availability.

      Regarding the electroporation methods section, we have now included detailed information about the control plasmid used in our experiments to ensure a clear understanding of our experimental setup: "A 1 μl solution containing indicated siRNAs together with the plasmid encoding EGFP (pCMV–EGFP–N3) was then microinjected into the L4–L5 DRG…" (lines 152-153).

      (4) In some panels, it is difficult to differentiate the somas from the background (Figures 3, 4, 7). In conditions where images with shorter axonal lengths are represented, it is unclear whether this is due to fewer cells or reduced axonal growth (Figures 2, 4, 6).

      In the original submission, some image quality was lost when converting the TIFF files to PDF. We have improved the quality of the images in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are a number of typos and language errors that should be thoroughly revised. For example, line 219: "It is well known that the opposite role of MLCK and MLCP to regulate the MLC phosphorylation status". The term "opposite role" is vague. Using "opposite roles" and specifying that they are in regulating MLC phosphorylation status clarifies the relationship between MLCK and MLCP. Also, the original phrase "to regulate" was not correctly integrated into the sentence. Rephrasing it to "in regulating" makes the role of MLCK and MLCP clearer.

      We have revised the manuscript as suggested by the reviewer (line 229).

      In the same line, there is a high number of panels that are not referred to in the text or references for panels that have another letter. Just to mention a few:

      - line 199: "(Figure 1F, G)", → BUT figure 1 contains no G panel.

      We have revised the manuscript as suggested by the reviewer (line 209).

      - line 203: "The results showed that ML-7 administration led to a significant reduction in MLC phosphorylation levels (Figure 2A, B) and impaired axonal growth in sensory neurons (Figure 2C, D)." → BUT panel C is related to A and B, and only D and E show impaired axonal growth.

      We have revised the manuscript as suggested by the reviewer (line 214; line 215; line 217; line 219 ).

      Reviewer #2 (Recommendations for the authors):

      (1) Improving the quality of the images would significantly strengthen the results presented.

      In the original submission, some image quality was lost when converting the TIFF files to PDF. We have improved the quality of the images in the revised manuscript.

      (2) The representative images of controls do not always show the same number of cells or axonal growth (e.g., Figure 4).

      We have changed some images as suggested by the reviewer.

      (3) The text has citation errors when referring to the figure labels.

      Upon thorough review, we have carefully examined our manuscript and have made the necessary corrections to address the identified errors. We appreciate the opportunity to enhance the quality of our work and believe that these revisions have significantly improved the clarity of our manuscript.

      (4) What happens to MLCK levels when MLCP activity is inhibited in the optic nerve?

      Upon analyzing our experimental data, we observed no significant alterations in the protein levels of MLCK when the activity of MLCP was inhibited. This finding suggests that the regulatory mechanisms governing MLCK expression may not be directly influenced by short-term MLCP inhibition. It is plausible that the duration of the inhibition period was insufficient to elicit a detectable change in MLCK expression levels.

      (5) The text in line 266: "In contrast, local PBS administration at the injury site or intravitreal PDBu injection induced little axon regeneration beyond the injury site (Figure 5 A-C)." However, this is not reflected in the figure.

      In our revised manuscript, we have provided a more precise description of our findings: "In contrast, local PBS administration at the injury site or intravitreal PDBu injection did not significantly enhance axon regeneration beyond the injury site (Figure 5 A-C). This observation suggests that only treatment applied at the injury site, that is, inhibition of MLCP activity within the growth cone, effectively promotes axonal growth." (lines 276-279).

      (6) Line 287: The phrase "Consistent with our previous study" requires a citation to support it.

      We added the reference: "Consistent with our previous study [1], the inhibition of myosin II activity with 25 μM blebbistatin markedly promoted axonal growth (Figure 6A, B)." (line 298)

      (7) Line 333: The paper cited by Yu P et al. (2012) does not mention MLCK or p-MLC, so it appears to be misquoted.

      Thank you for the comment. We rechecked the cited paper and confirmed that the authors provide western blot data in panel C of Supplementary Figure 1, which show that Bleb did not alter the phosphorylation status of MLC.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The work by Chuong et al. provides important new insights into the contribution of different molecular mechanisms in the dynamics of CNV formation. It will be of interest to anyone curious about genome architecture and evolution from yeast biologists to cancer researchers studying genome rearrangements.

      Thank you for recognizing the broad significance of our study.

      Strengths:

      Their results are especially striking in that the "simplest" mechanism of GAP1 amplification (non-allelic homologous recombination between the flanking Ty-LTR elements) is not the most common route taken by the cells, emphasizing the importance of experimentally testing what might seem on the surface to be obvious answers. One of the important developments of their work is the use of their neural network simulation-based inference (nnSBI) model to derive rates of amplicon formation and their fitness effects.

      We agree with this assessment, as the results of our study challenge our intuition that the simplest path to structural variation is the most likely and reveal the great diversity in mechanisms that can lead to large-scale changes in the genome.

      Weaknesses:

      The manuscript reads as though two different people wrote two different sections of the manuscript - an experimental evolutionist and a computational scientist. If the goal is to reach both groups of readers, there needs to be more explanation of both types of work. I found the computational sections to be particularly dense but even the experimental sections need clearer explanations and more specific examples of the rearrangements found. I will point out these areas in the detailed remarks to the authors. While I have no reason to question their conclusions, I couldn't independently verify the results that ODIRA was the majority mechanism since the sequence of amplified clones was not made available during the review. I've encouraged the authors to include specific, detailed sequence information for both ODIRA events as well as the specific clones where GAP1 was amplified but the flanking gene GFP was not.

      We have revised the manuscript to expand explanations of both the experimental and computational aspects of our study and to provide additional information for the reader. In doing so, we have edited the text to improve readability. We have made all raw data publicly available through the NCBI short read archive (SRA) and are hosting all sequence data for easy visualization in JBrowse using a public server.

      Reviewer #2 (Public Review):

      Summary:

      This study examines how local DNA features around the amino acid permease gene GAP1 influence adaptation to glutamine-limited conditions through changes in GAP1 Copy Number Variation (CNV). The study is well motivated by the observation of numerous CNVs documented in many organisms, but difficulty in distinguishing the mechanisms by which they are formed, and whether or how local genomic elements influence their formation. The main finding is convincing and is that a nearby Autonomous Replicating Sequence (ARS) influences the formation of GAP1 CNVs and this is consistent with a predominant mechanism of Origin Dependent Inverted Repeat Amplification (ODIRA). These results along with finding and characterizing other mechanisms of GAP1 CNV formation will be of general interest to those studying CNVs in natural systems, experimental evolution, and in tumor evolution. While the results are limited to a single CNV of interest (GAP1), the carefully controlled experimental design and quantification of CNV formation will provide a useful guide to studying other CNVs and CNVs in other organisms.

      Thank you for this positive assessment of our study.

      Strengths:

      The study was designed to examine the effects of two flanking genomic features next to GAP1 on CNV formation and adaptation during experimental evolution. This was accomplished by removing two Long Terminal Repeats (LTRs), removing a downstream ARS, and removing both LTRs and the ARS. Although there was some heterogeneity among replicates, later shown to include the size and breakpoints of the CNV and the presence of an unmarked CNV, both marker-assisted tracking of CNV formation and modeling of CNV rate and fitness effects showed that deletion of the ARS caused a clear difference compared to the control and the LTR deletion.

      The consequence of deletion of local features (LTR and ARS) was quantified by genome sequencing of adaptive clones to identify the CNV size, copy number and infer the mechanism of CNV formation. This greatly added value to the study as it showed that i) ODIRA was the most common mechanism but ODIRA is enhanced by a local ARS, ii) non-allelic homologous recombination (NAHR) is also used but depends on LTRs, and iii) de novo insertion of transposable elements mediate NAHR in strains with both ARS and LTR deletions. Together, these results show how local features influence the mechanism of CNV formation, but also how alternative mechanisms can substitute when primary ones are unavailable.

      We agree with this assessment.

      Weaknesses:

      The CNV mutation rate and its effect on fitness are hard to disentangle. The frequency of the amplified GFP provides information about mutation rate differences as well as fitness differences. The data and analysis show that each evolved population has multiple GAP1 CNV lineages within it, with some being unmarked by GFP. Thus, estimates of CNV fitness are more of a composite view of all CNV amplifications increasing in frequency during adaptation. Another unknown but potential complication is whether the local (ARS, LTR) deletions influence GAP1 expression and thus the fitness gain of GAP1 CNVs. The neural network simulation-based inference does a good job at estimating both mutation rates and fitness effects, while also accounting for unmarked CNVs. However, the model does not account for the population heterogeneity of CNVs and their fitness effects. Despite these limitations of distinguishing mutation rate and fitness differences, the authors' conclusions are well supported in that the LTR and ARS deletions have a clear impact on the CNV-mediated evolutionary outcome and the mechanism of CNV formation.

      While it is true that the inferred mutation rate and fitness effect are negatively correlated, as in other studies (Gitschlag et al., 2023; Caspi et al., 2023; Avecilla et al., 2022), our modeling approach does generate an estimate of each parameter that is best explained by the data. By reporting the confidence intervals (i.e. the 95% HDI) we define the set of parameter values that are consistent with the data. It is true that our model doesn't explicitly account for population heterogeneity; rather, following Hegreness et al. (2006), we employ a single effective fitness effect and mutation rate for all GAP1 CNVs. It is interesting to consider whether the ARS and LTR affect GAP1 expression; however, we have no evidence that this is the case.
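
      As an illustration of what the 95% HDI summarizes, the sketch below computes a highest density interval from a one-dimensional set of posterior samples by finding the narrowest window that contains the requested fraction of samples. It assumes a unimodal posterior; the function name `hdi` and the example values are hypothetical and are not taken from the study's code.

```python
import numpy as np

def hdi(samples, cred_mass=0.95):
    """Return the narrowest interval containing `cred_mass` of the samples.

    Assumes a unimodal distribution; illustrative only.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(cred_mass * n))  # number of samples inside the interval
    if k < 2 or k > n:
        raise ValueError("need at least two samples inside the interval")
    # Width of every window of k consecutive sorted samples.
    widths = x[k - 1:] - x[: n - k + 1]
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])

# Hypothetical posterior samples of a log10 CNV formation rate.
rng = np.random.default_rng(0)
posterior = rng.normal(loc=-5.0, scale=0.3, size=5000)
low, high = hdi(posterior, cred_mass=0.95)
print(f"95% HDI: [{low:.2f}, {high:.2f}]")
```

      Unlike an equal-tailed interval, the HDI is the shortest interval with the stated coverage, so for skewed posteriors it shifts toward the region of highest density.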

      Reviewer #3 (Public Review):

      Summary:

      The authors present an elegant and detailed investigation into the role of cis-elements, and therefore the underlying mechanisms, in gene dosage increase. Their most significant finding is that in their system copy number increase frequently occurs by what they call replication errors that result from the origin of replication firing.

      The authors somewhat quantitatively determine the effect of the presence of a proximal origin of replication or LTR on the different CNV scenarios.

      Strengths:

      (1) A clever and elegant experimental design.

      (2) A quantitative determination of the effect of a proximal origin of replication or LTR on the different CNV scenarios. Measuring directly the contribution of two competing elements.

      (3) ODIRA can occur by firing of a distal ARS element.

      (4) Re-insertion of Ty elements is interesting.

      We agree that these are interesting and novel findings from our study.

      Weaknesses:

      (1) Overall, the research does not considerably advance the current knowledge. The research does not investigate the maximum distance between ARS elements at which ODIRA can occur. This is an important point since ODIRA was previously described. A considerable contribution to the field would be to understand under what conditions ODIRA wins over NAHR.

      We agree that these are important questions and they are ones that we are pursuing in future studies.

      (2) The title and some sentences in the abstract give a wrong impression of the generality and the novelty of the observations presented. Below are some examples of much earlier work that dealt with mechanisms of CNV and got different conclusions. The Lobachev lab (Cell 2006) published a different scenario years ago, with a very different mechanism (hair-pin capped breaks). The Argueso lab found something different (NAHR) (Genetics 2013).

      In fact, the CUP1 system presents a good example of this point. The Houseley group showed a complex replication transcription-based mechanism (NAR 2022, cited), the Argueso group showed Ty-based amplification and the Resnick group showed aneuploidy-based amplification. While aneuploidy is a minor factor here the numerous works in Candida albicans, Cryptococcus neoformans, and Yeast suggest otherwise (Selmecki et al Science 2006, Yona et al PNAS 2013, Yang et al Microbiology Spectrum 2021).

      As the reviewer points out there have been several important published studies investigating mechanisms by which structural variation is generated. It is important to note that we are explicitly looking at CNVs in the context of adaptive evolution and the role of genomic features that enable different mechanisms of CNV formation. To emphasize this point, we have changed the title of our manuscript to “Template switching during DNA replication is a prevalent source of adaptive gene amplification”. Aneuploidy is indeed a mechanism of adaptive gene amplification in our current and previously reported studies. We have expanded our discussion to place our study in the context of previous studies reporting mechanisms of gene amplification.

      (3) The authors added a mathematical model to their experimental data. For me, it was very difficult to understand the contribution of the model to the research. I anticipated, for example, that the model would make predictions that would be tested experimentally. For example, "ARSΔ and ALLΔ are predicted to be almost eliminated by generation 116, as the average predicted WT proportion is 0.998 and 0.999". But, to my understanding, this prediction was not tested.

      In our previous publication (Avecilla et al. 2022, PLoS Biology) we experimentally validated the use of nnSBI to infer evolutionary parameters. In this study, we have extended our modeling framework to quantify differences between genotypes, which was not previously possible. Our results reveal that the local ARS has a key role in the overall supply rate of CNVs at this locus.

      Recommendations for the authors:

      We have addressed all public reviews and recommendations.

      Reviewer #1 (Recommendations For The Authors):

      Specific comments about the work are covered in the order of appearance in the text or Figures. I apologize in advance for the number of comments. They are made out of curiosity, enthusiasm for the research, and a desire to help highlight the most interesting aspects of this work.

      We are grateful for the thoughtful comments that have helped us to significantly improve our manuscript.

      (1) I would appreciate the inclusion of several references to the work on the ODIRA model.

      a) Page 3 last paragraph: "(2) DNA replication-based mechanisms (Harel et al., 2015; Hastings, Lupski, et al., 2009; Malhotra & Sebat, 2012; Pös et al., 2021; Zhang, Gu, et al., 2009; Brewer et al., 2011)" (Addition of Brewer et al., 2011).

      We have added all suggested references.

      b) Page 4 top: (Brewer et al., 2011; Brewer et al., 2015; Martin et al., 2024). (Addition of Brewer et al., 2011).

      We have added all suggested references.

      c) Page 14 top: "Recent work has proposed that ODIRA CNVs are a major mechanism of CNVs in human genomes (Brewer et al., 2015; Martin et al., 2024; Brewer et al., 2024)." Brewer et al., 2024 focuses specifically on ODIRA and human CNVs. (Addition of Brewer et al., 2024).

      We have added all suggested references.

      (2) Page 6, third paragraph: I was surprised that a single inoculating strain was used to establish the replicate chemostats because of the possibility of non-independence of the resulting GAP1 CNVs. A nnSBI model was used to correct for this possibility later in the paper. It seems like it could have been avoided by a simple change in protocol to inoculate each chemostat with an independent inoculum. Was there a reason that the replicate chemostats were not conducted as independent events? Establishing the presence of 'founder' GAP1 CNVs without GFP seems rather secondary to the point of the paper (examining the CNVs that arise during evolution) and I would recommend it being moved to the supplement.

      As is typical in microbial experimental evolution studies, we aimed to start with genetically identical homogeneous populations and observe the emergence and selection of de novo variation. Therefore, we founded independent populations from a single inoculum. However, this study, and our prior work using lineage tracking barcodes, has clearly demonstrated that CNVs are generated during the initial growth of the culture used for the inoculum and contribute to the adaptation dynamics in all derived populations. This unanticipated result indicates that the reviewer’s suggestion is a valid one: independent populations should be derived from independent inocula, and this will be our standard practice in future studies.

      We believe that our results, presented in Figure 2, establishing the presence of pre-existing GAP1 CNVs without the GFP are important, as they highlight a previously unknown limitation of using CNV reporters of gene copy number. However, we subsequently show that this class of variant (CNVs that are not detected by the reporter system) can be incorporated into our modeling framework, enabling estimation of evolutionary parameters, which we believe is an important finding warranting inclusion in the main text.

      (3) Page 7 first full paragraph: "Finally, we also observe a significant delay (ANOVA, p = 0.00833) in the generation at which the CNV frequency reaches equilibrium in ARS∆ (~generation 112) compared to WT (pairwise t-test, adjusted p = 0.05) . . .". Is the delay in reaching a plateau in Figure 1E just a consequence of the later appearance of CNVs or do the authors believe there are two separate events responsible for this delay? E.g. if the authors think that the delay in reaching a plateau is related to lower selection coefficients of the CNVs that do arise compared to the CNVs of other strains, then this should be explicitly discussed.

      We believe that the delay in reaching equilibrium is a consequence of both a lower CNV formation rate and reduced selection coefficients. Lower values for the selection coefficient and formation rate in ARS∆ explain both the delay in CNV appearance and the delay in reaching CNV equilibrium, as shown by the predicted dynamics (Figure S3B). We have added an explicit discussion of the effect of the ARS on CNV dynamics in paragraph 2 of the Discussion section, starting at line 456.

      (4) Page 7: Incorporating pre-existing CNVs into an evolutionary model: The rationale for how you are able to discount the formation rate of GFP-free CNVs (C-) in your model isn't clear to me. How are you able to assume that these C- events don't form after timepoint 0? Why do you assume a starting population of C- events but not a starting population of C+ events?

      We explored the possibility of modeling the formation of C- events (amplifications of GAP1 without amplification of the reporter) during the evolution experiment. However, because the rate at which C- events occur is lower than the rate at which C+ events occur (GAP1 amplifications with amplification of the reporter), we found that the effect was negligible. Importantly, the simple model is sufficient to describe the observed dynamics and thus we do not include these possible rare events.
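
      To make the distinction between reported (C+) and unreported (C-) CNVs concrete, the following is a deliberately simplified, deterministic sketch and not the model used in the study. It assumes a haploid population with three classes (wild type, C+, C-), a single formation rate `delta_c` for C+ events, one shared selection coefficient `s_c`, a fixed starting frequency of C- cells, and no new C- events after generation 0. All names and parameter values are hypothetical.

```python
def simulate_cnv_dynamics(delta_c, s_c, pre_existing=0.0, generations=120):
    """Return per-generation frequencies of (reported CNVs, total CNVs).

    Simplifying assumptions (illustrative only):
      - WT fitness is 1; C+ and C- both have fitness 1 + s_c.
      - C+ cells arise from WT at rate delta_c per generation.
      - C- cells exist only as a pre-existing fraction at generation 0.
    """
    wt, c_plus, c_minus = 1.0 - pre_existing, 0.0, pre_existing
    reported, total = [], []
    for _ in range(generations):
        # Mutation: a fraction delta_c of WT cells gains a reported CNV.
        new_cnv = delta_c * wt
        wt -= new_cnv
        c_plus += new_cnv
        # Selection: weight each class by its fitness, then renormalize.
        w_wt = wt
        w_cp = c_plus * (1.0 + s_c)
        w_cm = c_minus * (1.0 + s_c)
        mean_fitness = w_wt + w_cp + w_cm
        wt, c_plus, c_minus = (w_wt / mean_fitness,
                               w_cp / mean_fitness,
                               w_cm / mean_fitness)
        reported.append(c_plus)
        total.append(c_plus + c_minus)
    return reported, total

# Hypothetical parameter values chosen only to illustrate the behavior.
reported, total = simulate_cnv_dynamics(delta_c=1e-5, s_c=0.15,
                                        pre_existing=1e-4, generations=120)
print(f"final reported CNV frequency: {reported[-1]:.3f}")
print(f"final total CNV frequency:    {total[-1]:.3f}")
```

      In this toy setting the reported CNV frequency can never exceed the total CNV frequency, and the gap between the two curves is set by the size of the pre-existing C- fraction.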

      (5) Figure 1:

      (a) Panel B: Please put the tRNAs on the line diagrams of the four strains. I first interpreted ALLΔ as missing the tRNAs, too.

      Thank you for this suggestion. We added tRNAs to all diagrams to provide additional detail about the structure of the GAP1 locus.

      (b) Panels C, D, and E: the dark shade of the colored boxplots obscures the individual points. I recommend reducing the opacity of the box or choosing a lighter shade so that the individual points are visible on top of the box. Is the percent increase in CNVs per generation (Panel D) based on the slopes of the curves in panel B? By eye the slopes of ARS∆ and ALL∆ appear at least as steep as those of wild type and LTR∆.

      Thank you for this suggestion. We have now made the individual points visible on top of the boxplots in Figures 1C, 1D, and 1E. The lines in Figure 1B show the median value across populations per time point whereas each point in Figure 1D is the slope from linear regression using values from individual populations (data from individual populations are shown in Figure 3C).
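
      The per-population slope calculation described above can be sketched as follows. This is a minimal illustration that assumes an ordinary least-squares line is fitted to the CNV proportion over a chosen window of generations for each population separately; the window, the population names, and the data values are hypothetical.

```python
import numpy as np

def cnv_increase_per_generation(generations, cnv_proportion):
    """Slope of a least-squares line fitted to CNV proportion vs. generation."""
    slope, _intercept = np.polyfit(generations, cnv_proportion, deg=1)
    return float(slope)

# Hypothetical time series for two replicate populations of one strain.
populations = {
    "pop_1": ([50, 60, 70, 80], [0.05, 0.15, 0.25, 0.35]),
    "pop_2": ([50, 60, 70, 80], [0.02, 0.08, 0.20, 0.26]),
}

# One slope per population: each value would be one point in a boxplot.
slopes = {name: cnv_increase_per_generation(g, p)
          for name, (g, p) in populations.items()}
for name, slope in slopes.items():
    print(f"{name}: {100 * slope:.2f}% increase in CNV proportion per generation")
```

      Fitting each population separately, rather than fitting the median curve, is why the spread of slopes within a strain can differ from the impression given by the median lines.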

      (6) Figure 2:

      (a) Panel A: Please remind the readers what FSC-A is measuring and label the different groups of cells in each sample. Are we supposed to assume the upper scatter in generation 8 is the pre-existing CNV variants? Are the three species at generation 50 due to 1, 2, and 3 copies of GFP? Is the new species in generation 137 further amplification of the locus? And if so, how many copies does it represent? I find it fascinating that what I assume is the 2-copy CNV (presumably a direct oriented amplicon produced by NAHR) at 50 generations is lost (out-competed by a potential inverted triplication) at later times, but I didn't find any mention of this phenomenon in the text. What do the different mutant strains look like over the same time course? Please supply supplemental figures with the flow cytometry gating and vertically aligned histograms of the GFP signal so that the peaks are more easily compared. And provide this information for each of the altered strains in supplementary materials.

      Thank you for these useful suggestions. We have added a gating legend to the figure to clearly indicate the copy-number for each subpopulation. We have edited the caption and main text to explain forward scatter (FSC-A). Raw flow cytometry plots are now provided as Supplementary figure 2 and distributions of cell-size normalized GFP signal are provided in Supplementary figure 3. Although our primary objective with Figure 2A was to show the persistence of the 1-copy GFP population the reviewer is correct that we did not highlight interesting aspects of the CNV dynamics. We have added additional text starting at line 251 to point out these features of the data.

      (b) Panel B: It would help to label the different colored boxes inside cells in Figure 2B - it took me a while to identify the white box as an unrelated adaptive mutation elsewhere in the genome. The linear arrangement of these small colored blocks seems to indicate their structural arrangement. Is that the case? And are they inverted or direct amplicons? Perhaps the authors are being agnostic at this point but it would be better if each of the blocks were separate. If there are other mutations that can explain these GFP-non-amplified survivors, were they identified in your whole genome sequencing?

      We have now included a complete legend for Figure 2B indicating that the white box reflects other beneficial mutations. We have separated this class of beneficial mutation from the GAP1 and reporter elements to reflect that they are not linked. We did not identify additional beneficial mutations but plan to pursue this question in a future project.

      (c) Panel C: Are the two sets of lines mislabeled? One would expect the "reported" CNV proportions to be lower than the total CNV proportions, not the other way around. Maybe the labels "total CNVs" and "reported CNVs" are unclear to me and I am misunderstanding what "reported" refers to. Please clarify.

      Thank you for identifying this mistake. The lines were mislabeled and have now been corrected in the revised version.

      (7) Figure 3:

      (a) A fuller discussion of panels A and B is needed. The results of panel A in particular seem like an excellent opportunity for connecting the computation to the biology. Can the authors speculate on why the ALL∆ strain has a higher CNV formation rate (𝛿c) than the ARS∆ strain? I would think that taking away one means of amplification would decrease CNV formation. Likewise, could the authors discuss why the selection coefficient (sc) for the LTR∆ strain would be the same as for the wild type? Overall, I would like to see more discussion about what these differences in formation rates and selection coefficients could mean for the types of amplicons arising in the chemostats. (In panel B I don't see the shaded area referred to in the figure legend.) A side-by-side comparison of the data in Panel A with the data shown in Supplemental Figure S3A would be instructive.

      Thank you for raising these points. We have added substantial text to the manuscript to address these findings. Starting at line 456 we state:

      “The lower CNV formation rate in the LTR∆ could be a closer approximation of ODIRA formation rates at this locus as ODIRA CNVs are the predominant CNV mechanism in the LTR∆ strain (Figure 4F). Furthermore, the low formation rates in the LTR∆ relative to WT might suggest that the presence of the flanking long terminal repeats may increase the rate of ODIRA formation through an otherwise unknown combinatorial effect of DNA replication across these flanking LTRs and template switching at the GAP1 locus. ARS∆ has the lowest CNV formation rate and it could be an approximation of the rates of NAHR between flanking LTRs and ODIRA at distal origins. We find that the ALL∆ has a higher CNV formation rate than the ARS∆, even though three elements are deleted instead of one. One explanation for this is that the deletion of the flanking LTRs in ALL∆ gives opportunity for novel transposon insertions and subsequent LTR NAHR. Indeed, we find an enrichment of novel transposon insertions in the ALL∆ (Figure 4F) and subsequent CNV formation through recombination of the Ty1-associated repeats (Figure 4H, ALL∆). Both events, transposon insertion followed by LTR NAHR, would have to occur quickly at a rate that explains our estimated CNV rate in ALL∆. While remarkable, increased transposon activity has been associated with nutrient stress (Curcio & Garfinkel, 1999; Lesage & Todeschini, 2005; Todeschini et al., 2005) and is therefore a feasible explanation for the CNV rate estimated in the ALL∆. Additionally, ARS∆ clones rely more on LTR NAHR to form CNVs (Figure 4F). The prevalence of ODIRA in ARS∆ and ALL∆ is similar. LTR NAHR usually occurs after double strand breaks at the long terminal repeats to give rise to CNVs (Argueso et al., 2008). Because we use haploid cells, such double strand break and homology-mediated repair would have to occur during S-phase after DNA replication with a sister chromatid repair template to form tandem duplications. Therefore, the dependency on LTR NAHR to form CNVs and the spatial (breaks at LTR sequences) and temporal (S-phase) constraints could explain the lower formation rate in ARS∆.”

      In addition, we added a discussion of the different selection coefficients estimated and how the simulated competitions help us understand the decreased selection coefficients in the architecture mutants. In newly added text starting at line 479 we state:

      “The genomic elements have clear effects on the evolutionary dynamics in simulated competitive fitness experiments. The similar selection coefficients in WT and LTR∆ suggest that CNV clones formed in these background strains are similar. Indeed, the predominant CNV mechanism in both is ODIRA followed by LTR NAHR (Figure 4F). While LTR NAHR is abolished in the LTR∆, it seems that CNVs formed by ODIRA allow adaptation to glutamine-limitation similar to WT. The lower selection coefficients in ARS∆ and ALL∆ suggest that GAP1 CNVs formed in these strains have some cost. In a competition, they would get outcompeted by CNV alleles in the WT and LTR∆ background.”

      (b) The data shown in panel C seems redundant to what is shown more clearly in Supplemental Figure S3B. It seems to me the more important comparison to make in panel C would be the overlay of the predicted data to the median proportion of cells obtained from the experimental data (Figure 1B). Also, overlays of the cultures from each strain could be added to S3A. It is difficult to see the variation within each strain when the data from all four strains are superimposed as they are in Figure 3C.

      We agree and have edited Figure 3C to incorporate these suggestions and more clearly convey the intra- and interstrain variation.

      (8) Figure 4:

      (a) Panels A, B, and C are nice summaries and certainly helpful for understanding panel E, but it would be instructive to see some actual rearrangements of the ODIRA events, the NAHR, and the transposon-mediated rearrangements. It isn't clear to me what these last events look like. A figure that shows the specific architecture of example clones for each category would be helpful. I am also having a hard time reconciling ODIRA events with a copy number of 2. Are these rearrangements free isochromosomes with amplification to the telomere or are they secondary rearrangements like those described in Brewer et al., 2024? And what about the non-aneuploid rearrangement that includes the centromere? Is it a dicentric?

      We have now added more detailed depictions of CNVs in Figure 4A and provide links to visualize the alignment files. We have added additional discussion of the non-canonical ODIRA events and putative neochromosome amplicons, with reference to Brewer et al., 2024. Starting at line 397 we state:

      “Surprisingly, we found CNVs with breakpoints consistent with ODIRA that contained only 2 copies of the amplified region, whereas ODIRA typically generates a triplication. In the absence of additional data, we cannot rule out inaccuracy in our read-depth estimates of copy numbers for these clones (i.e., they have 3 copies). An alternate explanation is a secondary rearrangement of an original inverted triplication resulting in a duplication (Brewer et al., 2024); however, we did not detect evidence for secondary rearrangements in the sequencing data. A third alternate explanation is that a duplication was formed by hairpin capped double-strand break repair (Narayanan et al., 2006). Notably, we found 3 additional ODIRA clones that end in native telomeres, each of which had amplified 3 copies. In these clones the other breakpoint contains the centromere, indicating the entire right arm of chromosome XI was amplified 3 times via ODIRA, each generating supernumerary chromosomes. Thus, ODIRA can result in amplifications of large genomic regions from segmental amplifications to supernumerary chromosomes.”
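
      To illustrate why a read-depth estimate can leave a 2-copy versus 3-copy call ambiguous, the sketch below estimates copy number as the ratio of the median depth in a region to the median genome-wide depth, scaled by ploidy. This is a common generic approach and is only assumed here for illustration; it is not necessarily the procedure used in the study, and the depth values are hypothetical.

```python
import numpy as np

def estimate_copy_number(region_depth, genome_depth, ploidy=1):
    """Estimate copy number from per-base read depth.

    Uses medians so that a few extreme positions do not dominate.
    Returns the unrounded estimate; rounding is left to the caller.
    """
    ratio = np.median(region_depth) / np.median(genome_depth)
    return float(ploidy * ratio)

# Hypothetical per-base depths for a haploid clone.
genome_depth = [98, 102, 100, 101, 99, 100]
clear_triplication = [300, 310, 290, 305, 295]
ambiguous_region = [250, 245, 255, 260, 240]

print(estimate_copy_number(clear_triplication, genome_depth))
print(estimate_copy_number(ambiguous_region, genome_depth))
```

      An estimate that falls midway between two integers, as in the second example, cannot be assigned confidently to either copy number without additional evidence such as breakpoint-spanning reads.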

      (b) In Panel B the violin plots appear to indicate that there are two size categories for amplicons in the ARS∆ strain. Do clones from these different sub-populations share a common CNV architecture?

      Thank you for making this point. (Please note that the violin plots are now Figure 4E) We added a short discussion and Supplementary Figure 14. In line 432, we state:

      “In ARS∆, we find two CNV length groups (Figure 4E) that correspond with two different CNV mechanisms (Supplementary Figure 14). 100% of smaller CNVs (6-8kb) (Supplementary Figure 14) correspond with a mechanism of NAHR between LTRs flanking the GAP1 gene (Figure 4H, ARS∆, bottom left green points). Larger CNVs (8kb-200kb) (Supplementary Figure 14) correspond with other mechanisms that tend to produce larger CNVs, including ODIRA and NAHR between one local and one distal LTR element (Figure 4H).”

      (c) Panels D and E: There is great information in these two panels but I find the color keys confusing. There doesn't seem to be any reason for the strain color key in panel E. I am assuming that the key should go with Panel D. Is there some way to indicate in Panel D which events are in which CNV category? It is cumbersome to find that information from Panel E. Perhaps the color-coding from Panel E could be applied to the row labels in Panel D. Being able to link amplicon to the mechanism of CNV formation is especially important for seeing which ODIRA events contain an origin.

      Thank you for these suggestions. We now indicate the mechanism of CNV formation using a consistent color coding in panels G and H (previously panels D and E).

      (d) Panel E: I don't understand the two axes in Panel E. If both axes are log scales, why is the origin 0 for the X-axis and 1 for the Y-axis? And why are the focal amplicons (most of which are recombination events between the two LTRs) scattered in both X and Y coordinates? Shouldn't they form a single point? The same for the recombinants with distal LTRs. Also, orange and red (ODIRA and complex CNVs, respectively) are very hard to distinguish. All of these data need to be presented in a spreadsheet identifying each clone's strain ID, chemostat number, GAP1 and GFP copy numbers, sequence across the junction, and their coordinates. The SRA project (PRJNA1016460) for the sequence data was not found in SRA. Will this data be available to easily look at read depth across chromosome XI for all of the sequenced strains - perhaps as .bam files?

      Thank you for calling these issues with data visualization to our attention. Indeed, the focal amplifications do form around a single point. We originally had jittered the data to show each individual focal amplification but agree that this is confusing. We now overlay the individual points and have altered opacity to enable visualization of individual values. The suggested table of clone data is provided in Supplementary File 2 and the SRA project is now publicly available. Moreover, we are providing all alignment (.bam) files, split, and discordant read depth profiles for each CNV strain and their corresponding ancestor aligned to our custom reference genomes in a public JBrowse server at:

      https://jbrowse.bio.nyu.edu/gresham/?data=data/ee_gap1_arch_muts for WT strains, https://jbrowse.bio.nyu.edu/gresham/LTRKO_clones for LTR∆ strains, https://jbrowse.bio.nyu.edu/gresham/ARSKO_clones for ARS∆ strains, https://jbrowse.bio.nyu.edu/gresham/ALLKO_clones for ALL∆ strains.

      (e) Supplementary Table 1 and Supplementary Figure S2: Please indicate which rearrangements (of the 8 reported in Figure S2A) were identified in each of the clones described in the table. If each of the 8 amplicons is identified by a letter, then this information could be added as a column in the table. I am assuming that each of the eight rearrangements was found in more than one chemostat. Showing these data is crucial for establishing the possibility that they were preexisting at the time of chemostat inoculation. The other possibility is that the clones with amplified GAP1 but a single copy of GFP could have been created by a secondary rearrangement in the outgrowth of the clones that originally had amplified both genes to the same extent. What is the structure of these amplicons? Is there a common junction between GAP1 and GFP? I couldn't find these data in the paper. A suggestion for Supplemental Figure S2A - include a zoomed-in inset for the GAP1 GFP region for each of the 8 read-depth plots. It is hard to see the exact location of GFP and GAP1 across all 8 tracks without getting out a ruler. Were these sequences aligned to your custom reference genome or the reference genome without GFP? If they were aligned to the custom reference that includes the GFP reporter, the reader could visually confirm the absence of GFP amplification.

      Thank you for these suggestions. We edited Supplementary Table 1 and Supplementary Figure 1A as requested. We now provide the precise CNV breakpoints in the GFP-GAP1 region (supplemental figure 1B) displaying both genome read depth and split read depth tracks. These sequences were aligned to the custom reference containing the GFP reporter, which is now clearer in the figure and caption text in line 1226.

      The clones in this figure were sampled from the five different chemostats, and we have clarified this in the edited table and in the text at line 210. We did not detect the same CNV allele in different chemostats and therefore do not have evidence to support GAP1 amplification without the GFP reporter pre-existing at the time of inoculation. We are not able to definitively distinguish whether the amplicons were pre-existing at the time of inoculation or occurred afterwards, as we do not have barcoded lineages. We isolated clones carrying this class of amplification from the 1-GFP-copy subfraction late in the experimental evolution (generations 165-182). Given that the alleles appear to differ between populations, we think the most parsimonious explanation is that these amplifications occurred after chemostat inoculation but early in the evolution experiment. We explicitly state this in the text starting in line 219.

      (9) Page 8-9: I am sorry to say that I can't evaluate the "HDI of posterior distributions". It is out of my competency range. So I am not sure what this analysis is adding to the paper. The same goes for the rest of the supplementary figures.

      The HDI (highest density interval) is a measure of uncertainty in an estimate, similar to a confidence interval. We state this in the text in line 276. With the editing of the text, we hope the modeling and its supplementary figures are now clearer.
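
      As a rough illustration of what an HDI summarizes, the sketch below finds the narrowest interval that contains a chosen fraction of posterior samples. The function name `hdi`, the 95% mass, and the simulated draws are assumptions made for this example only; they are not taken from the manuscript's modeling code.

```python
import random

def hdi(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples the interval must span.
    k = max(1, int(round(mass * n)))
    if k >= n:
        return ordered[0], ordered[-1]
    # Slide a window of k consecutive samples and keep the narrowest one.
    best_start = min(
        range(n - k + 1),
        key=lambda i: ordered[i + k - 1] - ordered[i],
    )
    return ordered[best_start], ordered[best_start + k - 1]

# Hypothetical posterior draws from a skewed distribution (illustration only).
rng = random.Random(0)
draws = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]
low, high = hdi(draws, mass=0.95)
print(f"95% HDI: [{low:.3f}, {high:.3f}]")
```

      For a skewed posterior, this interval is shifted toward the high-density region relative to an equal-tailed interval, which is why the two summaries can differ.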

      (10) Page 9 top: Deletion of the ARS appears to lower the fitness of the amplified GAP1 variants. Can the authors speculate on why the ARS deletion would reduce fitness? Did they consult published replication profiles to determine the size of the origin-free gap that could result from the deletion of this mid-S phase origin? Could it explain the delay in the appearance of GAP1 amplicons in the ARS-deletion strains and be responsible for their reduced selection coefficients? Did you examine the growth properties of the starting strain or any of the amplified GAP1 derivatives? Perhaps this consideration could contribute to the discussion. Could there be a bit fuller discussion on the interaction between CNV length differences as shown in Figure 4A and differences in selection coefficient as determined by the nnSBI?

      Thank you for raising this point. We have now added text to our discussion of the reduced fitness in ARS∆ in relation to DNA replication starting on line 359:

      “ARS1116 is a major origin (McGuffee et al., 2013) and ODIRA CNVs found around this origin corroborate its activity. GAP1 is highly transcribed in glutamine-limited chemostats (Airoldi et al., 2016). Head-on transcription-replication collisions at this locus may be contributing to the higher CNV formation rate in wild type and LTR∆. Elimination of the local ARS could result in less transcription-replication collisions and the slower CNV formation rates estimated. Once formed they get outcompeted by faster-forming CNVs and thus in theory are less fit than CNVs in other strain backgrounds. These simulated competitions further suggest that the ARS is a more important contributor to adaptive evolution mediated by GAP1 CNVs.”

      We examined replication profiles in McGuffee et al. Mol Cell. 2013 but could not determine the size of the origin-free gap. ARS1116 and its neighboring ARSs, ARS1118 downstream and ARS1115 upstream are efficient firing origins (Supplement 1 of McGuffee et al. 2013) and therefore the gap is likely to be minimal. The dynamics of the distal firing ARS elements involved in creating ODIRA CNVs might explain the reduced fitness, but further experiments would be required to address this. Regarding growth properties, the growth rate at steady-state in the chemostat is the same as the dilution rate regardless of strain background. Because we had the same dilution rate for each chemostat, the ARS∆ populations would have the same replication rate as the other three strains even if there may be replication rate differences in bulk culture growth. Finally, we found no significant interaction between CNV length and selection coefficients and we state this in line 359.

      (11) Page 10: WT competition simulations: It may help to explicitly state that the competition modeling approach was experimentally validated in Avecilla 2022 as opposed to just citing the paper. I found the results much more convincing after reading Avecilla 2022, but I imagine many readers may skip that.

      We added a sentence to state that the nnSBI method was experimentally validated in Avecilla et al. 2022 at line 249.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 2: says reported CNV proportions (dashed). This may be a typo since I think the GFP reported should be solid, not dashed. Also, (C) isn't bold.

      Thank you for identifying these mistakes. We have corrected the figure’s caption in line 1157.

      (2) "compared to 898/345 clones" Does this refer to transposition/clone? Seems more natural to compare clones with transpositions to a total number of clones. This could be clarified.

      We rephrased the sentence (lines 519-520) to clarify that in their study Hays et al. 2023 found 898 novel Ty insertions across 345 nitrogen-evolved clones. As a result of this high rate of transposition, some clones are expected to have multiple Ty insertions.

      (3) The methods state that Kan replaces the Nat cassette that was used to make the deletions. It should be made more clear whether Kan is present and where Kan is with respect to GFP and GAP1.

      Thank you for pointing this out. To clarify we added the following sentence to the methods starting in line 567:

      “The CNV reporter is 3.1 kb and located 1117 nucleotides upstream of the GAP1 coding sequence. It consists of, in the following order, an ACT1 promoter, mCitrine (GFP) coding sequence, ADH1 terminator, and kanamycin cassette under control of a TEF promoter and terminator.”

      Additionally in line 571 we clarify the drug resistance of the genomic architecture ∆ strains that are kanamycin(+) and nourseothricin(-).

      Reviewer #3 (Recommendations For The Authors):

      (1) The major advancement of the manuscript is stated in the title "DNA replication errors are a major source of adaptive gene amplification" First, in my humble opinion the term replication errors is not quite right; the term template switching is more accurate. In that regard, recently a paper was published just on this topic (Martin et al Plos Genetics, 2024).

      We have changed the title to “Template-switching during DNA replication is a prevalent source of adaptive gene amplification”. We cite Martin et al Plos Genetics 2024 throughout the main text in lines 93, 126, 159, 502, 555.

      (2) I find the statement "We find that 49% of all GAP1 CNVs are mediated by the DNA replication-based mechanism Origin Dependent Inverted Repeat Amplification (ODIRA) regardless of background strain." Somewhat misleading, there were considerable differences between the strains. If I am not mistaken the range was 20-80%.

      Thank you for pointing this out. Indeed, the range was 26-80% across the four strains. We updated this sentence in the abstract at line 40, and in the main text at line 141 to clearly state the range.

      (3) In their attempt to fill the gap of knowledge regarding the fitness effect of the adaptive CNV the authors use a mathematical model. As an experimental biologist, I found the description lacking. It is hard for me to evaluate the contribution of the model to understanding the results and I think the authors could improve this part.

      We have edited the text regarding the modeling and associated results and hope that it is now more clear. The mathematical model describes the experiment in a simplified manner. We use it to predict the outcomes of additional experiments without additional experimental work. For example, we used it to simulate a competition between two strains, predict the total proportion of GAP1 CNVs, and predict the relative genetic diversity.

      (4) Experiments the authors may want to consider to increase the novelty of their work:

      a) Place the GAP1 gene right in the middle of the two most distant ARS elements and test the mechanism of CNV.

      Thank you for this proposed experiment. It is beyond the scope of this paper and will be pursued in future studies.

      b) The finding of de-novo Ty element insertion is interesting. What happens if the overdose strain of Jef Boeke is used (Retrotransposon overdose and genome integrity, PNAS 2009) or in contrast, a reverse transcriptase deficient strain?

      We agree. Our study has revealed a critical role for novel Ty insertion in mediating CNVs. The suggested experiments as well as using strains that lack Ty sequences will be very interesting to explore in followup studies.

      c) The genomic analyses were based on single colony isolates. To my understanding, the CNV events are identified at least partly by split reads. Therefore, each event may have a "signature" that is unique and can be concluded from single reads and not necessarily from the assembled genome. If true, a distinction between the scenarios could be achieved if bulk cultures are sequenced with enough depth. Thus, a truly dynamic and quantitative determination of the different events, rate of appearance, and disappearance can be made.

      Thank you for this suggestion, which is a good idea but not currently feasible for several reasons. First, although split reads are a powerful way to detect CNV breakpoints, we have found that they are rare even in clonal samples sequenced at high coverage (21-153X, median 78.5X), with only 3-30 split reads (median 14) detected. These observations are from a total of 23 breakpoints across 16 sequenced clones. Thus, when sequencing heterogeneous cultures, in which different CNVs only comprise a fraction of the population, our ability to detect single CNV alleles by split reads and quantify their frequency is limited. Given our observations, with a median of 14 split reads when sequencing to 78.5X genome-wide read coverage, it is possible we may be able to detect an individual CNV allele once it makes up (14/78.5) 17% of the population. However, our previous study has shown that there are tens to hundreds of unique CNV alleles initially, and thus this would only be feasible at very late timepoints. Second, recurrent CNVs may occur independently at the same exact location, such as LTR NAHR. Thus, unique signatures may not be obtained even if they are independent events. Third, it would not be appropriate to pursue this analysis with our current dataset, as we lack lineage tracking barcodes to validate the results.
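
      The detection estimate above is a simple ratio of the two medians, which can be reproduced as follows. The variable names are hypothetical, and the linear scaling of split-read support with allele frequency in `expected_split_reads` is an assumption added for illustration; it is not stated in the response.

```python
# Medians quoted in the response above.
median_split_reads = 14   # split reads detected in clonal samples
median_coverage = 78.5    # genome-wide read depth (X)

# Ratio used in the response as a rough detection threshold.
threshold_fraction = median_split_reads / median_coverage
print(f"threshold fraction: {threshold_fraction:.3f}")

def expected_split_reads(allele_fraction, clonal_split_reads=median_split_reads):
    """Expected split reads in a mixed culture.

    Assumes (hypothetically) that split-read support scales linearly
    with the allele's frequency in the population.
    """
    return allele_fraction * clonal_split_reads

for fraction in (0.05, threshold_fraction, 0.5, 1.0):
    print(f"{fraction:.2f} -> {expected_split_reads(fraction):.1f} split reads")
```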

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors sometimes seem to equivocate on to what extent they view their model as a neural (as opposed to merely behavioral) description. For example, they introduce their paper by citing work that views heterogeneity in strategy as the result of "relatively independent, separable circuits that are conceptualized as supporting distinct strategies, each potentially competing for control." The HMM, of course, also relates to internal states of the animal. Therefore, the reader might come away with the impression that the MoA-HMM is literally trying to model dynamic, competing controllers in the brain (e.g. basal ganglia vs. frontal cortex), as opposed to giving a descriptive account of their emergent behavior. If the former is really the intended interpretation, the authors should say more about how they think the weighting/arbitration mechanism between alternative strategies is implemented, and how it can be modulated over time. If not, they should make this clearer.

      The MoA-HMM is meant to be descriptive in identifying behaviorally distinct strategies. Our intention in connecting it with a “mixture-of-strategies” view of the brain is that the results of the MoA-HMM could be indicative of an underlying arbitration process, without modeling that process per se, and can therefore be used to test neural hypotheses driven by this idea. We’ve added additional clarification in the discussion to highlight this point.

      Explicitly, we added the following sentence in the discussion: “For example, while the MoA-HMM itself is a descriptive model of behavior and is not explicitly modeling an underlying arbitration of controllers in the brain, the resulting behavioral states may be indicative of underlying neural processes and help identify times when different neural controllers are prevailing”

      Second, while the authors demonstrate that model recovery recapitulates the weight dynamics and action values (Fig. 3), the actual parameters that are recovered are less precise (Fig. 3 Supplement 1). The authors should comment on how this might affect their later inferences from behavioral data. Furthermore, it would be better to quantify using the R^2 score between simulated and recovered, rather than the Pearson correlation (r), which doesn't enforce unity slope and zero intercept (i.e. the line that is plotted), and so will tend to exaggerate the strength of parameter recovery.

      In the methods section, we noted that the interaction between parameters can cause the recovery of randomly drawn parameter sets to fail, as seen in Figure 3 Supplement 1. This is because there are parameter regimes (specifically, when a softmax inverse temperature, i.e. a 𝛽 weight, is near zero) that cause choices to be random, so that other parameters no longer matter. To address this, we included a second supplemental figure, Figure 3 Supplement 2, where we recovered model parameters from data simulated solely from models inferred from the behavioral data. Recovery of these models is much more precise, which lends credibility to our later inferences from the behavioral data.

      To make this point clearer, we changed the reference to Figure 3 Supplements 1 & 2 to: “(Figure 3 – figure supplement 1 for recovery of randomized parameters with noted limitations, and figure supplement 2 for recovery of models fit to real data)” We additionally added the following to the Figure 3 Supplement 1 caption: “Due to the interaction between different model parameters (e.g. a small 𝛽 weight will affect the recoverability of the agent’s learning rate 𝛼), a number of “failures” can be seen.”
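
      The recoverability problem described above can be seen in a small sketch: when the weight 𝛽 that scales the values inside a softmax is close to zero, very different value estimates yield almost identical choice probabilities, so the parameters that produced those values cannot be identified from choices. The function name and the two value vectors are hypothetical, and the parameterization (𝛽 multiplying the values) is an assumption for illustration.

```python
import math

def softmax_choice_probs(values, beta):
    """Choice probabilities from action values scaled by a weight beta."""
    scaled = [beta * v for v in values]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Two hypothetical sets of action values, e.g. produced by agents
# with different learning rates.
values_a = [0.2, 0.8]
values_b = [0.6, 0.4]

for beta in (0.01, 5.0):
    p_a = softmax_choice_probs(values_a, beta)
    p_b = softmax_choice_probs(values_b, beta)
    print(f"beta={beta}: P(first | a)={p_a[0]:.3f}, P(first | b)={p_b[0]:.3f}")
```

      With the small 𝛽 both agents choose nearly at chance, whereas with the larger 𝛽 their choice probabilities separate clearly.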

      Furthermore, we added an R^2 score that enforces unity slope and zero intercept alongside the Pearson correlation coefficient for more comprehensive metrics of recovery. The R^2 scores are plotted on both Figure 3 Supplements 1 & 2 as “R2”, and the following text was added in both captions: “"r" is the Pearson's correlation coefficient between the simulated and recovered parameters, and "R2" is the coefficient of determination, R2, calculating how well the simulated parameters predict the recovered parameters.”
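
      The difference between the two metrics can be illustrated with a minimal sketch in which the recovered parameters are perfectly correlated with the simulated ones but systematically biased. Here the simulated value is used directly as the prediction of the recovered value (the identity line); the function names and toy numbers are assumptions for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def r2_identity(simulated, recovered):
    """Coefficient of determination with the identity line as the model.

    Each simulated value is taken as the prediction of the corresponding
    recovered value, so unity slope and zero intercept are enforced.
    """
    mean_rec = sum(recovered) / len(recovered)
    ss_res = sum((r - s) ** 2 for s, r in zip(simulated, recovered))
    ss_tot = sum((r - mean_rec) ** 2 for r in recovered)
    return 1.0 - ss_res / ss_tot

# Toy data: recovery is perfectly linear but compressed and offset.
simulated = [0.1, 0.3, 0.5, 0.7, 0.9]
recovered = [0.5 * s + 0.3 for s in simulated]

r = pearson_r(simulated, recovered)
r2 = r2_identity(simulated, recovered)
print(f"r = {r:.3f}, R2 = {r2:.3f}")
```

      In this toy case the correlation is perfect while the identity-line R2 is negative, which is the exaggeration the reviewer warned about.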

      Finally, the authors are very aware of the difficulties associated with long-timescale (minutes) correlations with neural activity, including both satiety and electrode drift, so they do attempt to control for this using a third-order polynomial as a time regressor as well as interaction terms (Fig. 7 Supplement 1). However, on net there does not appear to be any significant difference between the permutation-corrected CPDs computed for states 2 and 3 across all neurons (Fig. 7D). This stands in contrast to the claim that "the modulation of the reward effect can also be seen between states 2 and 3 - state 2, on average, sees a higher modulation to reward that lasts significantly longer than modulation in state 3," which might be true for the neuron in Fig. 7C, but is never quantified. Thus, while I am convinced state modulation exists for model-based (MBr) outcome value (Fig. 7A-B), I'm not convinced that these more gradual shifts can be isolated by the MoA-HMM model, which is important to keep in mind for anyone looking to apply this model to their own data.

      We agree with the reviewers that our initial test of CPD significance was not sufficient to support the claims we made about state differences, especially for Figure 7D. To address this, we updated the significance test and indicators in Figure 7B,D to instead signify when there is a significant difference between state CPDs. This updated test supports a small, but significant difference in early post-outcome reward modulation between states 2 and 3.

      We clarified and updated the significance test in the methods with the following text:

      “A CPD (for a particular predictor in a particular state in a particular time bin) was considered significant if that CPD computed using the true dataset was greater than 95% of corresponding CPDs (same predictor, same state, same time bin) computed using these permuted sessions. For display, we subtract the average permuted session CPD from the true CPD in order to allow meaningful comparison to 0.

      To test whether neural coding of a particular predictor in a particular time bin significantly differed according to HMM state, we used a similar test. For each CPD that was significant according to the above test, we computed the difference between that CPD and the CPD for the same predictor and time bin in the other HMM states. We compare this difference to the corresponding differences in the circularly permuted sessions (same predictor, time bin, and pair of HMM states). We consider this difference to be significant if the difference in the true dataset is greater than 95% of the CPD differences computed from the permuted sessions.”
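
      The logic of this permutation test can be sketched as follows. The squared correlation is used here as a toy stand-in for a CPD, and circularly shifting the predictor relative to the response stands in for the circularly permuted sessions; all function names and simulated data are assumptions for illustration, not the analysis code used in the manuscript.

```python
import random

def circular_shift(values, offset):
    """Rotate a list by `offset` positions; its internal ordering is preserved."""
    offset %= len(values)
    return values[offset:] + values[:offset]

def squared_correlation(x, y):
    """Toy stand-in for a CPD: squared Pearson correlation between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)

def exceeds_null(true_value, null_values, quantile=0.95):
    """True if `true_value` is greater than at least `quantile` of the null values."""
    n_below = sum(1 for v in null_values if v < true_value)
    return n_below / len(null_values) >= quantile

# Hypothetical predictor and response with a genuine relationship.
rng = random.Random(1)
predictor = [rng.gauss(0.0, 1.0) for _ in range(200)]
response = [2.0 * p + rng.gauss(0.0, 0.5) for p in predictor]

true_stat = squared_correlation(predictor, response)
# Null distribution: the same statistic after each possible circular shift.
null_stats = [
    squared_correlation(circular_shift(predictor, k), response)
    for k in range(1, len(predictor))
]
is_significant = exceeds_null(true_stat, null_stats)
print(f"true statistic = {true_stat:.3f}, significant = {is_significant}")
```

      The same comparison applies to the between-state test: the difference between two states' statistics in the true data is compared against the corresponding differences computed from the permuted data.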

      We updated the significance indicators above the panels in Figure 7B,D (colored points) to refer to significant differences between states, with additional text to the left of each row of points to specify the tested state and which states it is significantly greater than. We updated the figure caption for both B and D to reflect these changes.

      We also changed text in the results to focus on significant differences between states. Specifically, we replaced the sentence “Looking at the CPD of expected outcome value split by state (Figure 7B) reveals that the trend from the example neuron is consistent across the population of OFC units, where state 2 shows the greatest CPD.” with the sentence “Looking at the CPD of expected outcome value split by state (Figure 7B) reveals that the trend from the example neuron is consistent across the population of OFC units, where state 2 has a significantly greater CPD than states 1 and 3.”

      We also replaced the sentence “Suggestively, the modulation of the reward effect can also be seen between states 2 and 3 – state 2, on average, sees a higher modulation to reward that lasts significantly longer than modulation in state 3.” with the sentence “Additionally, the modulation of the reward effect can also be seen between states 2 and 3 — immediately after outcome, we see a small but significantly higher modulation to reward during state 2 than during state 3.”

      Reviewer #2 (Public Review):

      There were a lot of typos and some figures were mis-referenced in the text and figure legends.

      We apologize for the numerous typos and errors in the text and are grateful for the assistance in identifying many of them. We have taken another thorough pass through the manuscript to address those identified by the reviewer as well as fix additional errors. To reduce redundancy, we’ll address all typo- and error-related suggestions from both reviewers here.

      ● We fixed all Figure 1 references. We additionally reversed the introduction order of the agents in Figure 1 and in the results section “Reinforcement learning in the rat two-step task”, where we introduce both model-free agents before both model-based agents. This is to make the model-based choice agent description (which references the model-free choice agent in the statement “That is, like MFc, this agent tends to repeat or switch choices regardless of reward”) come after introducing the model-free choice agent.

      ● We fixed all Figure 4 references.

      ● We fixed all Figure 6 references and fixed the panel references in the figure caption to match the figure labeling: Starting with panel B, the reference to (i) was removed, and the reference to (ii) was updated to C. The previous reference to C was updated to D.

      ● All line-numbered suggestions were addressed.

      ● The text “(move to supplement?)” was removed from the methods heading, and the mistaken reference to Q_MBr was fixed.

      ● We removed all “SR” acronyms from the statistics as it was an artifact from an earlier draft.

      ● We homogenized notation in Figure 2, replacing all “c” variable references with “y”, and homogenized the notation of β.

      ● We replaced many uses of the word “action” with the word “choice” for consistency throughout the manuscript.

      ● We addressed many additional minor errors.

      Reviewer #1 (Recommendations For The Authors):

      (1) Could the authors comment on why the cross-validated accuracy continues to increase, albeit non-significantly, after four states, as opposed to decreasing (as I would naively expect would be the result due to overfitting)?

      Due to the large number of trials and sessions obtained from each rat (often >100 sessions with >200 trials per session) and the limited number of training iterations (capped at 300 iterations), it is not guaranteed that the cross-validated accuracy would decrease over the range of states we included in Figure 4, especially given that the total number of parameters in the largest model shown (7 states, 95 parameters) is far smaller than the number of observations. Since we’re mainly interested in using this tool to identify interpretable, consistent structure across animals, we did not focus on interpreting the regime of larger models.

      (2) It seems like the model was refit multiple times with different priors ("Estimation of Population Prior"), each derived from the previous step of fitting. I'm not very familiar with fitting these kinds of models. Is this standard practice? It gives off the feeling of double-dipping. It would be helpful if the authors could cite some relevant literature here or further justify their choices.

      We adopted a “one-step” hierarchical approach, where we estimate the population prior a single time on (nearly) unconstrained model fits, and use it for a second, final round of model fits which were used for analysis. Since the prior is only estimated once, in practice there isn’t a risk of converging on an overly constrained prior. This is a somewhat simplified approach motivated by analogy to the first step of EM fit in a hierarchical model, in which population- and subject-level parameters are iteratively re-estimated in terms of one another until convergence (Daw, 2011; Huys et al., 2012). We have clarified this approach in the methods with citations by adding the following paragraph:

      “Hierarchical modeling gives a better estimate of how model parameters can vary within a population by additionally inferring the population distribution over which individuals are likely drawn (Daw, 2011). This type of modeling, however, is notoriously difficult in HMMs; therefore, as a compromise, we adopt a “one-step” hierarchical model, where we estimate population parameters from “unconstrained” fits on the data, which are then used as a prior to regularize the final model fits. This approach is motivated by analogy to the first step of EM fit in a hierarchical model, in which population- and subject-level parameters are iteratively re-estimated in terms of one another until convergence (Daw, 2011; Huys et al., 2012). It is important to emphasize, since we aren’t inferring the population distributions directly, that we only estimate the population prior a single time on the “unconstrained” fits as follows.”
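
      A minimal sketch of the “one-step” idea, under strong simplifying assumptions: a single scalar parameter, a Gaussian population prior estimated once from the unconstrained fits, and a toy Gaussian log-likelihood for one subject. The parameter values, the Gaussian form of the prior, and the grid search are all hypothetical choices for illustration and are not the manuscript's fitting procedure.

```python
import math
import statistics

# Step 1: hypothetical unconstrained per-subject estimates of one parameter.
unconstrained_fits = [0.42, 0.55, 0.38, 0.61, 0.47]

# Step 2: estimate the population prior a single time from those fits.
pop_mean = statistics.mean(unconstrained_fits)
pop_sd = statistics.stdev(unconstrained_fits)

def log_prior(theta):
    """Gaussian log-density of theta under the estimated population prior."""
    z = (theta - pop_mean) / pop_sd
    return -0.5 * z * z - math.log(pop_sd * math.sqrt(2.0 * math.pi))

def toy_log_likelihood(theta):
    """Toy log-likelihood for one subject, peaked at 0.9 (assumption)."""
    return -0.5 * ((theta - 0.9) / 0.3) ** 2

# Step 3: final fit regularized by the fixed prior (grid search for simplicity).
grid = [i / 1000 for i in range(1001)]
mle_estimate = max(grid, key=toy_log_likelihood)
map_estimate = max(grid, key=lambda t: toy_log_likelihood(t) + log_prior(t))
print(f"unregularized: {mle_estimate:.3f}, regularized: {map_estimate:.3f}")
```

      The regularized estimate is pulled from the subject's own optimum toward the population mean, and because the prior is computed only once it is not tightened further by repeated re-estimation.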

      Reviewer #2 (Recommendations For The Authors):

      Figure 3a.iii: Did the model capture the transition probabilities correctly as well?

      We have updated Figure 3E to include additional panels (iii) and (iv) to show the recovered initial state probabilities and transition matrix.

      For Figure 6, panel B makes it look like there is a larger influence of state on ITI rate after omission, in both the top and bottom plots. However, the violin plots in panel C show a different pattern, where state has a greater effect on ITIs following rewarded trials. Is it that the example in panel B is not representative of the population, or am I misinterpreting?

      We thank the reviewer for catching this issue, as the colors were erroneously flipped in panel C. We have fixed this figure by ensuring that the colors appropriately match the trial type (reward or omission). Additionally, we updated the colors in B and C that correspond to reward (previously gray, now blue) and omission (previously gold, now red) trials to match the color scheme used in Figure 1. We also inverted the corresponding line styles (reward changed to solid, omission changed to dashed) to match the convention used in Figure 7. To differentiate from the reward/omission color changes, we additionally changed the colors in Figure 6D and Figure 7 Supplement 1, where the color for “time” was changed from blue to gray, and the color for “state” was changed from red to gold.

      For figure 4B right, I am confused. The legend says that this is the change in model performance relative to a model with one fewer state. But the y-axis says it's the change from the single-state model. Please clarify.

      The plot is showing the increase in performance from the single-state model, while the significance tests were done between consecutive numbered states. We updated the significance indicators on the plot to more clearly identify that adjacent models are being compared (with the exception of the 2-state model, which is being compared to 0). We updated the Figure 4B caption text for the left panel to state: “Change in normalized, cross-validated likelihood when adding additional hidden states into the MoA-HMM, relative to the single-state model. Significant changes are computed with respect to models with one fewer states (e.g. 2-state vs 1-state, 3-state vs 2-state)”

    1. Author response:

      Thank you for reviewing our manuscript and providing constructive feedback. We are grateful that you recognize the importance of our work and find the evidence presented compelling. We will revise our manuscript in accordance with the reviewers’ recommendations. Below is our plan.

      (1) As recommended by Reviewer 1, we will improve the image resolution and presentation in the figures by adjusting dark colors to brighter ones, including single-channel images, and incorporating schematic illustrations to depict morphological changes.

      (2) Following the suggestions of Reviewer 2, we will provide explanations and speculative insights into potential non-tissue autonomous effects.

      (3) As suggested by Reviewer 2, we will perform principal component analyses on our RNA-seq and Cut&Tag data.

      (4) Once we have addressed all the major and minor points raised by the reviewers, we will provide a detailed point-by-point response and submit the revised version of the manuscript.

    1. Author response:

      Responses to Editors:

      We appreciate Reviewer 1’s first concern regarding the difficulty of disentangling the contributions of tightly-coupled brain regions to the speech-gesture integration process—particularly due to the close temporal and spatial proximity of the stimulation windows and the potential for prolonged disruption. We would like to provide clarification and evidence supporting the validity of our methodology.

      Our previous study (Zhao et al., 2021, J. Neurosci) employed the same experimental protocol—using inhibitory double-pulse transcranial magnetic stimulation (TMS) over the inferior frontal gyrus (IFG) and posterior middle temporal gyrus (pMTG) in one of eight 40-ms time windows. The findings from that study demonstrated a time-window-selective disruption of the semantic congruency effect (i.e., reaction time costs driven by semantic conflict), with no significant modulation of the gender congruency effect (i.e., reaction time costs due to gender conflict). This result establishes that double-pulse TMS provides sufficient temporal precision to independently target the left IFG and pMTG within these 40-ms windows during gesture-speech integration. Importantly, by comparing the distinctively inhibited time windows for IFG and pMTG, we offered clear evidence of distinct engagement and temporal dynamics between these regions during different stages of gesture-speech semantic processing.

      Furthermore, we reviewed prior studies utilizing double-pulse TMS on structurally and functionally connected brain regions to explore neural contributions across timescales as brief as 3–60 ms. These studies, which encompass areas from the tongue and lip areas of the primary motor cortex (M1) to high-level semantic regions such as the pMTG and ATL (Author response table 1), consistently demonstrate the methodological rigor and precision of double-pulse TMS in disentangling the neural dynamics of different regions within these short temporal windows.

      Author response table 1.

      Double-pulse TMS studies on brain regions over 3-60 ms time interval

      Response to Reviewer #1:

      (1) Regarding the concern about the difficulty of disentangling the contributions of tightly-coupled brain regions to the speech-gesture integration process:

      We trust that the explanation provided above has clarified this issue.

      (2) Regarding the concern about the rationale for delivering HD-tDCS/TMS in set time windows for each region, as well as how these time windows were determined and how the current results compare to our previous studies from 2018 and 2023:

      The current study builds on a series of investigations that systematically examined the temporal and spatial dynamics of gesture-speech integration. In our earlier work (Zhao et al., 2018, J. Neurosci), we demonstrated that interrupting neural activity in the IFG or pMTG using TMS selectively disrupted the semantic congruency effect (reaction time costs due to semantic incongruence), without affecting the gender congruency effect (reaction time costs due to gender incongruence). These findings identified the IFG and pMTG as critical hubs for gesture-speech integration. This informed the brain regions selected for subsequent studies.

      In Zhao et al. (2021, J. Neurosci), we employed a double-pulse TMS protocol, delivering stimulation within one of eight 40-ms time windows, to further examine the temporal involvement of the IFG and pMTG. The results revealed time-window-selective disruptions of the semantic congruency effect, confirming the dynamic and temporally staged roles of these regions during gesture-speech integration.

      In Zhao et al. (2023, Frontiers in Psychology), we investigated the semantic predictive role of gestures relative to speech by comparing two experimental conditions: (1) gestures preceding speech by a fixed interval of 200 ms, and (2) gestures preceding speech at its semantic identification point. We observed time-window-selective disruptions of the semantic congruency effect in the IFG and pMTG only in the second condition, leading to the conclusion that gestures exert a semantic priming effect on co-occurring speech. These findings underscored the semantic advantage of gesture in facilitating speech integration, further refining our understanding of the temporal and functional interplay between these modalities.

      The design of the current study—including the choice of brain regions and time windows—was directly informed by these prior findings. Experiment 1 (HD-tDCS) targeted the entire gesture-speech integration process in the IFG and pMTG to assess whether neural activity in these regions, previously identified as integration hubs, is modulated by changes in informativeness from both modalities (i.e., entropy) and their interactions (mutual information, MI). The results revealed a gradual inhibition of neural activity in both areas as MI increased, evidenced by a negative correlation between MI and the tDCS inhibition effect in both regions. Building on this, Experiments 2 and 3 employed double-pulse TMS and event-related potentials (ERPs) to further assess whether the engaged neural activity was both time-sensitive and staged. These experiments also evaluated the contributions of various sources of information, revealing correlations between information-theoretic metrics and time-locked brain activity, providing insights into the ‘gradual’ nature of gesture-speech integration.

      We acknowledge that the rationale for the design of the current study was not fully articulated in the original manuscript. In the revised version, we will provide a more comprehensive and coherent explanation of the logic behind the three experiments, ensuring clear alignment with our previous findings.
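      For readers less familiar with the information-theoretic terms used above, the following is a minimal sketch of how Shannon entropy and mutual information (MI) can be computed from categorical responses. It uses the textbook definitions, with MI = H(X) + H(Y) - H(X, Y). The function names, the example labels, and the way gesture and speech responses are paired are assumptions made for illustration; they are not the stimulus norms or the analysis code of the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Mutual information (in bits) between paired labels: H(X) + H(Y) - H(X, Y)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    joint = list(zip(xs, ys))
    return entropy(xs) + entropy(ys) - entropy(joint)


# Hypothetical naming responses for one gesture and one spoken verb.
gesture_responses = ["hammer", "hammer", "knock", "hammer"]
speech_responses = ["hammer", "hammer", "knock", "hit"]

print("gesture entropy:", entropy(gesture_responses))
print("speech entropy:", entropy(speech_responses))
print("mutual information:", mutual_information(gesture_responses, speech_responses))
```

      In this sketch, fully redundant responses yield an MI equal to the entropy of either modality, whereas statistically independent responses yield an MI of zero.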

      (3) For concern about the use of Pearson correlation and the normality of EEG data.

      We appreciate the reviewer’s thoughtful consideration. In Figure 5 of the manuscript, we have already included normal distribution curves that illustrate the relationships between the average ERP amplitudes within each ROI or elicited clusters and the three information models. Additionally, multiple comparisons were addressed using FDR correction, as outlined in the manuscript.

      To further clarify the data, we will apply the Shapiro-Wilk test, a widely accepted method for assessing normality, to both the MI/entropy and averaged ERP data. The corresponding p-values will be provided in the follow-up point-by-point responses.
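      The FDR correction mentioned in this response can be illustrated with the Benjamini-Hochberg procedure. The sketch below is a minimal standard-library version; the function name and the example p-values are hypothetical, and the manuscript may have used a different implementation or FDR variant.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from several correlation tests.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p = {p:.3f} -> adjusted p = {q:.3f}")
```

      A test is then declared significant when its adjusted p-value falls below the chosen FDR level (for example 0.05).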

      (4) For concern about the ROI selection, and the suggestion of using whole-brain electrodes to build models of different variables (MI/entropy) to predict neural responses:

      For the EEG data, we conducted both a traditional region-of-interest (ROI) analysis, with ROIs defined based on a well-established work (Habets et al., 2011), and a cluster-based permutation approach, which utilizes data-driven permutations to enhance robustness and address multiple comparisons. The latter method complements the hypothesis-driven ROI analysis by offering an exploratory, unbiased perspective. Notably, the results from both approaches were consistent, reinforcing the reliability of our findings.

      To make the methods more accessible to a broader audience, we will provide a clear description of the methods used and how they relate to each other in the revised manuscript.
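      As an illustration of the cluster-based permutation logic described above, the sketch below implements a simplified one-dimensional version (time points only, paired design, sign-flip permutations, cluster mass as the sum of absolute t-values). The fixed t-threshold, the number of permutations, and all function names are assumptions for illustration. The actual analysis operated on spatiotemporal EEG data and was presumably run with a dedicated toolbox.

```python
import math
import random
import statistics


def t_per_timepoint(diffs):
    """One-sample t statistic at each time point; diffs[subject][time]."""
    n = len(diffs)
    tvals = []
    for t in range(len(diffs[0])):
        col = [row[t] for row in diffs]
        sd = statistics.stdev(col)
        tvals.append(statistics.fmean(col) / (sd / math.sqrt(n)) if sd > 0 else 0.0)
    return tvals


def max_cluster_mass(tvals, threshold):
    """Largest summed |t| over runs of adjacent same-sign supra-threshold points."""
    best = run = 0.0
    prev_sign = 0
    for t in tvals:
        sign = (t > threshold) - (t < -threshold)
        if sign == 0:
            run = 0.0
        elif sign == prev_sign:
            run += abs(t)
        else:
            run = abs(t)  # a new cluster starts here
        prev_sign = sign
        best = max(best, run)
    return best


def cluster_permutation_p(diffs, threshold=2.2, n_perm=500, seed=0):
    """Monte Carlo p-value for the largest cluster under random sign flips."""
    rng = random.Random(seed)
    observed = max_cluster_mass(t_per_timepoint(diffs), threshold)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            sign = rng.choice((-1.0, 1.0))  # flip the whole subject at once
            flipped.append([sign * v for v in row])
        if max_cluster_mass(t_per_timepoint(flipped), threshold) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

      Because the null distribution is built from the maximum cluster mass in each permutation, a single test covers all time points, which is how this family of methods addresses the multiple-comparisons problem.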

      Reference:

      Habets, B., Kita, S., Shao, Z.S., Ozyurek, A., and Hagoort, P. (2011). The Role of Synchrony and Ambiguity in Speech-Gesture Integration during Comprehension. J Cognitive Neurosci 23, 1845-1854. 10.1162/jocn.2010.21462

      (5) For concern about the median split of the data:

      To identify ERP components or spatiotemporal clusters that demonstrated significant semantic differences, we split each model into higher and lower halves, focusing on indexing information changes reflected by entropy or mutual information (MI). To illustrate the gradual activation process, the identified components and clusters were further analyzed for correlations with each information matrix. Remarkably, consistent results were observed between the ERP components and clusters, providing robust evidence that semantic information conveyed through gestures and speech significantly influenced the amplitude of these components or clusters. Moreover, the semantic information was shown to be highly sensitive, varying in tandem with these amplitude changes.

      We acknowledge that the rationale behind this approach may not have been sufficiently clear in the initial manuscript. In our revision, we will ensure a more detailed and precise explanation to enhance the clarity and coherence of this logical framework.
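      The two-step logic described above (a median split to locate effects, followed by a correlation to characterize their graded nature) can be sketched as follows. The per-item values are invented for illustration, and the function names and the convention of assigning ties to the lower half are assumptions rather than details taken from the manuscript.

```python
import math
import statistics


def median_split(values):
    """Return (low, high) index lists: at or below the median vs. above it."""
    med = statistics.median(values)
    low = [i for i, v in enumerate(values) if v <= med]
    high = [i for i, v in enumerate(values) if v > med]
    return low, high


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-item mutual information and mean ERP amplitude (microvolts).
mi = [0.1, 0.4, 0.2, 0.9, 0.7, 0.5]
amplitude = [-1.0, -1.6, -1.2, -2.8, -2.1, -1.9]

low, high = median_split(mi)
print("mean amplitude, low MI: ", statistics.fmean(amplitude[i] for i in low))
print("mean amplitude, high MI:", statistics.fmean(amplitude[i] for i in high))
print("Pearson r across items: ", pearson_r(mi, amplitude))
```

      The split identifies where the two halves differ, while the correlation uses every item and therefore speaks to whether the amplitude changes gradually with the information measure.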

      Response to Reviewer #2:

      We greatly appreciate Reviewer 2’s concern regarding whether "mutual information" adequately captures the interplay between the meanings of speech and gesture. We would like to clarify that the materials used in the present study involved gestures performed without actual objects, paired with verbs that precisely describe the corresponding actions. For example, a hammering gesture was paired with the verb “hammer”, and a cutting gesture was paired with the verb “cut”. In this design, all gestures conveyed redundant meaning relative to the co-occurring speech, creating significant overlap between the information derived from speech alone and that from gesture alone.

      We understand the reviewer’s concern about cases where gestures and speech may provide complementary rather than redundant information. To address this, we have developed an alternative metric for quantifying information gains contributed by supplementary multisensory cues, which will be explored in a subsequent study. However, for the present study, we believe that the observed overlap in information serves as an indicator of the degree of multisensory convergence, a central focus of our investigation.

      Regarding the reviewer’s concern about how the neural processes of speech-gesture integration may change with variations in the relative timing between speech and gesture stimuli, we would like to highlight findings from our previous study (Zhao et al., 2023, Frontiers in Psychology). In that study, we explored the semantic predictive role of gestures relative to speech under two conditions: (1) gestures preceding speech by a fixed interval of 200 ms, and (2) gestures preceding speech at its semantic identification point. Interestingly, only in the second condition did we observe time-window-selective disruptions of the semantic congruency effect in the IFG and pMTG. This led us to conclude that gestures play a semantic priming role for co-occurring speech. Building on this, we designed the present study with gestures preceding speech at its semantic identification point to reflect this semantic priming relationship. Additionally, ongoing research is exploring gesture and speech interactions in natural conversational settings to investigate whether the neural processes identified here are consistent across varying contexts.

      To prevent any similar concerns from causing doubt among the audience and to ensure clarity regarding the follow-up study, we will provide a detailed discussion of the two issues in the revised manuscript.

      Response to Reviewer #3:

      The primary aim of this study is to investigate whether the degree of activity in the established integration hubs, IFG and pMTG, is influenced by the information provided by gesture-speech modalities and/or their interactions. While we provided evidence for the differential involvement of the IFG and pMTG by delineating their dynamic engagement across distinct time windows of gesture-speech integration and associating these patterns with unisensory information and their interaction, we acknowledge that the mechanisms underlying these dynamics remain open to interpretation. Specifically, whether the observed effects stem from difficulties in semantic control processes, as suggested by Reviewer 3, or from resolving information uncertainty, as quantified by entropy, falls outside the scope of the current study. Importantly, we view these two interpretations as complementary rather than mutually exclusive, as both may be contributing factors. Nonetheless, we agree that addressing this question is a compelling avenue for future research. In the revised manuscript, we will include an exploratory analysis to investigate whether the confounding difficulty, stemming from the number of lexical or semantic representations, is limited to high-entropy items. Additionally, we will address and discuss alternative interpretations.

      Regarding the concern of conceptual equivocation, we would like to emphasize that this study represents the first attempt to focus on the relationship between information quantity and neural engagement. In our initial presentation, we inadvertently conflated the commonly used term "graded hub," which refers to anatomical distribution, with its usage in the present context. We sincerely apologize for this oversight and are grateful for the reviewer’s careful critique. In the revised manuscript, we will clearly articulate the study’s objectives, clarify the representations of entropy and mutual information, and accurately describe their association with neural engagement.

      References

      Teige, C., Mollo, G., Millman, R., Savill, N., Smallwood, J., Cornelissen, P. L., & Jefferies, E. (2018). Dynamic semantic cognition: Characterising coherent and controlled conceptual retrieval through time using magnetoencephalography and chronometric transcranial magnetic stimulation. Cortex, 103, 329-349.

      Amemiya, T., Beck, B., Walsh, V., Gomi, H., & Haggard, P. (2017). Visual area V5/hMT+ contributes to perception of tactile motion direction: a TMS study. Scientific reports, 7(1), 40937.

      Muessgens, D., Thirugnanasambandam, N., Shitara, H., Popa, T., & Hallett, M. (2016). Dissociable roles of preSMA in motor sequence chunking and hand switching—a TMS study. Journal of Neurophysiology, 116(6), 2637-2646.

      Vernet, M., Brem, A. K., Farzan, F., & Pascual-Leone, A. (2015). Synchronous and opposite roles of the parietal and prefrontal cortices in bistable perception: a double-coil TMS–EEG study. Cortex, 64, 78-88.

      Pitcher, D. (2014). Facial expression recognition takes longer in the posterior superior temporal sulcus than in the occipital face area. Journal of Neuroscience, 34(27), 9173-9177.

      Bardi, L., Kanai, R., Mapelli, D., & Walsh, V. (2012). TMS of the FEF interferes with spatial conflict. Journal of cognitive neuroscience, 24(6), 1305-1313.

      D’Ausilio, A., Bufalari, I., Salmas, P., & Fadiga, L. (2012). The role of the motor system in discriminating normal and degraded speech sounds. Cortex, 48(7), 882-887.

      Pitcher, D., Duchaine, B., Walsh, V., & Kanwisher, N. (2010). TMS evidence for feedforward and feedback mechanisms of face and body perception. Journal of Vision, 10(7), 671-671.

      Gagnon, G., Blanchet, S., Grondin, S., & Schneider, C. (2010). Paired-pulse transcranial magnetic stimulation over the dorsolateral prefrontal cortex interferes with episodic encoding and retrieval for both verbal and non-verbal materials. Brain Research, 1344, 148-158.

      Kalla, R., Muggleton, N. G., Juan, C. H., Cowey, A., & Walsh, V. (2008). The timing of the involvement of the frontal eye fields and posterior parietal cortex in visual search. Neuroreport, 19(10), 1067-1071.

      Pitcher, D., Garrido, L., Walsh, V., & Duchaine, B. C. (2008). Transcranial magnetic stimulation disrupts the perception and embodiment of facial expressions. Journal of Neuroscience, 28(36), 8929-8933.

    1. Author response:

      We would like to express our sincere gratitude to both of you, and the reviewers, for the time and effort you have invested in reviewing our manuscript. We greatly appreciate the constructive feedback provided and are committed to addressing the suggested revisions.

      In response to the public reviews, we would like to outline the following plan of action:

      (1) Addressing Weaknesses in the Manuscript: We have carefully considered the comments regarding the weaknesses identified in the manuscript. Specifically, we will:

      - Provide further clarification on the mechanism of IVM resistance in our study.

      - Expand our discussion of the limitations and future directions of the research, addressing the concerns related to the potential translation of our findings to parasitic nematodes.

      (2) Additional Experiments: We are currently conducting additional experiments to address the reviewers' suggestions, which include:

      - Testing whether the overexpression of a relevant GluCl, such as AVR-15, can restore Ivermectin sensitivity in ubr-1 mutants.

      - Examining the impact of Ceftriaxone treatment on the Ivermectin resistance in worms lacking key GluCls, such as avr-15, avr-14, and glc-1.

      - Incorporating an analysis of major human parasitic nematodes in the phylogeny and discussing the conservation of relevant mechanisms across species.

      - Double-checking the Dye filling (Dyf) phenotype in ubr-1 mutants, as suggested.

      (3) Point-by-Point Response: We will respond to both sets of comments (public reviews and editorial recommendations) in a comprehensive point-by-point manner in the revised manuscript.

      (4) Timely Revisions: We aim to complete all revisions within a single round, ensuring that we address all comments thoroughly while maintaining the integrity of the data.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      IPF is a disease lacking regressive therapies which has a poor prognosis, and so new therapies are needed. This ambitious phase 1 study builds on the authors' 2024 experience in Sci Tran Med with positive results with autologous transplantation of P63 progenitor cells in patients with COPD. The current study suggests that P63+ progenitor cell therapy is safe in patients with ILD. The authors attribute this to the acquisition of cells from a healthy upper lobe site, removed from the lung fibrosis. There are currently no cell-based therapies for ILD and in this regard the study is novel with important potential for clinical impact if validated in Phase 2 and 3 clinical trials.

      Strengths:

      This study addresses the need for an effective therapy for interstitial lung disease. It offers good evidence that the cells used for therapy are safe. In so doing it addresses a concern that some P63+ progenitor cells may be proinflammatory and harmful, as has been raised in the literature (articles which suggested some P63+ cells can promote honeycombing fibrosis; references 26 & 35). The authors attribute the safety they observed (without proof) to the high HOPX expression of administered cells (a marker found in normal Type 1 AECs). The totality of the RNASeq suggests the cloned cells are not fibrogenic. They also offer exploratory data suggesting a relationship between clone roundness and PFT parameters (and a negative association between patient age and clone roundness).

      We thank the reviewer for the important comments.

      Weaknesses:

      The authors can conclude they can isolate, clone, expand, and administer P63+ progenitor cells safely; but with the small sample size and lack of a placebo group, no efficacy should be implied.

      We thank the reviewer for the suggestion and agree that we should be more cautious in discussing the efficacy observed in the current study.

      Specific points:

      (1) The authors acknowledge most study weaknesses including the lack of a placebo group and the concurrent COVID-19 in half the subjects (the high-dose subjects). They indicate a phase 2 trial is underway to address these issues.

      N/A

      (2) The authors suggest an efficacy signal on pages 18 (improvement in 2 subjects' CT scans) and 21 (improvement in DLCO) but with such a small phase 1 study and such small increases in DLCO (+5.4%) the authors should refrain from this temptation (understandable as it is).

      We believe that exploring potential efficacy signals is also an important aim of this study, in addition to the safety evaluation. All of these efficacy endpoint analyses were planned prior to the start of the clinical trial (as registered at ClinicalTrials.gov), so the results need to be analyzed and reported in the manuscript. We will discuss the significance of the efficacy signal cautiously and avoid over-interpretation.

      (3) Likewise most CT scans were unchanged and those that improved were in the mid-dose group (albeit DLCO improved in the 2 patients whose CT scans improved).

      Yes, it is.

      (4) The authors note an impressive 58m increase in 6MWTD in the high-dose group but again there is no placebo group, and the low-dose group has no net change in 6MWTD at 24 weeks.

      Yes.

      (5) I also raise the question of the enrollment criteria in which 5 patients had essentially normal DLCO/VA values. In addition there is no discussion as to whether the transplanted stem cells are retained or exert benefit by a paracrine mechanism (which is the norm for cell-based therapies).

      Thank you for your detailed feedback. The enrollment criteria are based on DLCO rather than DLCO/VA. We will further discuss the possible benefit via a paracrine mechanism in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes a first-in-human clinical trial of autologous stem cells to address IPF. The significance of this study is underscored by the limited efficacy of standard-of-care anti-fibrotic therapies and increasing knowledge of the role p63+ stem cells in lung regeneration in ARDS. While models of acute lung injury and p63+ stem cells have benefited from widespread and dynamic DAD and immune cell remodeling of damaged tissue, a key question in chronic lung disease is whether such cells could contribute to the remodeling of lung tissue that may be devoid of acute and dynamic injury. A second question is whether normal regions of the lung in an otherwise diseased organ can be identified as a source of "normal" p63+ stem cells, and how to assess these stem cells given recently identified p63+ stem cell variants emerging in chronic lung diseases including IPF. Lastly, questions of feasibility, safety, and efficacy need to be explored to set the foundation for autologous transplants to meet the huge need in chronic lung disease. The authors have addressed each of these questions to different extents in this initial study, which has yielded important if incomplete information for many of them.

      Strengths:

      As with a previous study from this group regarding autologous stem cell transplants for COPD (Ref. 24), they have shown that the stem cells they propagate do not form colonies in soft agar or cancers in these patients. While a full assessment of adverse events was confounded by a wave of Covid19 infections in the study participants, aside from brief fevers it appears these transplants are tolerated by these patients.

      We thank the reviewer for the important comments.

      Weaknesses:

      The source of stem cells for these autologous transplants is generally bronchoscopic biopsies/brushings from 5th-generation bronchi. Although stem cells have been cloned and characterized from nasal, tracheal, and distal airway biopsies, the systematic cloning and analysis of p63+ stem cells across the bronchial generations is less clear. For instance, p63+ stem cells from the nasal and tracheal mucosa appear committed to upper airway epithelia marked by 90% ciliated cells and 10% goblet cells (Kumar et al., 2011. Ref. 14). In contrast, p63+ stem cells from distal lung differentiate to epithelia replete with Club, AT2, and AT1 markers. The spectrum of p63+ stem cells in the normal bronchi of any generation is less studied. In the present study, cells are obtained by bronchoscopy from 3-5 generation bronchi and expanded by in vitro propagation. Single-cell RNAseq identifies three clusters they refer to as C1, C2, and C3, with the major C1 cluster said to have characteristics of airway basal cells and C2 possibly the same cells in states of proliferation. Perhaps the most immediate question raised by these data is the nature of the C1/C2 cells. Whereas they are clearly p63/Krt5+ cells as are other stem cells of the airways, do they display differentiation character of "upper airway" marked by ciliated/goblet cell differentiation or those of the lung marked by AT2 and AT1 fates? This could be readily determined by 3-D differentiation in so-called air-liquid interface cultures pioneered by cystic fibrosis investigators and should be done as it would directly address the validity of the sourcing protocol for autologous cells for these transplants. This would more clearly link the present study with a previous study from the same investigators (Shi et al., 2019, Ref. 9) whereby distal airway stem cells mitigated fibrosis in the murine bleomycin model. 

      The authors should also provide methods by which the autologous cells are propagated in vitro as these could impact the quality and fate of the progenitor cells prior to transplantation.

      We fully agree that the subpopulations of the progenitor cells should be further analyzed, and we will attempt this in the revised manuscript. The methods for expanding P63+ lung progenitor cells have been described in full detail by the Frank McKeon/Wa Xian group (Rao et al., STAR Protocols, 2020) and were adapted to the pharmaceutical-grade technology patented by Regend Therapeutics, Ltd.

      The authors should also make a more concerted effort to compare Clusters 1, 2, and 3 with the variant stem cell identified in IPF (Wang et al., 2023, Ref. 27). While some of the markers are consistent with this variant stem cell population, others are not. A more detailed informatics analysis of normal stem cells of the airways and any variants reported could clarify whether the bronchial source of autologous stem cells is the best route to these transplants. 

      We thank the reviewer for the good suggestion and will make a more detailed comparison in the revised manuscript.

      Other than these issues the authors should be commended for these first-in-human trials for this important condition.

      Thank you so much for the kind compliment.

    1. Author response:

      Public Review:

      In this work, the authors develop a new computational tool, DeepTX, for studying transcriptional bursting through the analysis of single-cell RNA sequencing (scRNA-seq) data using deep learning techniques. This tool aims to describe and predict the transcriptional bursting mechanism, including key model parameters and the steady-state distribution associated with the predicted parameters. By leveraging scRNA-seq data, DeepTX provides high-resolution transcriptional information at the single-cell level, despite the presence of noise that can cause gene expression variation. The authors apply DeepTX to DNA damage experiments, revealing distinct cellular responses based on transcriptional burst kinetics. Specifically, IdU treatment in mouse stem cells increases burst size, promoting differentiation, while 5FU affects burst frequency in human cancer cells, leading to apoptosis or, depending on the dose, to survival and potential drug resistance. These findings underscore the fundamental role of transcriptional burst regulation in cellular responses to DNA damage, including cell differentiation, apoptosis, and survival. Although the insights provided by this tool are mostly well supported by the authors' methods, certain aspects would benefit from further clarification.

      The strengths of this paper lie in its methodological advancements and potential broad applicability. By employing the DeepTXSolver neural network, the authors efficiently approximate stationary distributions of mRNA count through a mixture of negative binomial distributions, establishing a simple yet accurate mapping between the kinetic parameters of the mechanistic model and the resulting steady-state distributions. This innovative use of neural networks allows for efficient inference of kinetic parameters with DeepTXInferrer, reducing computational costs significantly for complex, multi-gene models. The approach advances parameter estimation for high-dimensional datasets, leveraging the power of deep learning to overcome the computational expense typically associated with stochastic mechanistic models. Beyond its current application to DNA damage responses, the tool can be adapted to explore transcriptional changes due to various biological factors, making it valuable to the systems biology, bioinformatics, and mechanistic modelling communities. Additionally, this work contributes to the integration of mechanistic modelling and -omics data, a vital area in achieving deeper insights into biological systems at the cellular and molecular levels.

      We thank the reviewers for their positive opinion on our manuscript. As reflected in our detailed responses to the reviewers’ comments, we will make significant changes to address their concerns comprehensively.
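      The review above notes that stationary mRNA count distributions are approximated by a mixture of negative binomial distributions. A minimal sketch of such a mixture is given below. The parameterization (shape r and success probability p, with component mean r(1 - p)/p), the function names, and the example parameter values are assumptions for illustration and are not taken from the DeepTX implementation.

```python
import math


def nb_pmf(k, r, p):
    """Negative binomial pmf P(K = k) with real shape r > 0 and 0 < p < 1."""
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def nb_mixture_pmf(k, weights, rs, ps):
    """Pmf of a finite mixture of negative binomials; weights must sum to 1."""
    return sum(w * nb_pmf(k, r, p) for w, r, p in zip(weights, rs, ps))


# Hypothetical two-component mixture.
weights = [0.3, 0.7]
rs = [2.0, 5.5]
ps = [0.5, 0.2]

pmf = [nb_mixture_pmf(k, weights, rs, ps) for k in range(1000)]
print("total probability:", sum(pmf))
print("mean mRNA count:  ", sum(k * q for k, q in enumerate(pmf)))
```

      Working in log space with `math.lgamma` keeps the evaluation stable for large counts and non-integer shape parameters.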

      This work also presents some weaknesses, particularly concerning specific technical aspects. The tool was validated using synthetic data, and while it can predict parameters and steady-state distributions that explain gene expression behaviour across many genes, it requires substantial data for training. The authors account for measurement noise in the parameter inference process, which is commendable, yet they do not specify the exact number of samples required to achieve reliable predictions. Moreover, the tool has limitations arising from assumptions made in its design, such as assuming that gene expression counts for the same cell type follow a consistent distribution. This assumption may not hold in cases where RNA measurement timing introduces variability in expression profiles.

      Thank you for your detailed and constructive feedback on our work. We address the key concerns raised in the following points:

      (1) Clarification on the required sample size: We tested the robustness of our inference method on simulated datasets by varying the number of single-cell samples. Our results indicated that the predictions of burst kinetics parameters become accurate when the number of cells reaches 500 (Supplementary Figure S3d, e). This sample size is smaller than the data typically obtained with current single-cell RNA sequencing (scRNA-seq) technologies, such as 10x Genomics and Smart-seq3 (Zheng GX et al., 2017; Hagemann-Jensen M et al., 2020). Therefore, we believe that our algorithm is well-suited for inferring burst kinetics from existing scRNA-seq datasets, where the sample size is sufficient for reliable predictions. We will clarify this point in the main text to make it easier for readers to use the tool.

      (2) Assumption-related limitations: One of the fundamental assumptions in our study is that the expression counts of each gene are independently and identically distributed (i.i.d.) among cells, which is a commonly adopted assumption in many related works (Larsson AJM et al., 2019; Ochiai H et al., 2020; Luo S et al., 2023). However, we acknowledge the limitations of this assumption. The expression counts of the same gene may follow distinct distributions in different cells, even within the same cell type, and dependencies between genes could exist in realistic biological processes. We recognize this and will discuss the limitations arising from these assumptions in depth, presenting them as an important direction for future research.

      The authors present a deep learning pipeline to predict the steady-state distribution, model parameters, and statistical measures solely from scRNA-seq data. Results across three datasets appear robust, indicating that the tool successfully identifies genes associated with expression variability and generates consistent distributions based on its parameters. However, it remains unclear whether these results are sufficient to fully characterize the transcriptional bursting parameter space. The parameters identified by the tool pertain only to the steady-state distribution of the observed data, without ensuring that this distribution specifically originates from transcriptional bursting dynamics.

      We appreciate your insightful comments and the opportunity to clarify our study’s contributions and limitations. We agree that it is challenging to assess whether the results from these three real datasets fully characterize the transcriptional burst parameter space, as this depends on the properties of the data and the biological conditions. Nevertheless, we firmly believe that DeepTX has the capacity to characterize the full parameter space. This belief stems from the extensive parameters and samples we used during model training and inference across a sufficiently large parameter range (Method 1.3). Furthermore, the training of the model is both flexible and scalable, allowing for the expansion of the transcriptional burst parameter space as needed. We will clarify this in the text to enable readers to use DeepTX more flexibly.

      On the other hand, we agree that parameter identification is based on the steady-state distribution of the observed data (static data), which loses information about the fine dynamic process of the burst kinetics. In principle, tracking the gene expression of living cells can provide the most complete information about real-time transcriptional dynamics across various timescales (Rodriguez J et al., 2019). However, it is typically limited to only a small number of genes and cells, and therefore cannot be used to investigate general principles of transcriptional burst kinetics on a genome-wide scale. Leveraging both the steady-state distribution of scRNA-seq data and mathematical dynamic modelling to infer genome-wide transcriptional bursting dynamics therefore represents a critical and emerging frontier in this field. For example, the statistical inference framework based on the Markovian telegraph model, as demonstrated in (Larsson AJM et al., 2019), offers a valuable paradigm for understanding underlying transcriptional bursting mechanisms. Building on this, our study considered a more generalized non-Markovian model and employed deep learning methods to better capture transcriptional kinetics under conditions such as DNA damage. This provided a powerful framework for comparative analyses of how DNA damage induces alterations in transcriptional bursting kinetics across the genome. We will highlight the limitations of current inference using steady-state distributions in the text and look ahead to future research directions for inference using time series data across the genome.
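      To make the link between a steady-state distribution and burst kinetics concrete, the sketch below uses the well-known bursty limit of the Markovian telegraph model, in which steady-state mRNA counts follow a negative binomial distribution with mean f·b and variance f·b·(1 + b), where b is the mean burst size and f is the burst frequency in units of the mRNA degradation rate. This is a simple method-of-moments illustration under that assumption only. It is not the DeepTX inference procedure, which is described as handling a more general non-Markovian model, and the function name and example counts are hypothetical.

```python
import statistics


def burst_moment_estimates(counts):
    """Method-of-moments burst size and frequency in the bursty telegraph limit.

    Assumes mean = f * b and variance = f * b * (1 + b), so that the
    Fano factor (variance / mean) equals 1 + b.
    """
    mean = statistics.fmean(counts)
    var = statistics.pvariance(counts)
    if mean <= 0:
        raise ValueError("mean expression must be positive")
    fano = var / mean
    if fano <= 1.0:
        raise ValueError("counts are not over-dispersed; bursty limit does not apply")
    burst_size = fano - 1.0
    burst_frequency = mean / burst_size  # per mRNA lifetime
    return burst_size, burst_frequency


# Hypothetical mRNA counts for one gene across four cells.
counts = [0, 0, 0, 8]
size, freq = burst_moment_estimates(counts)
print("burst size:", size)
print("burst frequency (per mRNA lifetime):", freq)
```

      Only two quantities are recoverable from the first two moments, which illustrates why models with more state-switching parameters are harder to identify from static snapshots alone.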

      A primary concern with the TXmodel is its reliance on four independent parameters to describe gene state-switching dynamics. Although this general model can capture specific cases, such as the refractory and telegraph models, accurately estimating the parameters of the refractory model using only steady-state distributions and typical cell counts proves challenging in the absence of time-dependent data.

      We thank you for highlighting this critical concern regarding the TXmodel's reliance on four independent parameters to describe gene state-switching dynamics. We acknowledge that estimating the parameters of the TXmodel using only steady-state distributions and typical single-cell RNA sequencing (scRNA-seq) data poses significant challenges, particularly in the absence of time-resolved measurements.

      As described in our response to the previous point, while time-resolved data can provide richer information than static scRNA-seq data, it is currently limited to a small number of genes and cells, whereas static scRNA-seq data typically capture genome-wide expression. Our framework leverages deep learning methods to link mechanistic models with static scRNA-seq data, enabling the inference of genome-wide dynamic behaviors of genes. This provides a potential pathway for comparative analyses of transcriptional bursting kinetics across the entire genome.

      Nonetheless, the refractory model and the telegraph model are important models for studying transcriptional bursts. We will discuss and compare them in terms of the accuracy of inferred parameters. Certainly, we agree that inferring the molecular mechanisms underlying transcriptional burst kinetics using time-resolved data remains a critical future direction. We will include a brief discussion on the role and importance of time-resolved data in addressing these challenges in the discussion section of the revised manuscript.

      The claim that the GO analysis pertains specifically to DNA damage response signal transduction and cell cycle G2/M phase transition is not fully accurate. In reality, the GO analysis yielded stronger p-values for pathways related to the mitotic cell cycle checkpoint signalling. As presented, the GO analysis serves more as a preliminary starting point for further bioinformatics investigation that could substantiate these conclusions. Additionally, while GSEA analysis was performed following the GO analysis, the involvement of the cardiac muscle cell differentiation pathway remains unclear, as it was not among the GO terms identified in the initial GO analysis.

      We thank the reviewer for this valuable feedback and for pointing out the need for clarification regarding the GO and GSEA analyses. We agree that the connection between the cardiac muscle cell differentiation pathway identified in the GSEA analysis and the GO terms from the initial analysis requires further clarification. This discrepancy arises because GSEA examines broader sets of pathways and may capture biological processes not highlighted by GO analysis due to differences in the statistical methods and pathway definitions used. We will revise the manuscript to address this point, explicitly discussing the distinct yet complementary nature of GO and GSEA analyses and providing a clearer interpretation of the results.

      As the advancement is primarily methodological, it lacks a comprehensive comparison with traditional methods that serve similar functions. Consequently, the overall evaluation of the method, including aspects such as inference accuracy, computational efficiency, and memory cost, remains unclear. The paper would benefit from being contextualised alongside other computational tools aimed at integrating mechanistic modelling with single-cell RNA sequencing data. Additional context regarding the advantages of deep learning methods, the challenges of analysing large, high-dimensional datasets, and the complexities of parameter estimation for intricate models would strengthen the work.

      We greatly appreciate your insightful feedback, which highlights important considerations for evaluating and contextualizing our methodological advancements. Below, we emphasize our advantages over previous models from both the modeling perspective and the inference perspective. As our work is rooted in a model-based approach to describe the transcriptional bursting process underlying gene expression, the commonly employed classic telegraph model (Markovian) and non-Markovian models are suitable for this purpose:

      Classic telegraph model: The classic telegraph model allows for the derivation of approximate analytical solutions through numerical integration, enabling efficient parameter point estimation via maximum likelihood methods, e.g., as explored in (Larsson AJM et al., 2019). Although exact analytical solutions for the telegraph model are not available, certain moments of its distribution can be explicitly derived. This allows for an alternative approach to parameter inference using moment-based estimation methods, e.g., as explored in (Ochiai H et al., 2020). However, it is important to note that higher-order sample moments can be unstable, potentially leading to significant estimation bias.

      Non-Markovian Models: For non-Markovian models, analytical or approximate analytical solutions remain elusive. Previous work has employed pseudo-likelihood approaches, leveraging statistical properties of the model’s solutions to estimate parameters, e.g., as explored in (Luo S et al., 2023). However, the method may suffer from low inference efficiency.
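
      To make the moment-based route for the telegraph model concrete, the following is a minimal sketch, not the procedure used in the cited studies. It assumes the standard Beta-Poisson form of the telegraph steady state, with all rates expressed in units of the mRNA degradation rate, and matches the first three factorial moments. The function names and parameter values are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_telegraph(k_on, k_off, k_syn, n_cells, rng):
    """Steady-state counts: p ~ Beta(k_on, k_off), count ~ Poisson(k_syn * p)."""
    return [sample_poisson(k_syn * rng.betavariate(k_on, k_off), rng)
            for _ in range(n_cells)]


def factorial_moments(counts):
    """Sample factorial moments E[X], E[X(X-1)], E[X(X-1)(X-2)]."""
    n = len(counts)
    e1 = sum(counts) / n
    e2 = sum(x * (x - 1) for x in counts) / n
    e3 = sum(x * (x - 1) * (x - 2) for x in counts) / n
    return e1, e2, e3


def exact_factorial_moments(k_on, k_off, k_syn):
    """Exact factorial moments of the Beta-Poisson distribution."""
    s = k_on + k_off
    e1 = k_syn * k_on / s
    e2 = k_syn ** 2 * k_on * (k_on + 1) / (s * (s + 1))
    e3 = k_syn ** 3 * k_on * (k_on + 1) * (k_on + 2) / (s * (s + 1) * (s + 2))
    return e1, e2, e3


def moment_estimates(e1, e2, e3):
    """Method-of-moments estimates (k_on, k_off, k_syn) from factorial moments."""
    r1, r2, r3 = e1, e2 / e1, e3 / e2
    d = r1 * r2 - 2 * r1 * r3 + r2 * r3
    c = r1 - 2 * r2 + r3
    k_on = 2 * r1 * (r3 - r2) / d
    k_off = 2 * (r2 - r1) * (r1 - r3) * (r3 - r2) / (d * c)
    k_syn = -d / c
    return k_on, k_off, k_syn


if __name__ == "__main__":
    rng = random.Random(0)
    counts = sample_telegraph(1.0, 1.0, 10.0, 20000, rng)
    print(moment_estimates(*factorial_moments(counts)))
```

      With exact moments the estimator recovers the input parameters, whereas with sampled counts the third factorial moment is noisy, which is one way to see why higher-order sample moments can bias the estimates.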

      In our current work, we leverage deep learning to estimate the parameters of the TXmodel, which is a non-Markovian model. First, we represent the model's solution as a mixture of negative binomial distributions, which is obtained by the deep learning method. Second, through integration with the deep learning architecture, the model parameters can be optimized using automatic differentiation, significantly improving inference efficiency. Furthermore, by employing a Bayesian framework, our method provides posterior distributions for the estimated dynamic parameters, offering a comprehensive characterization of uncertainty. Compared to traditional methods such as moment-based estimation or pseudo-likelihood approaches, we believe our approach not only achieves higher inference efficiency but also delivers posterior distributions for kinetic parameters, enhancing the interpretability and robustness of the results. We will present and emphasize the computational efficiency and memory cost of our method in the revised version.
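
      As an illustration of the negative binomial mixture representation mentioned above, the sketch below evaluates the probability mass function and log-likelihood of such a mixture for fixed components. It is a simplified stand-in under stated assumptions: in the described method the mixture weights and component parameters would be produced by a neural network from the kinetic parameters, whereas here they are hypothetical fixed numbers, and the function names are illustrative only.

```python
import math


def nb_log_pmf(k, size, prob):
    """log P(X = k) for a negative binomial with size > 0 and success probability prob.

    X counts failures before the size-th success, so E[X] = size * (1 - prob) / prob.
    """
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(prob) + k * math.log1p(-prob))


def nb_mixture_pmf(k, weights, sizes, probs):
    """P(X = k) under a mixture of negative binomials (weights must sum to 1)."""
    return sum(w * math.exp(nb_log_pmf(k, r, p))
               for w, r, p in zip(weights, sizes, probs))


def log_likelihood(counts, weights, sizes, probs):
    """Log-likelihood of observed mRNA counts under the mixture."""
    return sum(math.log(nb_mixture_pmf(k, weights, sizes, probs)) for k in counts)


if __name__ == "__main__":
    # Hypothetical two-component mixture, for illustration only.
    weights, sizes, probs = [0.3, 0.7], [2.0, 5.0], [0.5, 0.2]
    print(log_likelihood([0, 3, 15], weights, sizes, probs))
```

      Because every operation here is differentiable in the weights and component parameters, the same likelihood can in principle be optimized by automatic differentiation when written in a deep learning framework.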

      References

      Zheng, G.X., Terry, J.M., Belgrader, P., Ryvkin, P., Bent, Z.W., Wilson, R., Ziraldo, S.B., Wheeler, T.D., McDermott, G.P., Zhu, J., Gregory, M.T., Shuga, J., Montesclaros, L., Underwood, J.G., Masquelier, D.A., Nishimura, S.Y., Schnall-Levin, M., Wyatt, P.W., Hindson, C.M., Bharadwaj, R., Wong, A., Ness, K.D., Beppu, L.W., Deeg, H.J., McFarland, C., Loeb, K.R., Valente, W.J., Ericson, N.G., Stevens, E.A., Radich, J.P., Mikkelsen, T.S., Hindson, B.J., Bielas, J.H. 2017. Massively parallel digital transcriptional profiling of single cells. Nature Communications 8: 14049. DOI: https://dx.doi.org/10.1038/ncomms14049, PMID: 28091601

      Hagemann-Jensen, M., Ziegenhain, C., Chen, P., Ramsköld, D., Hendriks, G.J., Larsson, A.J.M., Faridani, O.R., Sandberg, R. 2020. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nat Biotechnol 38: 708-714. DOI: https://dx.doi.org/10.1038/s41587-020-0497-0, PMID: 32518404

      Larsson, A.J.M., Johnsson, P., Hagemann-Jensen, M., Hartmanis, L., Faridani, O.R., Reinius, B., Segerstolpe, A., Rivera, C.M., Ren, B., Sandberg, R. 2019. Genomic encoding of transcriptional burst kinetics. Nature 565: 251-254. DOI: https://dx.doi.org/10.1038/s41586-018-0836-1, PMID: 30602787

      Ochiai, H., Hayashi, T., Umeda, M., Yoshimura, M., Harada, A., Shimizu, Y., Nakano, K., Saitoh, N., Liu, Z., Yamamoto, T., Okamura, T., Ohkawa, Y., Kimura, H., Nikaido, I. 2020. Genome-wide kinetic properties of transcriptional bursting in mouse embryonic stem cells. Science Advances 6: eaaz6699. DOI: https://dx.doi.org/10.1126/sciadv.aaz6699, PMID: 32596448

      Luo, S., Wang, Z., Zhang, Z., Zhou, T., Zhang, J. 2023. Genome-wide inference reveals that feedback regulations constrain promoter-dependent transcriptional burst kinetics. Nucleic Acids Research 51: 68-83. DOI: https://dx.doi.org/10.1093/nar/gkac1204, PMID: 36583343

      Rodriguez, J., Ren, G., Day, C.R., Zhao, K., Chow, C.C., Larson, D.R. 2019. Intrinsic dynamics of a human gene reveal the basis of expression heterogeneity. Cell 176: 213-226.e218. DOI: https://dx.doi.org/10.1016/j.cell.2018.11.026, PMID: 30554876

      Luo, S., Zhang, Z., Wang, Z., Yang, X., Chen, X., Zhou, T., Zhang, J. 2023. Inferring transcriptional bursting kinetics from single-cell snapshot data using a generalized telegraph model. Royal Society Open Science 10: 221057. DOI: https://dx.doi.org/10.1098/rsos.221057, PMID: 37035293

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Gap of knowledge:

      From the introduction, I got the impression that the manuscript tries to answer the question of whether homeostatic structural plasticity is functionally redundant to synaptic scaling. However, the importance of this question needs to be worked out better. Also, I think it is hard to tackle this question with the shown experiments as one would have to block all other redundant mechanisms and see whether HSP functionally replaces them.

      We appreciate the reviewer’s valuable feedback regarding the relationship between homeostatic structural plasticity (HSP) and synaptic scaling. The main objective of our study is indeed to investigate whether structural plasticity is homeostatically regulated, and if so, whether it acts as a redundant or heterogeneous mechanism in relation to synaptic scaling, which is widely recognized as a primary homeostatic process.

      In our revised introduction, we have clarified this central question and its significance. Specifically, we explored why experimentally observed changes in spine density, a measure of structural plasticity, do not exhibit the same homeostatic characteristics as changes in spine head size, which reflects synaptic scaling, particularly under conditions of activity blockade.

      We hypothesized two key points:

      (1) Structural plasticity may not follow a monotonically activity-dependent rule as strictly as synaptic scaling.

      (2) The observed changes in spine density may be influenced by the simultaneous modulation of spine size, suggesting that structural plasticity and synaptic scaling interact within the same biological system.

      Both hypotheses were tested through a combination of experimental observations and systematic computer simulations. Our conclusions demonstrate that spine-number-based structural plasticity follows a biphasic activity-dependent rule. While it largely overlaps with synaptic scaling under typical conditions, it exhibits heterogeneity under extreme conditions, such as activity silencing. Furthermore, our simulations revealed that both mechanisms can compete and complement each other within neural networks.

      We believe that these results offer a nuanced understanding of the interaction between structural plasticity and synaptic scaling, highlighting their redundancy under most conditions but also their heterogeneity under specific circumstances. Blocking all other redundant mechanisms, as suggested, would provide a more reductionist view, which may not capture the complexity and interplay of these processes in a physiological setting. Our approach reflects this complexity, providing insight into how these mechanisms operate together in a naturalistic context.

      We have revised the introduction to better convey these points and emphasize the significance of this question for understanding the dynamics of homeostatic regulation in neural networks.

      Similarly, the simulations do not really tackle redundancy as, e.g. network growth cannot be achieved by scaling alone.

      We appreciate the reviewer’s comment regarding synaptic scaling's limitations in achieving network growth. We would like to clarify that we did not intend to suggest that structural plasticity and synaptic scaling are fully redundant. In fact, it is well established in the literature that structural plasticity plays a dominant role during development, particularly in network growth, which synaptic scaling alone cannot achieve.

      The primary objective of our study was to investigate the interaction between structural plasticity and synaptic scaling under conditions of activity perturbation, rather than during network growth or development. To avoid any confusion regarding developmental processes, we chose to grow the network using only structural plasticity in our simulations. Synaptic scaling was then introduced (or not) during the phase of activity deprivation to specifically examine its role in regulating homeostasis under these conditions.

      We have revised the corresponding sections of the manuscript to clarify this distinction, and we have ensured that the simulations reflect our focus on activity perturbation rather than network development. This distinction should help readers avoid conflating developmental processes with the specific goals of our study.

      Instead, the section on "Integral feedback mechanisms" (L112-129) contains a much better description of the actual goals of the paper than is given in the introduction. Moreover, this section does not seem to include any new results (at least the Ca-dependent structural plasticity and synaptic scaling rules seem to be very common to me). I, therefore, suggest fusing this paragraph into the introduction to obtain a clearer and more understandable gap of knowledge, which is addressed by the paper.

      We agree that the "Integral feedback control" section provides key information relevant to both the Introduction and Methodology. It outlines the theoretical framework and serves as a basis for the experimental design.

      To better reflect this, we have revised the Introduction to include the gap in knowledge. However, we opted to retain the section in the Results, slightly modified, to set the context for the first experiment.

      Along this line, as it seems a central point of the manuscript to distinguish the controller dependencies on Calcium, the different dependencies (working models) should be described in more detail. Also, the description of the inconsistencies of the previous results on HSP can be moved from the discussion (l419-l441) to the introduction.

      We have revised the manuscript to place less emphasis on the controller models while retaining the core principles of control theory. The description of the HSP model has been moved to the Introduction, as suggested, while the detailed history remains in the Discussion to maintain the manuscript's consistency.

      Systematic text revision: Regarding comment (1), we thank the reviewer for suggesting the text reorganization. We have adjusted several parts in the introduction, M&M section, and results section to increase clarity.

      (2) Pharmacological Choice:

      It should be discussed why NBQX is used to induce the homeostatic effect instead of TTX. As there are studies showing that it might block homeostatic rewiring (doi.org/10.1073/pnas.0501881102) as well as synaptic scaling (10.1523/JNEUROSCI.3753-08.2009), it seems unclear whether the observed effects are actually corresponding to those in other publications.

      The rationale for using NBQX in our experiments, rather than TTX, is detailed in the public response. We selected NBQX based on specific experimental motivations relevant to our study’s objectives, while acknowledging the potential differences in effects compared to other studies.

      Local text revision: We added one paragraph in the discussion section to explain the idea better.

      (3) Model-Experiment Connection:

      The paper combines simulations with experimental work, which is very good. However, in my opinion, the only connection between the two parts is that the experiments suggest a non-monotonic dependency between firing rate and synapse density (i.e. the biphasic dependency). The rest of the experimental results seem to be neglected in the modeling part. It is not even shown that the model reproduces the experiments. Instead, the model is tested in different situations and paradigms (blocking AMPARs in the whole culture vs network growth or silencing a sub-population). I think it would make the paper stronger and more consequential when a reproduction of the experiment by the model is demonstrated (with analogue analyses).

      The experimental results serve three main purposes. First, as the reviewer noted, the spine analysis was conducted to inform the biphasic rule. Second, spine size analysis was performed to replicate published findings and confirm our modeling results, showing that activity deprivation leads to fewer synapses with larger sizes or higher weights. Third, the correlation analysis of spine density and size across dendritic segments suggested a hybrid combination of two types of plasticity across different neurons.

      While we addressed these aspects in the Results and Discussion sections, the collective presentation in Fig. 2 may have caused some confusion. To improve clarity, we have now split the experimental results, presenting them alongside the relevant modeling data in Fig. 2, Fig. 8, and Fig. 9.

      Also, there are a few more mismatches between the experiment and the model that you will want to discuss:

      • The size-dependent homeostatic effect (l154ff, Fig2F) is not reflected by the used scaling model.

      We revised Fig 8 and the corresponding text to explain how the scaling model reflects such an effect.

      • The model assumes reduced Ca levels. Yet, the experimental protocol blocks AMPARs, which are to my knowledge not the primary source of Ca influx, but rather the NMDARs.

      The model is based on neural activity, with calcium concentration serving as an internal integral signal of the firing rate, allowing for integral control. While calcium plays a critical role in homeostasis, we caution against drawing a strict correspondence between the model's calcium dynamics and the experimental protocol, as calcium can be sourced from multiple pathways in neurons beyond AMPARs, such as NMDARs, voltage-gated calcium channels, and intracellular stores. Also, our recent work demonstrated that under baseline conditions, the majority of AMPARs are not Ca2+ permeable, i.e., GluA2-lacking (Kleidonas et al., 2023).

      Improving the calcium dynamics, including secondary calcium release and calcium stores, is part of our future plan to refine the HSP model and address experimental findings that are not fully explained by the current model.

      • The model further assumes silencing by input removal, whereas the recurrent connections stay intact. Wouldn't this rather correspond to a deafferentation experiment, where connections to another brain area are cut?

      Thank you for pointing this out. The modeling section was not intended to directly replicate the tissue culture experiments but rather to provide insights into a broader range of scenarios, including pharmacological treatments, deafferentation, lesions, and even monocular deprivation.

      Systematic text revision: Regarding comment (3), the goal of our modeling work went beyond reproducing the experiments. The experimental results in the present study serve to inform, confirm, and inspire the modeling, and we have systematically adjusted the layout of the experimental and modeling results to link them more clearly.

      (4) Is the recurrent component too weak?

      Your results show that HSP does not restore activity after silencing (deafferentation), whereas you discuss that earlier models did achieve this by active neighbors in a spatially organized network. However, the silenced neurons in your simulations also receive inputs through the "recurrent" connections from their neighbors (at least shortly after silencing). Therefore, given the recurrent input is strong enough, they should be able to recover in a similar way as the spatially organized ones. As a consequence, I obtained the impression that, in your model networks, activity is strongly driven by external stimulation and less by recurrent connections. I understand that this is important to achieve silencing through removing the Poisson stimulation. Yet, this fact may be responsible for the failure to restore activity such that presented effects are only applicable for networks that are strongly driven by external inputs, but not for strongly recurrent networks, which would severely limit the generality of the results. As a consequence, the paper would benefit from a systematic analysis of the trade-off between recurrent strength and input strength. Maybe, different constant negative currents could be injected in all neurons, such that HSP creates more recurrent synapses in the network.

      We appreciate this insight. However, increasing recurrent input strength is beyond the scope of the current study, as it would fundamentally alter the predefined network dynamics of the Brunel network used. As noted in the manuscript, complete isolation or cell death is not always the outcome after input deprivation, lesion, or stroke, which cannot be fully explained by the Gaussian HSP rule alone. Butz and colleagues offered a solution using growth rules that maximized recurrent input, and we recognize the importance of their work.

      That said, we approached the issue from a different angle, emphasizing the role of synaptic scaling in recurrence rather than relying solely on recurrent input strength. In biological networks, external inputs may vary, recurrency can be weak or strong, and synaptic scaling can dominate. Our model offers a complementary hypothesis, suggesting that these factors, in combination, contribute to the diverse and sometimes contradictory results found in the literature, rather than posing a strict constraint on network topology.

      Local text revision: We emphasized these points in the Discussion section again.

      (5) Missing conclusions / experimental predictions

      As already described, the modelling work is not reproducing the presented or previous experimental data. Hence, the goal of modelling should be to derive a more general understanding and make experimental predictions. Yet, the conclusions in the discussion stay superficial and vague and there are no specific experimental predictions derived from the model results.

      For example, the authors report that the recovery of activity in silenced cultures is observed in a previously spatially structured model but not in theirs -- at least with slow or no scaling. Yet it is left to the reader to think about whether the current model is an improvement to the previous one, how they could be experimentally distinguished, or to which experimental findings they relate or compare, which I would expect at this point. I would advise reworking the discussion and thoroughly working out which new insights the modelling part of the study has generated (not to be confused with the assumptions of the model aka the biphasic plasticity rule) and relating them to experimental pre- and postdiction.

      We recognize the reviewer’s concern, which is closely related to comment (4). We have addressed these points by reorganizing the text to better clarify the purpose of our experimental work and its connection to the modeling results.

      Specifically, we have reworked the discussion to highlight the new insights gained from the modeling, and how these can inform experimental predictions and interpretations. This includes distinguishing our model from previous ones and providing clearer connections to experimental findings.

      Systematic text revision: The comments raised here on combining the experimental and modeling results, and on how the narrative is developed, likely also reflect the expectations and concerns of a broader readership. We have accordingly adjusted the text in the Results and Discussion sections to make our points clear.

      Suggestions for minor changes:

      Fig 1I: Please check the graph and make it more self-explaining. For example, mark the "setpoint" activity (in my opinion, both curves should be at baseline there. In that case, however, I do not see the biphasic behavior anymore). Maybe the table and the graph can be aligned along the activity axis? Also: synaptic inhibition should be increased and not decreased, right?

      Local text and figure revision: We assume the reviewer meant Fig. 2I. We have improved the visualization to avoid confusion.

      L74-81: I would reverse the order of associative and homeostatic plasticity in this paragraph.

      Local text and figure revision: We have fine-tuned the order in the first and second paragraphs to match the readers' expectations.

      L74-75: Provide references for such theories.

      Local text and figure revision: fixed.

      L84-86: Please provide a reference for the claim that negative feedback, redundancy, and heterogeneity contribute to robustness.

      Local text and figure revision: fixed.

      L 95-97: I think the heterogeneity aspect needs to be worked out a bit better. Do you mean that the described mechanisms contribute to firing rate homeostasis in a different mixture for each neuron (as assumed in the last figure)?

      Local text and figure revision: The term heterogeneity is used in the manuscript for two major different settings: (1) heterogeneity in terms of control theory and (2) different combinations of HSP and SS rules. We have named the second condition as diversity to avoid confusion.

      L 132: The question of linearity has not been posed so far. Also, I think "monotonous" would be a much better term than linear (as a test for linearity would require more than 2 datapoints).

      Local text and figure revision: We agree that ‘linear’ is not a good term. We replaced it with ‘monotonic’ throughout the manuscript.

      Fig2 Bii: The data for 50 µM is clearly not Gaussian.

      We did not imply that the 50 µM condition is Gaussian. Instead, we noted that the non-linearity observed in both the 200 nM and 50 µM data suggests a non-monotonic growth rule rather than a linear one. We applied the Gaussian rule because it has been extensively studied in previous simulations, allowing us to benchmark our findings against those results.

      Fig2 D, E inset: The point at time 0 does not convey any information and could be left out.

      The time zero data is included to demonstrate that the three groups have a similar baseline, ensuring that any observed differences are due to the treatment and not pre-existing biases in the grouping.

      L 178: As the Gaussian rule drops below zero above the upper set-point again, it is rather tri-phasic than bi-phasic.

      We intended to convey that inhibition results in either spine growth or deletion, reflecting a bi-phasic response rather than a true tri-phasic one.
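
      The shape under discussion can be sketched as follows. This is one common parameterisation of a Gaussian growth rule, assumed here for illustration; the manuscript's exact form and parameter values may differ. The names `eta` (lower zero crossing), `eps` (upper zero crossing, the target set-point), and `nu` (maximal growth rate) are assumptions.

```python
import math


def gaussian_growth_rate(ca, eta, eps, nu):
    """Rate of change of synaptic elements as a function of calcium level.

    The curve is zero at ca == eta and ca == eps, positive in between
    (peaking at nu at the midpoint), and negative outside that interval.
    """
    xi = (eta + eps) / 2.0                                  # centre of the Gaussian
    zeta = (eps - eta) / (2.0 * math.sqrt(math.log(2.0)))   # width fixing the zeros
    return nu * (2.0 * math.exp(-((ca - xi) / zeta) ** 2) - 1.0)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    eta, eps, nu = 0.1, 0.7, 1e-4
    for ca in (0.0, eta, 0.4, eps, 1.0):
        print(ca, gaussian_growth_rate(ca, eta, eps, nu))
```

      In this form the rule yields retraction below `eta`, growth between `eta` and `eps`, and retraction above `eps`, which is the sign pattern the reviewer and the authors describe in different terms.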

      Fig 6A: You may want to mark the eta variables in the curves.

      Local text and figure revision: fixed.

      Fig 6E: The curve of the S population extending to the next panel looks a bit messy.

      We retained the curve extension to visually convey the impression of excessive network activity.

      L272: It needs to be better described/motivated how protocol 1 and 2 are supposed to study the role of recurrent connection as well as what kind of biological situation this may be.

      Local text and figure revision: The corresponding text has been adjusted to avoid confusion.

      L 272: It is not clear how faster simulation leads to less recurrent connectivity, when the stimulation protocol and the rates stay the same and the algorithm compensates for the timestep properly. Maybe you rather want to say that you silence 10x longer and stimulate 10x longer?

      Local text revision: The corresponding text has been adjusted to avoid confusion.

      L. 302: "reactivate"?

      Local text revision: fixed.

      L 322f: I would suggest showing the connectivity matrix for a time-point with restored activity as well.

      Local text and figure revision: fixed.

      Fig 8A: The use of the morphological reconstructions is a bit misleading as the model uses point neuron.

      Local text revision: Now after reorganization, it is in Fig.9. We kept the reconstruction figure for motivational purposes, suggesting how to understand the meaning of the combinations in more biologically realistic scenarios. The corresponding text has been adjusted to avoid confusion.

      Fig 8E-F: the y axis should be in the same orientation as in panel D.

      Local text and figure revision: Good idea and fixed in the new Fig. 9.

      Fig. 8F: The results here look a little bit random. Maybe more runs with the same parameters would smooth out the contours or reveal a phase transition.

      Local text and figure revision: Thank you for the suggestion. We conducted an additional ten random trials to average the traces and heatmaps, improving the clarity of the results now presented in Fig. 9.

      L411: Note that there are earlier HSP models by Dammasch and van Ooyen & van Pelt, that might be worth discussing here.

      Local text revision: fixed.

      L416 "beyond synaptic scaling" reference needed.

      Local text revision: fixed.

      L419: The biphasic rule was suggested by Butz already.

      Local text revision: We adjusted the text to emphasize our contribution in suggesting/confirming the biphasic rule based on direct experimental observations.

      L 419-44: Most of this is actually state of the art and may be better placed in the introduction to justify the use of NBQX as a competitive blocker.

      Local text revision: We adjusted the text in the introduction and Discussion sections to cover the raised points.

      L487: In my opinion, although scaling adapts the weights quickly, the information about deviating firing rate is still stored in the calcium signal such that it will also give rise to structural changes (although they may be small when the rate is low). Thus, I think that fast scaling does not abolish structural changes.

      Local text revision: We adjusted the text to account for other factors that could lead to the same or opposite conclusions.

      L502f: Sentence unclear. Do you mean Ca is an integrated (low-pass filtered) version of the firing rate?

      Yes.

      L504: What is the cumulative temporal effect of error in estimating firing rates?

      We were referring to the potential instability in numeric simulations if the firing rate is not tracked by an integral signal (calcium concentration) but is instead estimated through average spike counts over time. In our model, calcium serves as a proxy for the firing rate to guide homeostatic structural plasticity. The intake and decay constants are set to minimize the accumulation of errors over time, making long-term error accumulation unlikely. In any case, this is not intended to be a precise measure of the firing rate but rather a smooth guide for homeostatic control.

      Local text revision: We rewrote the section so as not to cause extra concerns.
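
      The idea that calcium acts as a low-pass filtered proxy for the firing rate can be sketched with a leaky integrator. This is a minimal illustration under assumed dynamics (exponential decay with time constant `tau`, plus a fixed increment `beta` per spike), not the authors' implementation; all names and values are hypothetical.

```python
import math


def calcium_trace(spike_times, tau, beta, dt, t_end):
    """Integrate dC/dt = -C / tau, with C incremented by beta at each spike.

    Returns the calcium value at the end of every time step of length dt.
    """
    decay = math.exp(-dt / tau)
    n_steps = int(round(t_end / dt))
    spikes_per_step = [0] * n_steps
    for t in spike_times:
        idx = int(t / dt)
        if 0 <= idx < n_steps:
            spikes_per_step[idx] += 1
    c, trace = 0.0, []
    for n in spikes_per_step:
        c = c * decay + beta * n   # exact decay over dt, then spike-driven intake
        trace.append(c)
    return trace


def steady_state_mean(rate_hz, tau=1.0, beta=0.01, dt=0.001, t_end=20.0):
    """Mean calcium over the last 5 s for a regular spike train at rate_hz."""
    n_spikes = int(rate_hz * t_end)
    spikes = [i / rate_hz for i in range(n_spikes)]
    trace = calcium_trace(spikes, tau, beta, dt, t_end)
    tail = trace[-int(5.0 / dt):]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    print(steady_state_mean(10.0), steady_state_mean(20.0))
```

      Under these assumptions the time-averaged calcium level settles near `beta * tau * rate`, so it tracks the firing rate smoothly without requiring explicit spike-count averaging over a window.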

      L505: Which two rules are meant here? Ca- and firing rate based or HSP and scaling?

      Local text revision: The two rules are the HSP rule and the HSS rule. We have adjusted the text to improve clarity.

      L505ff: I did not really understand the control theoretic view here and Supp Fig 5 is not self-explaining enough to help. In my view, scaling is a proportional controller for the calcium level (the setpoint is defined for calcium and not firing rate). Also, none of the HSP rules contains an integral or a differential of the error; they are thus nonlinear but proportional controllers in first approximation. If this part is supposed to stay in the manuscript, the supporting information should contain a more detailed mathematical explanation. Relevant previous work on homeostatic control by synaptic scaling and homeostatic rewiring, e.g. doi: 10.23919/ECC54610.2021.9655157, should be discussed.

      Local text revision: We have updated the last paragraph to increase clarity. The HSP and HSS rules act as proportional and integral controllers with respect to neural activity, because firing rate homeostasis is the meaningful goal. However, it is also correct that the integral component disappears if we view the calcium concentration as the goal or setpoint. The suggested paper is discussed and cited in a paragraph above this one.

      Reviewer #2 (Recommendations For The Authors):

      I have some additional suggestions and questions for the authors, which I am presenting following the order of the figures.

      Fig 1A: I'm a little bit puzzled by the timescales between Hebbian and homeostatic plasticity; a wealth of data suggests that Hebbian plasticity acts on a faster timescale than homeostatic plasticity, while Aii-Aiii implies the opposite. In lesion-induced degeneration, for instance, which is mentioned later by the authors, spine loss has been suggested to be Hebbian (LTD) while the subsequent recovery is homeostatic. Additionally, it will not be clear to the reader if the same stimulus could induce Hebbian and homeostatic plasticity, or why; the rest of the manuscript seems to imply that any stimulus could and would trigger homeostatic plasticity, which is not the case. Finally, there should be a mention somewhere that Hebbian structural plasticity also exists.

      Local text and figure revision: We thank the reviewer for pointing out the time scale issue, which was not explicitly considered here and is now updated.

      Fig. 2Bii: There is no significant difference at 200 nM NBQX for sEPSC amplitude, contrary to what is stated in the text (line 136). Which one is it?

      Local text revision: We thank the reviewer for pointing out the mistake. We have inspected the original statistical file and corrected the text.

      Fig. 2F: The description of Fig. 2F in the text confused me for the longest time. I am still unsure why 200 nM NBQX is described as leading to a general size increase when it follows the control line so closely, crosses 0 at the same point, and is even below the control line for the largest spine sizes. Similarly, 50 µM NBQX neatly overlaps with the control condition except for the smallest and largest spines, so the "shrinkage of middle-sized spines" doesn't seem different from the control condition. I also couldn't find any data supporting the statement that 50 µM NBQX increased only the size of "a small subset of large spines". Maybe the authors could clarify this section? I would also suggest adding statistics between the treatments at each spine size bin to support the claims, as they are central to the rest of the paper.

      Importantly, there is no description of the normalization nor the quantification of the difference between days in the methods; I am assuming post-pre for the difference and (post-pre)/pre for the normalization, but this should be much more detailed in the methodology. I was happy to see the baseline raw spine sizes in Supplementary Fig. 1, and would also suggest adding the raw spine sizes after treatment for comparison.

      Local text and figure revision: We have adjusted the text and figure to improve clarity.
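
      The quantification the reviewer assumes, post-pre for the difference and (post-pre)/pre for the normalization, can be sketched per spine-size bin as below. The definitions follow the reviewer's reading and are not confirmed by the response; the function name, the bin edges, and the example values are hypothetical.

```python
from bisect import bisect_right

def mean_relative_change_by_bin(pre, post, edges):
    """Average (post - pre) / pre within bins of initial spine size.

    pre, post: spine sizes of the same spines before and after treatment
    edges:     ascending bin edges; bin i covers [edges[i], edges[i + 1])
    Returns one mean per bin, or None for an empty bin.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must describe the same spines")
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for before, after in zip(pre, post):
        i = bisect_right(edges, before) - 1
        if 0 <= i < len(sums):  # spines outside the edges are skipped
            sums[i] += (after - before) / before
            counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Made-up example: two small spines grow, one larger spine shrinks.
print(mean_relative_change_by_bin([0.5, 0.5, 1.5], [1.0, 0.75, 0.75], [0, 1, 2]))
```

      Reporting the per-bin means for each treatment next to the control would make it possible to test the bin-wise differences the reviewer asks about.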

      Fig. 2G/S2A: a scale for the label sizes would be helpful. I would also like to have the same correlation for 50 µM NBQX treatment and the control condition (at least in the supplementary figures).

      Local text and figure revision: We have adjusted the text and figure to improve clarity.

      Fig. 2I: I might be missing something, but why is the activity line flat when there are changes in spine density and size?

      Local text and figure revision: We have adjusted the text and figure to improve clarity.

      Fig. 3C-D: they are referenced in the text as Fig. 1C-D (lines 188-194).

      Local text revision: fixed.

      Fig. 5: it is interesting that the biphasic model captures both spine loss and recovery, fitting well with lesion-induced degeneration and recovery. Does this mean that the model captures other types of plasticity, or does it suggest to the authors that both steps are homeostatic?

      Indeed, the biphasic HSP rule captures two types of activity dependence. The pioneering work by Gallinaro and Rotter (2018) also demonstrated that the HSP rule, even in its monotonic/linear form, exhibits associative properties, which are typically associated with Hebbian plasticity.

      Fig. 6A: This figure requires a more detailed legend - what are the various insets? Does the top right graph only have one curve because they are overlapping and the growth rules are the same for axons and dendrites?

      Local text revision: fixed.

      Fig. 6E: There is usually an overshoot when a stimulus is removed, in this case at the end of the silencing period (as shown in Fig. 1Aiii). Is there a reason why this is not recapitulated here? It shouldn't be as extreme as in the right panel so there should be no degeneration.

      We agree that removing the stimulus would typically trigger an opposite homeostatic process. However, in this protocol, we aimed to emphasize the role of recurrency by presenting extreme cases to illustrate potential scenarios for the readers.

      Local text revision: We revised this paragraph to walk the readers through the rationale better.

      Fig. 6: the authors mention distance-dependent connectivity (line 268), but I couldn't find any data related to that statement. I was particularly curious about that aspect, so I would like to know what this statement is based on, especially as they touch again on the role of morphology in Fig. 8, and distance-dependent connectivity is more prominent in the discussion. On a similar note, would the authors have data from other layers of CA1 that would show similar or other rules? Please note that I am not asking to include these data in the present paper - I am just curious if these data exist (or if the experiments are considered).

      Such an extensive dataset is included and thoroughly investigated in another study that has just been published (Lenz et al., 2023). We updated the reference in the revised text.

      Fig. 7E top: the scalebar is missing.

      Local text revision: fixed.

      Fig. 8A: do the colors have meaning? If yes, please state them. Also indicate that the left two neurons are pyramidal cells from CA1 and the right neurons are granule cells from the dentate gyrus.

      Local text revision: fixed.

      Line 302: "reactive" should be "reactivate".

      Local text revision: fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors investigate the functional difference between the most commonly expressed form of PTH, and a novel point mutation in PTH identified in a patient with chronic hypocalcemia and hyperphosphatemia. The value of this mutant form of PTH as a potential anabolic agent for bone is investigated alongside PTH(1-84), which is a previously used anabolic therapy. The authors have achieved the aims of the study. Their conclusion, however, that this suggests a "new path of therapeutic PTH analog development" seems unfounded; the benefit of this PTH variant is not clear, but the work is still interesting.

      The work does not identify why the patient with this mutation has hypocalcemia and hyperphosphatemia; this was not the goal of the study, but the data are useful for helping to understand that.

      Strengths:

      The work is novel, as it describes the function of a novel, naturally occurring, variant of PTH in terms of its ability to dimerise, to lead to cAMP activation, to increase serum calcium, and its pharmacological action compared to normal PTH.

      Weaknesses:

      (1) The use of very young, 8-10 week old, mice as a model of postmenopausal osteoporosis is a major limitation of this study. At 8 weeks, the effect of ovariectomy leads to lack of new trabecular bone formation, rather than trabecular bone loss due to a defect in bone remodelling. Although the findings here provide a comparison between two forms of PTH, it is unlikely to be of direct relevance to the patient population. For example, the authors find an inhibitory effect of PTH on osteoclast surface, which is very unusual. Adding to this concern is that the authors have not described the regions used for histomorphometry, and from their figures (particularly the TRAP stain), it seems that the primary spongiosa (which is a region of growth) has been used for histomorphometry, rather than the secondary spongiosa (which more accurately reflects bone remodelling). Much further detail is needed to justify the use of this very young model, and a section on the limitations of this model is needed. Please provide that section in the revised manuscript.

      Thank you for your crucial comment. We obtained 8-week-old female mice and stabilized them in our facility for 2 weeks. We then performed OVX on the 10-week-old mice and determined the effects of dimeric <sup>R25C</sup>PTH(1-34) on bone 8 weeks later, after 4 weeks of recovery followed by 4 weeks of treatment with PTH or <sup>R25C</sup>PTH(1-34). The mice were therefore sacrificed at 18 weeks of age. We revised the Methods section on page 18, lines 436-441 and 442-448, as follows.

      - ‘Eight-week-old C57BL/6N female mice were purchased from KOATECH (Gyeonggi-do, Republic of Korea) and stabilized for 2 weeks. All animal care and experimental procedures were conducted under the guidelines set by the Institutional Animal Care and Use Committees of Kyungpook National University (KNU-2021-0101). The mice were housed in a specific pathogen-free environment, with 4-5 mice per cage, under a 12-h light cycle at 22 ± 2°C. They were provided with standard rodent chow and water ad libitum.’

      - ‘An ovariectomized (OVX) mouse model was established using 10-week-old C57BL/6N female mice. Following surgery, mice were divided into four groups (n = 6 mice/group): sham, OVX control group, OVX + PTH (1–34) treated group (40 µg/kg/day), and OVX + dimeric <sup>R25C</sup>PTH treated group (40-80 µg/kg/day). OVX mice were allowed to recover for 4 weeks after surgery. Afterward, PTH (1–34) or <sup>R25C</sup>PTH was injected subcutaneously 5 times a week for 4 weeks. Micro-computed tomography (μ-CT) and histological analyses were performed on all four groups at 18 weeks of age.’

      We also appreciate the reviewer's helpful comment on histology analysis. We agree with the reviewer’s comment that the primary spongiosa does not fully reflect bone remodeling. For histomorphometry analysis in young or male mice, we commonly use the secondary spongiosa, which more accurately reflects bone remodeling. However, in aged or OVX-induced osteoporosis mouse models, we use the primary and secondary spongiosa for histomorphometry analysis because of the barely detectable bone in the secondary spongiosa. In the TRAP staining, we observed an inhibitory effect of PTH on the osteoclast surface/bone surface, which was due to an increased bone surface in the PTH treatment group and less bone in the OVX-vehicle group. Serum CTX1 levels showed no significant difference between the OVX+vehicle and OVX+PTH(1-34) groups. We revised the Materials and Methods (page 21, line 502) and Discussion (page 14, line 330) sections as follows.

      - ‘In the histomorphometry analysis for TRAP staining, we used the secondary and primary spongiosa for the trabecular ROI because bone was barely detectable in the secondary spongiosa of the OVX model.’

      - ‘This study has several limitations. First, it is urgently necessary to determine whether dimeric <sup>R25C</sup>PTH is present in human patient serum. Second, TRAP staining showed an inhibitory effect of PTH treatment on the primary spongiosa area. However, the secondary spongiosa, which more accurately reflects bone remodeling (55), was not examined due to the barely detectable bone in this area in OVX-induced osteoporosis mouse models. Third, it is unclear whether similar bone phenotypes exist between human <sup>R25C</sup>PTH patients and dimeric <sup>R25C</sup>PTH-treated mice, particularly regarding low bone strength. Although the dimeric <sup>R25C</sup>PTH-treated group showed higher cortical BMD compared to WT-Sham or PTH groups, there was no difference in bone strength compared to the osteoporotic mouse model. Fourth, our study showed that PTH or <sup>R25C</sup>PTH treatment decreased circumferential length; it is uncertain if this phenotype is also present in PTH-treated or <sup>R25C</sup>PTH patients. Finally, we did not analyze the <sup>R25C</sup>PTH mutant mouse model, which would allow us to compare phenotypes that most closely resemble those of human patients.’

      (2) It is also somewhat concerning that the age range is from 8-10 weeks, increasing the variability within the model. Did the age of mice differ between the groups analysed?

      We utilized mice of the same age (10 weeks) across all experiments involving the surgically induced ovariectomy (OVX) model, as described above.

      (3) Methods are not sufficiently detailed. For example, the regions used for histomorphometry are not described, there is no information on micro-CT thresholds, no detail on the force used for mechanical testing. Please address this request.

      Thank you for your comment. Let me address your points step by step.

      (1) Thresholds for analysis were determined manually based on grayscale values for each experimental group as follows: trabecular bone: 3000; cortical bone: 5000 for all samples. We utilized an HA (calcium hydroxyapatite) phantom with HA content ranging from 0 to 1200 mg CaHA/cm³ to measure the grayscale values via µ-CT. These measurements were then used to generate a standard curve.

      Author response image 1.

      (2) Bone parameters and density were analyzed in the region between 0.3–1.755 mm (voxel size: 9.7 µm, 150 slices) from the bottom of the growth plate. Analysis of bone structure was performed using adaptive thresholding in a CT Analyser.

      Author response image 2.

      (3) For the three-point bending test, the left femur of the mouse was immersed in 0.9% NaCl solution, wrapped in gauze, and stored at −20°C until testing. In this test, the mouse femurs were positioned horizontally with the anterior surface facing upwards, centered on the supports, and the compressive force was applied vertically to the mid-shaft. The pressure sensor was positioned at a distance that allowed for the maximum allowable pressure (200 N) without interfering with the test (20.0 mm for the femur). A miniature material testing machine (Instron, MA, USA) was used for this test. The crosshead speed was decreased to 1 mm/min until failure. During the test, force-displacement data were collected to determine the maximum load and slope of the bones.

      (4) Following the reviewer’s suggestion, we revised the methods on page 20, line 477 and lines 482-486, as follows.

      - ‘Bone parameters and density were analyzed in the region between 0.3–1.755 mm (150 slices) from the bottom of the growth plate. Analysis of bone structure was performed using adaptive thresholding in a µ-CT Analyser. Thresholds for analysis were determined manually based on grayscale values for each experimental group: trabecular bone: 3000; cortical bone: 5000 for all samples.’

      -  ‘The left femur of the mouse was immersed in 0.9 % NaCl solution, wrapped in gauze, and stored at −20°C until ready for a three-point bending test. In this test, we placed the mouse femurs horizontally with the anterior surface facing upwards, centered on the supports, and the compressive force was applied vertically to the mid-shaft. The pressure sensor was positioned at a distance that allowed maximum allowable pressure (1000N) without interfering with the test (20.0 mm for the femur). A miniature material testing machine (Instron, MA, U.S.A.) was used for this test. The crosshead speed was decreased to 1 mm/min until failure. During the test, force-displacement data were collected to determine the maximum load and slope of the bones.’
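
      How the maximum load and slope might be extracted from the force-displacement data can be sketched as below. The response does not say how the slope was computed, so the least-squares fit and the choice of linear region (points on the loading branch between 20% and 60% of the maximum load) are assumptions made for this illustration, as are the function name and the example values.

```python
def max_load_and_slope(displacement_mm, force_n, fit_fraction=(0.2, 0.6)):
    """Return (maximum load in N, slope in N/mm) from a bending test record.

    The slope is a least-squares fit over the points recorded before the peak
    whose force lies within `fit_fraction` of the maximum load (assumed window).
    """
    peak = max(range(len(force_n)), key=force_n.__getitem__)
    max_load = force_n[peak]
    low, high = fit_fraction[0] * max_load, fit_fraction[1] * max_load
    points = [
        (d, f)
        for d, f in zip(displacement_mm[: peak + 1], force_n[: peak + 1])
        if low <= f <= high
    ]
    if len(points) < 2:
        raise ValueError("not enough points in the assumed linear region")
    n = len(points)
    mean_d = sum(d for d, _ in points) / n
    mean_f = sum(f for _, f in points) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in points)
    sxy = sum((d - mean_d) * (f - mean_f) for d, f in points)
    return max_load, sxy / sxx

# Made-up record: linear loading at 50 N/mm up to 20 N, then failure.
displacement = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
force = [0.0, 5.0, 10.0, 15.0, 20.0, 8.0]
print(max_load_and_slope(displacement, force))
```

      Stating the fitting window explicitly in the Methods would let readers reproduce the reported slope from the example curves.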

      (4) There are three things unclear about the calvarial injection mouse model. Firstly, were the mice injected over the calvariae or with a standard subcutaneous injection (e.g. at the back of the neck)? If they were injected over the calvaria, why were both surfaces measured? Secondly, why was the dose of the R25C-PTH double that of PTH(1-34)? Thirdly, there is no justification for the use of "more intense coloration" as a marker of new bone; this requires calcein labelling to prove it new bone. It would be more reliable to measure and report the thickness of the calvaria. Please address these technical questions.

      Thank you for your valuable feedback on the calvarial injection mouse model. Below are our responses to the specific points mentioned:

      (1) Injection method and measurement sites: The injections were administered subcutaneously above the calvaria, rather than at the standard subcutaneous site such as the back of the neck. This approach was chosen to ensure direct delivery of the peptide to the target area, enhancing the localized effects on bone formation. Measurements were taken at two different parts of the calvaria to account for any variation in the spread and absorption of the administered substance following injection. By analyzing both surfaces, we aimed to provide a comprehensive assessment of the impact on calvarial bone thickness.

      (2) Dose of <sup>R25C</sup>PTH compared to PTH(1-34): The dose of <sup>R25C</sup>PTH used in our study was determined based on molecular weight calculations. The molecular weight of the dimeric <sup>R25C</sup>PTH(1-34) is approximately twice that of the monomeric PTH(1-34). Therefore, to maintain a consistent molar concentration and ensure comparable biological effects, the dose of <sup>R25C</sup>PTH was adjusted accordingly.

      (3) Use of "more intense coloration" as a marker of new bone: We acknowledge that calcein labeling would provide a more reliable and quantifiable way to identify new bone formation. The use of “more intense coloration” was intended as a qualitative indicator in this study, and we recognize the technical limitations of this approach.
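
      The molar-equivalence argument in point (2) can be checked with a short calculation. This is a sketch under stated assumptions: the molecular weight of PTH(1-34) is taken as approximately 4117.7 g/mol, and the dimer is treated as exactly twice that value, following the response's "approximately twice"; the function name is hypothetical.

```python
def molar_dose_nmol_per_kg(dose_ug_per_kg, molecular_weight_g_per_mol):
    """Convert a mass dose (µg/kg) into a molar dose (nmol/kg)."""
    # µg divided by g/mol gives µmol; multiply by 1000 for nmol.
    return dose_ug_per_kg / molecular_weight_g_per_mol * 1000.0

MW_PTH_1_34 = 4117.7          # g/mol, approximate value for monomeric PTH(1-34)
MW_DIMER = 2 * MW_PTH_1_34    # assumption: the dimer weighs about twice the monomer

monomer = molar_dose_nmol_per_kg(40, MW_PTH_1_34)
dimer = molar_dose_nmol_per_kg(80, MW_DIMER)
# Doubling the mass dose for a molecule of double the weight keeps the
# number of molecules per kg the same.
print(monomer, dimer)
```

      Note that equal moles of dimer carry two receptor-binding chains per molecule, so matching molar doses is a design choice rather than the only possible normalization.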

      (5) The presentation of mechanical testing data is not sufficient. Example curves should be shown, and data corrected for bone size needs to be shown. The difference in mechanical behaviour is interesting, but does it stem from a difference in the amount of bone, or from a difference in the quality of the bone? Please explain this matter better in the manuscript.

      Thank you for your comment.

      As the reviewer requested, we provide example curves for the rat femur three-point bending test below.

      Author response image 3.

      (1) The cortical bone area was decreased in the OVX-Vehicle and OVX-<sup>R25C</sup>PTH(1-34) groups but not in the OVX-PTH(1-34) group compared to the Sham group. However, the total bone area was decreased in the PTH(1-34) and <sup>R25C</sup>PTH(1-34) treated groups, with no significant difference in the OVX-Vehicle group compared to the Sham group. Collectively, there was an increase in cortical thickness which resulted in a narrowing of the bone marrow space in OVX-<sup>R25C</sup>PTH(1-34) groups. Accordingly, we revised Fig 5B with the addition of Tt.Ar and Ct.Ar.

      (2) Following the reviewer’s suggestion, we revised the results on page 10, lines 220-228, as follows.

      - ‘Quantitative micro-computed tomography (μ-CT) analysis of the femurs obtained from each group revealed that, as compared to OVX + vehicle controls, treatment with PTH(1–34) increased femoral trabecular bone volume fraction (Tb.BV/TV) by 121%, cortical bone volume fraction (Ct.BV/TV) by 128%, cortical thickness (Ct.Th) by 115%, cortical area (Ct.Ar) by 110%, and cortical area fraction (Ct.Ar/Tt.Ar) by 118%, while decreasing total tissue area (Tt.Ar) by 93% (Figure 5A and 5B). Treatment with dimeric <sup>R25C</sup>PTH(1-34) had similar effects on the femoral cortical bone parameters, as it increased Ct.BMD by 104%, Ct.BV/TV by 125%, Ct.Th by 107%, and Ct.Ar/Tt.Ar by 116%, while decreasing Tt.Ar by 86% (Figure 5). Considering the reduction of Tt.Ar and no change of Ct.Ar compared to the OVX+vehicle controls, the increase of Ct.Ar/Tt.Ar indicates a decrease in bone marrow space. The increase in cortical bone BMD was significant with dimeric <sup>R25C</sup>PTH(1-34) but not with PTH(1-34), whereas an increase in femoral trabecular bone was only observed with PTH(1-34).’
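
      The reasoning that a smaller Tt.Ar with an unchanged Ct.Ar implies a smaller marrow space can be made explicit with a small calculation. The baseline areas below are invented for illustration, and the treated Tt.Ar is taken as 86% of the vehicle value, which assumes the reported figures are percentages of the OVX + vehicle value (the response does not state this explicitly).

```python
def marrow_area(tt_ar, ct_ar):
    """Marrow area as total tissue area minus cortical bone area (same units)."""
    return tt_ar - ct_ar

def cortical_area_fraction(tt_ar, ct_ar):
    """Ct.Ar/Tt.Ar."""
    return ct_ar / tt_ar

# Hypothetical OVX + vehicle cross-section, in mm^2.
vehicle_tt_ar, vehicle_ct_ar = 2.0, 0.8
# Assumed treated cross-section: Tt.Ar at 86% of vehicle, Ct.Ar unchanged.
treated_tt_ar, treated_ct_ar = 0.86 * vehicle_tt_ar, vehicle_ct_ar

print(cortical_area_fraction(vehicle_tt_ar, vehicle_ct_ar),
      cortical_area_fraction(treated_tt_ar, treated_ct_ar))
print(marrow_area(vehicle_tt_ar, vehicle_ct_ar),
      marrow_area(treated_tt_ar, treated_ct_ar))
```

      With Ct.Ar held constant, any reduction in Tt.Ar raises Ct.Ar/Tt.Ar and lowers the marrow area by the same absolute amount, which is the inference drawn in the revised Results text.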

      (6) The micro-CT analysis of the cortical bone in the OVX model is insufficient. Please indicate whether cross-sectional area has increased. Is there an increase in the size of the bones, or is the increase in cortical thickness due to a narrowing of the marrow space? This may help resolve the apparent contradiction between the cortical thickness data (where there is no difference between the two PTH formulations) and the mechanical testing data (where there is a difference). Please explain this matter better in the manuscript.

      Thank you for your comment.

      (1) The cortical bone area was decreased in the OVX-Vehicle and OVX-<sup>R25C</sup>PTH(1-34) groups but not in the OVX-PTH(1-34) group compared to the Sham group. However, the total bone area was decreased in the PTH(1-34) and <sup>R25C</sup>PTH(1-34) treated groups, with no significant difference in the OVX-vehicle group compared to the Sham group. Taken together, there was an increase in cortical thickness due to a narrowing of the bone marrow space in OVX-<sup>R25C</sup>PTH(1-34) groups. Therefore, we revised as above.

      (2) Following the reviewer’s suggestion, we revised the results on page 10, lines 220-228, as follows.

      - ‘Quantitative micro-computed tomography (μ-CT) analysis of the femurs obtained from each group revealed that, as compared to OVX + vehicle controls, treatment with PTH(1–34) increased femoral trabecular bone volume fraction (Tb.BV/TV) by 121%, cortical bone volume fraction (Ct.BV/TV) by 128%, cortical thickness (Ct.Th) by 115%, cortical area (Ct.Ar) by 110%, and cortical area fraction (Ct.Ar/Tt.Ar) by 118%, while decreasing total tissue area (Tt.Ar) by 93% (Figure 5A and 5B). Treatment with dimeric <sup>R25C</sup>PTH(1-34) had similar effects on the femoral cortical bone parameters, as it increased Ct.BMD by 104%, Ct.BV/TV by 125%, Ct.Th by 107%, and Ct.Ar/Tt.Ar by 116%, while decreasing Tt.Ar by 86% (Figure 5B). Considering the reduction of Tt.Ar and no change of Ct.Ar compared to the OVX+vehicle controls, the increase of Ct.Ar/Tt.Ar indicates a decrease in bone marrow space. The increase in cortical bone BMD was significant with dimeric <sup>R25C</sup>PTH(1-34) but not with PTH(1-34), whereas an increase in femoral trabecular bone was only observed with PTH(1-34).’

      (7) The evidence that dimeric PTH has a different effect to monomeric PTH is very slim; I am not sure this is a real effect. Such differences take a long time to sort out (e.g. the field is still trying to determine whether teriparatide and abaloparatide are different). I think the authors need to look more carefully at their data - almost all effects are the same. Ultimately, the statement that dimeric PTH may be a more effective anabolic therapy than monomeric PTH are not supported by the data, and this should be removed. There is little to no difference found between normal PTH and the variant in their effects on calcium and phosphate homeostasis or on bone mass. However, the analysis has been somewhat cursory, with insufficient mechanical testing or cortical data presented. Many of the effects seem to be the same (e.g. cortical thickness, P1NP, ALP, vertebral BV/TV and MAR), but the way it is written it sounds like there is a difference. Please remove some of the unfounded claims that you have made in this manuscript.

      Thank you for your insightful comments. We strongly agree with your conclusion that PTH and dimeric <sup>R25C</sup>PTH exhibit similar activities. We have toned down our statements; however, some elements that show statistical significance still need to be stated clearly. Specifically, when we changed the statistical method from the t-test to one-way ANOVA, significance for the bone formation markers was observed only in the dimeric PTH-treated samples. We have revised the Results section on page 9, lines 206-212, as follows to reflect the change.

      - ‘These analyses revealed that both PTH(1-34) and dimeric <sup>R25C</sup>PTH(1-34) significantly increased the width of the new bone area by approximately four-fold, as compared to the vehicle group (Figure 4B). These findings thus support a capacity of dimeric <sup>R25C</sup>PTH(1-34) to induce new bone formation in vivo, similar to PTH, despite molecular and structural changes.’

      Although it is unclear whether <sup>R25C</sup>PTH circulates as a dimer or as a mutant monomer, the absence of bone resorption associated with long-term PTH exposure in the patients suggests the potential for a bone anabolic drug without side effects. Continued observation of the recently reported young patient in Denmark is also expected to clarify this effect further. However, we acknowledge that our current data alone are insufficient to claim that <sup>R25C</sup>PTH may be a more effective anabolic therapy than wild-type PTH, and we have adjusted our tone accordingly.

      (8) Statistical analysis used multiple t-tests. ANOVA would be more appropriate.

      We agree with your suggestion. To compare the means among three or more groups, ANOVA is more appropriate than the t-test. Accordingly, we performed new statistical analyses using one-way and two-way ANOVA. One-way ANOVA was applied to Figures 4, 5, and 6 (previously Figures 5, 6, and 7), and two-way ANOVA was applied to Figure 3, considering both time and treatment variables. We revised some of the figures and descriptions to reflect the changes in significance.
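
      Why a single ANOVA is preferred over repeated t-tests can be illustrated with a minimal sketch. It computes the one-way ANOVA F statistic from scratch and the familywise error rate for several uncorrected comparisons, assuming independent tests at α = 0.05. The function names and the example data are invented, and the sketch stops at the F statistic; a p-value and any post-hoc test would come from a statistics package.

```python
import math

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Four groups give comb(4, 2) = 6 pairwise t-tests if tested one by one.
print(familywise_error(math.comb(4, 2)))
# Made-up data for three groups.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

      The familywise error for six uncorrected comparisons is well above the nominal 5%, which is the reason changes in significance can appear when the analysis is redone with ANOVA.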

      We thank Reviewer #1 for the thorough and thoughtful review. We greatly appreciate the suggestions and will incorporate them to enhance the quality of our paper.

      Reviewer #2 (Public Review):

      Summary:

      The study conducted by Noh et al. investigated the effects of parathyroid hormone (PTH) and a dimeric PTH peptide on bone formation and serum biochemistry in ovariectomized mice as a model for postmenopausal osteoporosis. The authors claimed that the dimeric PTH peptide has pharmacological benefits over PTH in promoting bone formation, despite both molecules having similar effects on bone formation and serum Ca2+. However, after careful evaluation, I am not convinced that this manuscript adds a significant contribution to the literature on bone and mineral research.

      Strengths:

      Experiments are well performed, but strengths are limited to the methodology used to evaluate bone formation and serum biochemical analysis.

      Weaknesses:

      (1) Limited significance of this study:

      • This study follows a previous study (not cited) reporting the effect of the dimeric R25CPTH(1-34) on bone regeneration in an osteoporotic dog (Beagle) model (Jeong-Oh Shin et al., eLife 13:RP93830, 2024). It's unclear why the authors tested the dimeric R25C-PTH peptide on a rodent animal model, which has limitations because the healing mechanism of human bone is more similar in dogs than in mice.

      Thank you for your interest in our research. To address the paper by Shin et al. (2024, DOI:10.7554/eLife.93830.1), we would like to clarify that our research on dimeric <sup>R25C</sup>PTH(1-34) was conducted first. Initially, we confirmed dimerization under in vitro conditions and observed its effects in a mouse model. Recognizing the need for additional animal models, we collaborated with Shin et al.'s team. Due to delays during the submission process, our paper was submitted later, which seems to have led to this misunderstanding. However, Shin et al. (2024) cited our pre-print article on bioRxiv (Noh, M., Che, X., Jin, X., Lee, D. K., Kim, H. J., Park, D. R., ... & Lee, S. (2024). Dimeric R25CPTH (1-34) Activates the Parathyroid Hormone-1 Receptor in vitro and Stimulates Bone Formation in Osteoporotic Female Mice. bioRxiv, 2024-03. DOI: 10.1101/2024.03.13.584815). Both the study by Shin et al. and our mouse work support the action of dimeric R25CPTH(1-34) in regulating bone metabolism.

      • The authors should clarify why they tested the effects of dimeric <sup>R25C</sup>PTH(1-34) and not dimeric <sup>R25C</sup>PTH(1-84)?

      Thank you for your valid comments. There are several reasons why we used the 1-34 fragment peptide in our experiments. Currently, PTH analog peptides for medical purposes include human parathyroid hormone fragment 1-34 (PTH(1-34)) and full-length recombinant human parathyroid hormone (rhPTH(1-84)). PTH(1-34) is used as a bone anabolic agent, while rhPTH(1-84) is used for PTH replacement therapy in hypoparathyroid patients with hypocalcemia. We aimed to compare the bone formation effects of R25CPTH with those of wild-type PTH, for which PTH(1-34) was deemed more appropriate. Additionally, previous studies have shown that PTH(1-34) and PTH(1-84) possess equal ligand binding affinity for the PTH1 receptor. Key sites within the first 34 N-terminal amino acids of PTH are critical for high-affinity interactions and receptor activation. Alterations in the N-terminal sequence of PTH(1-84) significantly reduce receptor binding, while truncations at the C-terminal end do not affect receptor affinity. The peptide used in our experiments was synthetic, and given that peptide length does not affect receptor affinity, the shorter PTH(1-34) was the more practical choice for synthesis. Consequently, we tested PTH(1-34) and dimeric R25CPTH(1-34) because of the known bone anabolic efficacy of PTH(1-34) and its relevance to receptor interactions. However, we aim to conduct a functional analysis of dimeric R25CPTH(1-84) in a further study.

      • The study is descriptive with no mechanism.

      We recognize that your concern is legitimate. While our study includes descriptive elements, it extends beyond mere observation. The R25CPTH research, which began with a case report, has evolved to utilize molecular techniques to better understand the unique physiological phenomena observed in patients. We have validated the mutation-induced dimerization of the peptide in vitro and assessed its effects in both in vitro cell line models and in vivo mouse models. Although we have not yet confirmed whether <sup>R25C</sup>PTH exists as a dimer or monomer in patient blood, we anticipate that at least some fraction exists in dimeric form, and we are currently conducting mass spectrometry on patient blood samples to determine this. Therefore, this paper serves as the first report on this PTH mutant suggesting that it may form a homodimer. Importantly, we are actively investigating the molecular mechanisms and downstream signaling pathways that differentiate normal PTH from dimeric <sup>R25C</sup>PTH. This includes analyzing differences in the proteome and transcriptome induced by PTH and dimeric <sup>R25C</sup>PTH and examining the direct molecular characteristics and structural changes caused by this mutation. Through this comprehensive approach, we aim to provide a detailed mechanistic understanding of <sup>R25C</sup>PTH in a subsequent publication.

      (2) Statistics are inadequately described or performed for the experimental design:

      • The statistical analysis in Figure 5 needs to be written in a way that makes it clearer how statistics were done; t-test or one-way ANOVA?

      Sorry for the inconvenience and thank you for your thorough review. Initially, we conducted the statistical analysis using a t-test. However, during the revision process, we performed a new statistical analysis using one-way ANOVA, as it is more appropriate for comparing the means among three or more groups. Despite this change, there were no differences in statistical significance, so the descriptions remained unchanged.

      • Statistics in Figures 6 and 7 should be performed by one-way ANOVA to compare the mean values of one variable among three or more groups, and not t-test.

      Thank you for your thorough review, and I apologize for any inconvenience. I agree with your suggestion that ANOVA is more appropriate than the t-test for comparing means among three or more groups. Accordingly, we performed new statistical analyses using one-way ANOVA. When we changed the statistical method from t-test to one-way ANOVA, the bone formation markers P1NP and ALP reached significance only with dimeric <sup>R25C</sup>PTH and not with wild-type PTH. We have reflected these findings in the text.
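
      The switch from pairwise t-tests to a one-way ANOVA described above can be illustrated with a minimal sketch. The three groups and their values are hypothetical placeholders, not data from the manuscript, and the helper name `one_way_anova_f` is illustrative only. The sketch computes just the omnibus F statistic and its degrees of freedom; a post hoc test would still be needed to say which groups differ.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of measurements per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical marker values for three treatment groups (arbitrary units).
vehicle = [1.0, 2.0, 3.0]
wild_type = [2.0, 3.0, 4.0]
dimer = [3.0, 4.0, 5.0]

f_stat, df_b, df_w = one_way_anova_f([vehicle, wild_type, dimer])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      Because a single omnibus test replaces several pairwise tests, a comparison that was significant under an uncorrected t-test may no longer be significant after ANOVA with a post hoc correction.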

      (3) Misleading and confused discussion:

      • The first paragraph lacks clarity in the PTH nomenclature and the authors should provide a clear statement that the PTH mutant found in patients is likely a monomeric R25CPTH(1-84), considering that there has been no proof of a dimeric form.

      Thank you for your insightful comments. I agree that there was some ambiguity in the nomenclature used in the first paragraph of the Discussion section. However, we do not believe that the absence of proof of a dimeric form of the <sup>R25C</sup>PTH(1-84) mutant necessarily indicates that the PTH mutant in the blood is solely monomeric. Identifying the in vivo structure of <sup>R25C</sup>PTH(1-84) is one of the goals of our ongoing project. While the exact form of <sup>R25C</sup>PTH(1-84) in patients is still elusive, we are investigating the possibility that some fraction may exist as a dimer. On page 12, lines 274-276, we have revised the content to address this issue and improve clarity as follows.

      - ‘In this study, we show the introduction of a cysteine mutation at the 25th amino acid position of mature parathyroid hormone (<sup>R25C</sup>PTH) facilitates the formation of homodimers comprised of the resulting dimeric R25CPTH peptide in vitro.’

      • Moreover, the authors should discuss the study by White et al. (PNAS 2019), which shows that there are defective PTH1R signaling responses to monomeric R25CPTH(1-34). This results in faster ligand dissociation, rapid receptor recycling, a short cAMP time course, and a loss of calcium ion allosteric effect.

      Sorry for the inconvenience and thank you for your thorough review. The authors were aware of the referenced paper and deeply apologize for its omission during the writing and editing process. Citing this paper will enhance the credibility of our findings. We have now included this citation and made the necessary adjustments to the Discussion section of the manuscript on page 12, lines 295-296, as follows.

      - ‘We also observed that the potency of cAMP production in cells was lower for dimeric <sup>R25C</sup>PTH as compared to the monomeric <sup>R25C</sup>PTH, in accordance with a lower PTH1R-binding affinity. Previous reports indicated that a mutation at the 25th position of PTH results in the loss of calcium ion allosteric effects on monomeric <sup>R25C</sup>PTH, leading to faster ligand dissociation, rapid receptor recycling, and a shorter cAMP time course (50). Correspondingly, the weaker receptor affinity and reduced cAMP production observed in dimeric <sup>R25C</sup>PTH suggest a possibility that the formation of a disulfide bond at the 25th position significantly alters the function of PTH as a PTH1R ligand. These structural effects are not yet fully understood and need to be investigated further.’

      • The authors should also clarify what they mean by "the dimeric form of R25CPTH can serve as a new peptide ...(lines 328-329)" The dimeric R25CPTH(1-34) induces similar bone anabolic effects and calcemic responses to PTH(1-34), so it is unclear what the new benefit of the dimeric PTH is.

      We apologize for any confusion in our previous description. We concur that, as you mentioned, PTH and dimeric <sup>R25C</sup>PTH indeed exhibit similar activities. We have toned down our statement; however, some elements showing statistical significance still need to be clearly stated. Specifically, when we changed the statistical method from t-test to one-way ANOVA, the significance of bone formation markers was only observed in samples treated with dimeric <sup>R25C</sup>PTH, and we have revised the Results section of the manuscript on page 9, lines 206-212, as follows to reflect the change.

      - ‘These analyses revealed that both PTH(1-34) and dimeric <sup>R25C</sup>PTH(1-34) significantly increased the width of the new bone area by approximately four-fold, as compared to the vehicle group (Figure 4B). These findings thus support a capacity of dimeric <sup>R25C</sup>PTH(1-34) to induce new bone formation in vivo, similar to PTH, despite molecular and structural changes.’

      Although it is unclear whether <sup>R25C</sup>PTH circulates in a dimeric form or a mutant monomeric form, the absence of bone resorption associated with long-term PTH exposure in the patients suggests the potential for a bone anabolic drug without side effects. Also, continued observation of the recently reported young patient in Denmark is expected to clarify this effect further. However, we acknowledge that our current data alone are insufficient to claim that <sup>R25C</sup>PTH may be a more effective anabolic therapy than wild-type PTH, and we have adjusted our tone accordingly.

      We thank Reviewer #2 for the comprehensive and considerate review. We are grateful for the ideas and have revised our manuscript accordingly to improve our paper.

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 1D lacks molecular weight markers.

      Thank you for your thorough review. We added protein molecular weight markers in the figure.

      (2) The lack of change in plasma cAMP is very surprising, particularly given that there is no difference in the effect of the two forms of PTH on serum calcium or phosphate, or urinary phosphate. This data is somewhat of a distraction since no effort has been made to assess the difference in the effects of these PTH forms on kidney function. I suggest removing this data and spending time working on the origin of this difference.

      Thank you for your insightful comments and valuable suggestions on our manuscript. We also could not precisely explain the discrepancy between the cell line and animal model experiments. However, since the results were consistently observed, we included them in the paper as they may be significant. We acknowledge that in the context of our current research, these data lack sufficient correlation with other findings. Therefore, we have removed the data about the lack of change in plasma cAMP by PTH injection (Figure 4. Effect of cAMP production by PTH injection in CD1 female mice) and revised the manuscript accordingly (Page 8, line 188-194; page 12, line 301-306; page 19, line 454-456). We are currently conducting further research with multiomics data analysis to elucidate potential differences in the sub-signaling pathways between PTH and dimeric R25CPTH, to identify the specific functions affected by these variations, and to understand the underlying mechanisms. The lack of changes in plasma cAMP levels in vivo will be addressed in a subsequent publication detailing our findings.

      (3) Introduction, line 61. The authors state that "most" anti-resorptive therapies cannot stimulate new bone formation. I don't believe that ANY anti-resorptive therapies stimulate new bone formation! If there is one, this should be referenced.

      Thank you for pointing out important aspects. Romosozumab, a humanized monoclonal anti-sclerostin antibody, has a dual effect by enhancing bone formation and inhibiting bone resorption. Sclerostin, a protein produced by osteocytes, plays a role in the regulation of bone metabolism. It promotes osteoclast differentiation, which is associated with bone resorption, and suppresses osteoblast activity, which is crucial for bone formation. By binding to sclerostin, Romosozumab prevents it from blocking the signaling pathways necessary for osteogenesis. Consequently, Romosozumab therapy not only regulates bone resorption but also affects new bone formation. We added the references to that information.

      (4) The authors tend to include a lot of methods in the results section (e.g. describing the number of replicates, and details of histological analysis). This should be minimized.

      Thank you for your thorough review, and sorry for the inconvenience. We have minimized the methodological details in the results section, ensuring that only the information essential for understanding the findings and the procedures remains.

      (5) Lines 302-305: If retaining the blood cAMP data, please provide references for the assertion that renal PTH receptors mediate this response.

      PTH exerts its effects primarily through the PTH1 receptor (PTH1R), a G protein-coupled receptor present in various tissues, including bone and kidney (Chase et al., 1968, Chase et al., 1970). When activated by PTH, this receptor stimulates the production of cyclic AMP (cAMP), with the kidneys playing a significant role in this process (Maeda et al., 2013). In the initial manuscript, the importance of renal PTH receptors in mediating the blood cAMP response may have been overemphasized. We appreciate your feedback on this point, and we have provided references to support this assertion. However, in following the earlier recommendation to the authors, we removed the data on the lack of change in plasma cAMP after PTH injection, and the description of renal PTH receptors mediating this blood cAMP response was removed as well.

      - Chase, Lewis R., and G. D. Aurbach. "Renal adenyl cyclase: anatomically separate sites for parathyroid hormone and vasopressin." Science 159.3814 (1968): 545-547. DOI: 10.1126/science.159.3814.545

      - Chase, Lewis R., and G. D. Aurbach. "The effect of parathyroid hormone on the concentration of adenosine 3', 5'-monophosphate in skeletal tissue in vitro." Journal of Biological Chemistry 245.7 (1970): 1520-1526. DOI: 10.1016/S0021-9258(19)77126-9

      - Maeda, Akira, et al. "Critical role of parathyroid hormone (PTH) receptor-1 phosphorylation in regulating acute responses to PTH." Proceedings of the National Academy of Sciences 110.15 (2013): 5864-5869. DOI: 10.1073/pnas.1301674110

      (6) Eosin stains bone pink and haematoxylin stains cells purple. This has been incorrectly described in the manuscript.

      Thank you for your thorough review, and I apologize for any confusion caused by the poor description. It appears that the terms were used interchangeably during the editing process. We have corrected the description in the manuscript and will ensure such mistakes do not occur again in the future.

      (7) Sodium thiosulphate is a fixative for Von Kossa staining, not an agent that removes nonspecific binding.

      Thank you for your careful review. However, there seems to be a misunderstanding of sodium formaldehyde as sodium thiosulfate. A 5% sodium thiosulfate solution is a critical in vitro diagnostic agent used in various staining kits. As a reducing agent, it effectively removes excess silver ions in staining kits based on silver impregnation techniques. In our experiment, sodium thiosulfate was specifically used to remove residual silver ions in Von Kossa staining. For more details, please refer to the following link: https://www.morphisto.de/en/shop/detail/d/Natriumthiosulfat_5//12825/.

      Reviewer #2 (Recommendations For The Authors):

      Moderate-to-Minor points:

      • Line 73: it's either class B GPCR or secretin receptor family but not class B GPCR family.

      Thank you for your thorough review, and I apologize for any confusion in our previous description. We corrected the description in the manuscript as class B GPCR.

      • Line 79: correct "adenylate cyclase" to "transmembrane adenylate cyclases"

      Thank you for your thorough review, and I apologize for any confusion in our previous description. We corrected the description in the manuscript as transmembrane adenylate cyclases.

      • Line 89: should "hypothyroidism" be "hypoparathyroidism"?

      Thank you for your thorough review, and I apologize for any confusion in our previous description. We corrected the description in the manuscript as hypoparathyroidism.

      • Line 159: all agonists display higher binding affinities when their receptors are coupled to G proteins, so it's unclear why the higher affinity of the dimeric <sup>R25C</sup>PTH(1-34) for the RG state seems to be important for the authors.

      Thank you for your insightful comments. First of all, comparing the binding affinities of the R<sup>0</sup> (G protein-uncoupled) and RG (G protein-coupled) conformations of the receptor is inappropriate. This is because the form and size of the radiolabeled ligand bound to each conformation differ, which consequently affects their binding affinities and, in turn, influences the binding strength of target ligands such as PTH, monomeric <sup>R25C</sup>PTH, and dimeric <sup>R25C</sup>PTH. Therefore, it is preferable to compare how the binding strengths of test ligands differ for each conformation. Additionally, the fact that significant binding affinity is lost for R<sup>0</sup> while remaining high for the RG conformation of PTH1R is important because typical PTH exhibits high binding affinity for R<sup>0</sup>, whereas PTHrP shows higher affinity for the RG conformation. This suggests that dimeric <sup>R25C</sup>PTH may possess distinct molecular characteristics and potentially induce different downstream signaling pathways compared to typical PTH.

      • Line 169-170 and Fig. 2: According to the theory of receptor pharmacology established in the 1960s for native receptors (Arch. Int. Pharmacodyn. 127:459-478 (1960); Arch. Int. Pharmacodyn. 136:385-413 (1962)) and verified later in the 1980s-90s for recombinant GPCRs, the activity constant (Kact or EC50) value of hormone actions in various tissues or cells is equal to the dissociation constant (Kd) of the hormone when receptors are not overexpressed (EC50 = Kd). When receptors are overexpressed (presence of spare receptors), then EC50 < Kd. Assuming that after Cheng-Prusoff correction for data in Fig. 2, IC50 < Ki = Kd, how do the authors explain that IC50 values for RG are about 1-Log lower than EC50s (i.e., EC50 > Kd)?

      We appreciate your insightful comment and fully acknowledge the established theory of receptor pharmacology, which states that Kd equals EC50, and when the receptor is overexpressed, EC50 is less than Kd. After reading your comments, we revisited Okazaki et al. (PNAS, 2008) to better understand the interaction of PTH with PTH1R. While our data might appear to contradict this theory, we believe that a direct comparison between the IC50 of RG and the EC50 in Figure 2 may not be entirely appropriate for the following reasons. First, the IC50 was determined from membrane preparations of a receptor-overexpressing cell line (GP-2.3), whereas the EC50 was calculated based on the cAMP response in SaOS-2 cells. These different experimental conditions contribute to the observed discrepancies. Second, the peptides used in the competition assays differ. The R<sup>0</sup> assay utilized radiolabeled PTH(1-34), while the RG assay employed M-PTH(1-15), which has several amino acid substitutions and a shorter length. This further complicates a direct comparison between the EC50 and IC50 values in our study.
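
      The dependence on assay conditions discussed above can be made concrete with the Cheng-Prusoff relation for simple competitive binding at a single site, Ki = IC50 / (1 + [L]/Kd), where [L] is the tracer concentration and Kd is the tracer's dissociation constant. The following is a minimal sketch under that assumption; the function name and all numerical values are hypothetical and are not taken from the manuscript's assays.

```python
def cheng_prusoff_ki(ic50, tracer_conc, tracer_kd):
    """Convert a competition-assay IC50 to Ki (same units as ic50).

    Assumes simple competitive binding at one site:
    Ki = IC50 / (1 + [L] / Kd), with [L] and Kd referring to the tracer.
    """
    if tracer_kd <= 0:
        raise ValueError("tracer_kd must be positive")
    return ic50 / (1.0 + tracer_conc / tracer_kd)


# Hypothetical example: the same measured IC50 (10 nM) under two
# tracer conditions gives two different Ki estimates.
ki_a = cheng_prusoff_ki(ic50=10.0, tracer_conc=1.0, tracer_kd=1.0)
ki_b = cheng_prusoff_ki(ic50=10.0, tracer_conc=3.0, tracer_kd=1.0)
print(ki_a, ki_b)
```

      Since the correction depends on the tracer's concentration and affinity, IC50 values obtained with different tracers are not directly interchangeable without it.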

      We thank all the reviewers for their thorough and thoughtful reviews. We greatly appreciate your suggestions and have addressed all the issues to enhance the quality of our paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Aging reduces tissue regeneration capacity, posing challenges for an aging population. In this study, the authors investigate impaired bone healing in aging, focusing on calvarial bones, and introduce a two-part rejuvenation strategy. Aging depletes osteoprogenitor cells and reduces their function, which hinders bone repair. Simply increasing the number of these cells does not restore their regenerative capacity in aged mice, highlighting intrinsic cellular deficits. The authors' strategy combines Wnt-mediated osteoprogenitor expansion with intermittent fasting, which remarkably restores bone healing. Intermittent fasting enhances osteoprogenitor function by targeting NAD+ pathways and gut microbiota, addressing mitochondrial dysfunction - an essential factor in aging. This approach shows promise for rejuvenating tissue repair, not only in bones but potentially across other tissues.

      Strengths:

      This study is exciting, impressive, and novel. The data presented is robust and supports the findings well.

      Weaknesses:

      As mentioned above the data is robust and supports the findings well. I have minor comments only.

      We thank the reviewer for their enthusiastic and positive assessment of our study. We appreciate the recognition of the novelty and robustness of our data and findings. We have carefully considered the reviewer's comments and have revised the manuscript accordingly. We believe these revisions further strengthen the clarity and impact of our work.

      Reviewer #2 (Public review):

      Summary:

      Reeves et al explore a model of bone healing in the context of aging. They show that intermittent fasting can improve bone healing, even in aged animals. Their study combines a 'bone bandage' which delivers a canonical Wnt signal with intermittent fasting and shows impacts on the CD90 progenitor cell population and the healing of a critical-sized defect in the calvarium. They also explore potential regulators of this process and identify mitochondrial dysfunction in the age-related decline of stem cells. In this context, by modulating NAD+ pathways or the gut microbiota, they can also enhance healing, hinting at an effect mediated by complex impacts on multiple pathways associated with cellular metabolism.

      Strengths:

      The study shows a remarkable finding: that age-related decreases in bone healing can be restored by intermittent fasting. There is ample evidence that intermittent fasting can delay aging, but here the authors provide evidence that in an already-aged animal, intermittent fasting can restore healing to levels seen in younger animals. This is an important finding as it may hint at the potential benefits of intermittent fasting in tissue repair.

      Weaknesses:

      The authors explore potential mechanisms by which the intermittent fasting protocol might impact bone healing. However, they do not identify a magic bullet here that controls this effect. Indeed, the fact that their results with intermittent fasting can be replicated by changing the gut microbiota or modulating fundamental pathways associated with NAD, suggests that there is no single mechanism that drives this effect, but rather an overall complex impact on metabolic processes, which may be very difficult to untangle.

      We thank the reviewer for their positive assessment of our study and for highlighting the significant finding that intermittent fasting can restore age-related declines in bone healing. We appreciate the observation that our results suggest a complex interplay of metabolic processes rather than a single "magic bullet" mechanism. Indeed, the ability of gut microbiota modulation or NAD+ pathway targeting to replicate intermittent fasting's benefits underscores this complexity. While we recognize the challenges of disentangling these interconnected pathways, we believe our findings offer valuable insights into the multifaceted nature of intermittent fasting's impact on aged tissue repair. We hope this study serves as a foundation for future research aimed at identifying the individual contributions of these pathways and developing targeted therapeutic strategies.

      Reviewer #3 (Public review):

      Summary:

      This study aims to address the significant challenge of age-related decline in bone healing by developing a dual therapeutic strategy that rejuvenates osteogenic function in aged calvarial bone tissue. Specifically, the authors investigate the efficacy of combining local Wnt3a-mediated osteoprogenitor stimulation with systemic intermittent fasting (IF) to restore bone repair capacity in aged mice. The highlights are:

      (1) Novel Approach with Aged Models:

      This pioneering study is among the first to demonstrate the rejuvenation of osteoblasts in significantly aged animals through intermittent fasting, showcasing a new avenue for regenerative therapies.

      (2) Rejuvenation Potential in Aged Tissues:

      The findings reveal that even aged tissues retain the capacity for rejuvenation, highlighting the potential for targeted interventions to restore youthful cellular function.

      (3) Enhanced Vascular Health:

      The study also shows that vascular structure and function can be significantly improved in aged tissues, further supporting tissue regeneration and overall health.

      Through this innovative approach, the authors seek to overcome intrinsic cellular deficits and environmental changes within aged osteogenic compartments, ultimately achieving bone healing levels comparable to those seen in young mice.

      Strengths:

      The study is a strong example of translational research, employing robust methodologies across molecular, cellular, and tissue-level analyses. The authors leverage a clinically relevant, immunocompetent mouse model and apply advanced histological, transcriptomic, and functional assays to characterise age-related changes in bone structure and function. Major strengths include the use of single-cell RNA sequencing (scRNA-seq) to profile osteoprogenitor populations within the calvarial periosteum and suture mesenchyme, as well as quantitative assessments of mitochondrial health, vascular density, and osteogenic function. Another important point is the use of very old animals (up to 88 weeks, almost 2 years) modelling the human bone aging that usually starts >65 yo. This comprehensive approach enables the authors to identify critical age-related deficits in osteoprogenitor number, function, and microenvironment, thereby justifying the combined Wnt3a and IF intervention.

      Weaknesses:

      One limitation is the use of female subjects only and the limited exploration of immune cell involvement in bone healing. Given the known role of the immune system in tissue repair, future studies including a deeper examination of immune cell dynamics within aged osteogenic compartments could provide further insights into the mechanisms of action of IF.

      We thank the reviewer for their thorough summary and positive assessment of our study, particularly highlighting its translational nature, the robust methodologies employed, and the relevance of our aged animal model. We appreciate the insightful suggestion to include male subjects and to explore immune cell dynamics in future investigations.

      We acknowledge the limitation of using only female mice in the current study and agree that future studies incorporating both sexes and investigating immune cell contributions within aged osteogenic compartments would offer valuable insights into the mechanisms underlying intermittent fasting and its impact on bone healing.

      Our focus on female mice was informed by their distinct characteristics, including delayed healing and higher fracture risk (PMID: 37508423, PMID: 34434120). Importantly, female mice present a more challenging case for bone repair, making them a stringent test for evaluating the effectiveness of our rejuvenation approaches. Moreover, our research protocol, approved under animal license, adhered to ethical principles and the 3Rs, allowing us to reduce the number of animals required by focusing on a single sex.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should provide a justification for the use of female mice in this study. Additionally, the section on animal methods should be expanded to align with ARRIVE guidelines.

      We thank the reviewer for their valuable feedback. In response to the comment regarding the use of female mice, we have included a justification in the updated manuscript. As noted, female mice were selected for this study due to their distinct characteristics, such as delayed healing and higher fracture risk (PMID: 37508423, PMID: 34434120), which provide a more challenging model for evaluating bone repair strategies. We believe this made our study a stringent test of the efficacy of the rejuvenation approaches being investigated.

      Additionally, we have revised the animal methods section to ensure it aligns with the ARRIVE guidelines.

      (2) Intermittent fasting can influence circadian rhythms in various ways. In the RNA-seq data, do the authors observe any changes related to circadian rhythm pathways?

      The reviewer raises an important point regarding the influence of intermittent fasting (IF) on circadian rhythms. Our RNA-seq data revealed significant alterations in circadian rhythm pathways, particularly within the aged periosteal CD90+ cell population during IF. Specifically, the PAR bZip family transcription factors Dbp, Hlf, and Tef (q < 0.05) were significantly upregulated, consistent with their established roles as circadian rhythm regulators (PMID: 16814730, PMID: 31428688).

      In suture CD90+ cells from the Aged + IF group, Dbp expression was significantly elevated compared to the Aged AL control group. Moreover, several other circadian-controlled genes, including Sirt1, Kat2b, Csnk1e, Ezh2, Fbxw11, and Ucp2 (p < 0.05), were also upregulated (Fig. 4b), suggesting enrichment of Clock/Per2/Arntl transcriptional targets, essential components of the circadian clock.

      The observed upregulation of circadian rhythm effectors like Dbp, Hlf, and Tef further suggests a potential role for circadian transcription in CD90+ cell rejuvenation and bone repair in aged mice. While previous studies have primarily focused on the role of circadian rhythms in osteoblasts in vitro (PMID: 34579752, PMID: 30290183), our findings provide compelling evidence for their involvement in bone regeneration in vivo and support future investigation into this mechanism.

      ChIP-seq studies have shown D-box sites near promoters in Wnt/β-catenin components (e.g. Lrp6, Lrp5, Wnt8a, Fzd4), in the pro-osteogenic transcription factor Zbtb16 (see also Fig 5), and in 11 of the 44 mouse collagen genes (PMID: 31428688). These components are known to regulate osteogenesis, and their proximity to circadian-controlled transcription factors suggests a possible overlap between circadian regulation and Wnt signaling in promoting bone repair. Additionally, circadian rhythmicity, stem cell function, and Wnt signaling are interlinked (PMID: 29277155, PMID: 25414671). Food intake is a powerful regulator of the circadian rhythm in several organs (PMID: 11114885, PMID: 32363197), but little is known about the diet-circadian interaction in bone repair. The possibility that circadian transcription can be harnessed to direct aged stem cell function towards bone repair is a promising prospect.

      We have incorporated this information in Figure 2 - figure supplement 3G-H, the results section as well as in the discussion.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors refer to 'altered cellular mechanobiology', 'age-related changes in mechanobiology', etc. Here, they are using this terminology to refer to changes in F-actin intensity and nuclear shape. While I agree that these measures are indicators of a cellular response to mechanical cues, calling this 'changes in mechanobiology' doesn't sound quite correct to me. 'Mechanobiology' to me, is a field of study. Perhaps the authors should consider changing their terminology.

      We appreciate the reviewer’s insightful comment on the terminology used in our manuscript. We agree that the term "mechanobiology" is a broad field of study and using it in the context of changes in F-actin intensity and nuclear shape may be misleading. We have revised the text to better reflect the specific cellular responses to mechanical cues, such as changes in the cytoskeleton and nuclear morphology, rather than referring to them as "altered mechanobiology." The updated terminology more accurately conveys the observed cellular alterations in response to mechanical forces. We have made these adjustments throughout the manuscript for clarity and precision.

      (2) Three of the measures the authors use to highlight age-related changes (and rejuvenation) in their animal model are F-actin intensity, nuclear shape, and vascularisation. However, they never really explain what they believe these readouts mean practically/functionally. Indeed, it makes sense that less vascularisation would be associated with an aged phenotype and preclude healing, but this is only mentioned somewhat cursorily in the discussion. While vascularisation is discussed in the context of aging in the discussion, it is not discussed in the context of healing (which would seem relevant in the context of vascularisation being used as a readout in the healing models in response to Akk and IF treatment). Similarly, the changes in F-actin intensity and nuclear shape might suggest changes in the stiffness of the periosteum (as mentioned in the discussion), which could indeed be an indicator of an aged phenotype; however, their role in healing (in response to Akk and IF) are not clearly articulated.

      We appreciate the reviewer’s insightful comments and have made revisions to clarify the implications of age-related changes in vascularization, F-actin intensity, and nuclear shape, as well as the functional significance of these observations in the context of healing and rejuvenation.

      Vascularization:

      Vascularization and modulation of blood flow are critical for calvarial bone repair, as supported by multiple studies (e.g., PMID: 38032405, PMID: 21156316, PMID: 25640220). Early in the calvarial repair process, blood vessels grow independently of osteoprogenitor cells, establishing a supportive environment that promotes osteoprogenitor migration and subsequent ossification (PMID: 38834586). Furthermore, angiogenic vessels from the periosteum at defect edges contribute to creating a specialized microenvironment essential for bone healing (PMID: 38834586, PMID: 38032405). Compromised vascularization significantly impairs the healing of critical-sized calvarial defects (PMID: 29702250).

      Our data reveal a decline in periosteal vascularization with age, potentially compromising this microenvironment and impairing repair in aged animals. Importantly, our findings indicate that intermittent fasting (IF) reverses this phenotype by restoring periosteal vascularization. This rejuvenation of the vascular microenvironment aligns with improved bone repair outcomes in aged mice subjected to IF. We have revised the manuscript to emphasize the importance of vascularization in healing and to highlight the role of IF in restoring this critical aspect of the bone healing microenvironment.

      F-actin intensity and nuclear shape:

      Age-related changes in F-actin intensity and nuclear shape are associated with increased tissue stiffness, a hallmark of aging. Tissue stiffness has been shown to impair progenitor cell function and hinder repair in various systems, including neuroprogenitors (PMID: 31413369). Softening the extracellular matrix in aged tissues has been demonstrated to partially restore progenitor function and improve repair outcomes, as seen in the case of neuroprogenitors (PMID: 31413369). In our study, IF reversed age-associated changes in F-actin expression and nuclear shape, restoring these parameters to a phenotype resembling that of younger animals. This suggests that IF mitigates the mechanical changes associated with aging, reducing tissue stiffness and rejuvenating the periosteum to facilitate improved bone healing, similar to the outcomes observed in younger models.

      Following the reviewer’s advice, we have revised the text to clearly articulate the correlations and interpretations of our data regarding tissue mechanics and bone repair. Thank you for highlighting these critical aspects.

      (3) In relation to my point 2) on nuclear shape, there are reports that aging is linked to changes in Lamin B1. Have the authors considered this? It might provide a clearer link between their data and the tissue-level phenotypes they observe.

      Thank you for your comment regarding the potential link between aging and changes in Lamin B1. Following your suggestion, we performed Lamin B1 immunostaining on samples from Young, Adult, Aged, and Aged + IF groups. However, no significant differences in Lamin B1 levels were observed across these groups. These findings indicate that changes in Lamin B1 in osteoprogenitors are not apparent during aging, suggesting that Lamin B1 alterations in the context of aging may be tissue- and cell-type-specific.

      The new data was added in Figure 1 - figure supplement 2i-j.

      (4) In the data associated with Figure 2, the authors find that in the aged mice, MMP9 expression is increased, but MMP2 expression is decreased. They associate the decrease in MMP2 expression with decreased migration, but the canonical function of MMP9 should be similar to that of MMP2. Are there tissue-specific differences in the activity of MMP2/9 that could account for this?

      Thank you for the thoughtful comment. While both MMP-2 and MMP-9 are involved in ECM remodeling and share some overlapping canonical functions, their roles are context-dependent and exhibit tissue-specific differences that could explain the observed changes in aged mice. MMP-2 has been shown to play a critical role in maintaining the structural and functional integrity of flat bones, such as those in the craniofacial skeleton, by supporting bone remodeling (PMID: 17400654, PMID: 17440987, PMID: 16959767). The decreased expression of MMP-2 in aged mice may impair these local processes, leading to reduced migratory capacity of osteoprogenitors and contributing to aging-related changes in flat bone structure and function.

      In contrast, MMP-9 is more prominently involved in long bone remodeling, particularly at the growth plate where it regulates hypertrophic chondrocyte turnover, vascularization, and ossification during endochondral bone formation (PMID: 21611966, PMID: 9590175, PMID: 23782745, PMID: 16169742). Additionally, MMP-2 and MMP-9 differ in their regulation of specific ECM substrates and their interactions with bone-resident cells, which may further drive divergent outcomes in distinct bone types. For example, MMP-9’s role in osteoclastogenesis and its regulation of ECM proteins like type I collagen could be more critical in long bones, while MMP-2’s involvement in fine-tuning ECM microarchitecture may hold greater importance in flat bones.

      The increased expression of MMP-9 in aged calvarial osteoprogenitors may reflect a compensatory mechanism in response to the reduced MMP-2 activity, possibly in response to increased ECM turnover demands. Further studies examining the precise molecular pathways driving these changes in osteoprogenitors will help clarify the underlying mechanisms and their contributions to age-related alterations in flat bone structure and function.

      (5) In lines 391-2, the authors conclude that the data from Figure 4 shows that "during IF, CD90 cells, despite being aged, are more capable of ECM modulation and migration". The authors certainly present evidence that this is true, but the RNAseq showed that the enriched GO terms were predominantly associated with immune responses ('response to cytokine') and the proliferation phenotype seems very strong. Therefore, I would suggest that this overarching statement regarding the findings be less focussed on this one aspect of the finding, which doesn't look to be the dominant phenotype of the cellular response. And indeed, the authors move on from here to explore a mechanism associated with metabolism, not specifically with ECM remodelling.

      We greatly appreciate the reviewer’s insight regarding the interpretation of our findings, particularly the conclusion drawn from Figure 4.

      In response, we have revised the conclusion to more accurately reflect these findings.

      The revised text in the conclusion now reads: “Together, these findings suggest that IF rejuvenates aged CD90+ cells, in part, by enhancing proliferation, immune response, ECM remodeling, Wnt/β-catenin pathway, and metabolism, including increased ATP levels and decreased AMPK levels.”

      We hope that this adjustment better aligns with your suggestion and provides a more accurate summary of the key findings.

      (6) Fasting blood glucose levels are often cited as an indicator of metabolic health. Did the authors look at this in their animals who underwent the IF protocol? Could this have had an impact on the healing response?

      We thank the reviewer for this insightful comment. Throughout our study, we drew blood from the animals for various analyses that were omitted from this manuscript to maintain its focus on the osteoprogenitors.

      Our analysis included the assessment of the metabolic health of the animals using fasting blood glucose levels and the area under the curve (AUC) of the intraperitoneal glucose tolerance test (IPGTT).

      Fasting blood glucose levels reflect the animals' ability to maintain stable glucose levels after fasting, while the AUC from the IPGTT measures how efficiently glucose is cleared from the bloodstream following a glucose challenge. Typically, lower fasting blood glucose levels and reduced AUC indicate improved insulin sensitivity, better glucose metabolism, and enhanced metabolic control (PMID: 18812462, PMID: 19638507).

      Our findings show that intermittent fasting (IF) significantly reduced both the fasting blood glucose levels and the AUC in the IPGTT. This indicates that IF enhances metabolic flexibility, likely through improved insulin sensitivity and better glucose homeostasis. By lowering fasting blood glucose, IF reduces the reliance on excessive gluconeogenesis during fasting, while a reduced AUC indicates more efficient postprandial glucose clearance, consistent with enhanced insulin action and reduced fluctuations in blood glucose levels. The new data has been incorporated in Figure 3 - figure supplement 1d-g.

      Methods:

      “Blood glucose level measurement

      Fasting blood glucose levels were measured (Accu-Chek test strips) in mice fasted for 6 h by sampling blood from the tail vein. For the intraperitoneal glucose tolerance test (IPGTT), glucose was injected intraperitoneally (2 g/kg), and the blood glucose levels were measured after 15, 30, 60 and 120 minutes.”
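
      As an illustration of how an IPGTT area under the curve can be obtained from such time points, the following is a minimal sketch only. It assumes the trapezoidal rule and assumes that the fasting measurement serves as the 0-minute point; neither is stated in the Methods. The function name `ipgtt_auc` and all glucose values are hypothetical.

```python
def ipgtt_auc(times_min, glucose):
    """Trapezoidal area under a glucose curve (assumed method, not the authors')."""
    if len(times_min) != len(glucose) or len(times_min) < 2:
        raise ValueError("need at least two paired time/glucose values")
    auc = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        # Area of one trapezoid between consecutive samples.
        auc += 0.5 * (glucose[i] + glucose[i - 1]) * dt
    return auc


# Hypothetical readings: 0 min is the assumed fasting baseline,
# followed by the 15, 30, 60 and 120 min samples named in the Methods.
times = [0, 15, 30, 60, 120]
glucose_example = [100, 250, 220, 160, 110]  # illustrative values only
print(ipgtt_auc(times, glucose_example))
```

      A lower value for the same sampling schedule corresponds to faster glucose clearance, which is the sense in which a reduced AUC is interpreted above.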

      Improved metabolic health through lower fasting glucose and reduced AUC can have profound implications for tissue repair (PMID: 32809434). Stable glucose levels ensure a consistent energy supply for key cellular processes, such as cell proliferation, migration, and differentiation, which are essential for regeneration. Enhanced insulin sensitivity supports nutrient delivery to cells and reduces inflammation, creating an environment conducive to tissue healing. Additionally, intermittent fasting's ability to optimize glucose metabolism and regulate insulin secretion may enhance the function of stem and progenitor cells, further improving the tissue repair process (PMID: 28843700). Together, these findings suggest a mechanistic link between improved metabolic health and the enhanced healing observed in animals subjected to intermittent fasting.

      (7) In Supplementary Figure 10, the authors look at bone remodelling by assessing TRAP staining, as an indicator of osteoclast activity. I'm not sure if these data add all that much to the study. The authors have looked at bone formation at a tissue level using microCT. Here, they look at bone resorption at a cellular level with the TRAP assay. Overall, this probably suggests more bone remodelling, but the TRAP assay on its own at the cellular level could also be interpreted as an osteoporosis-like phenotype. This is clearly not the case because the authors show robust bone healing by microCT. In short, as an isolated measure of osteoclast activity at the cellular level without cellular-level assays of osteoblast activity, the interpretation of these data is not that clear. The microCT speaks far more of the phenotype and is, in my opinion, sufficient to make this point.

      We thank the reviewer for their comments regarding the interpretation of the TRAP staining data and its context within the study. We appreciate the concern that, without direct assays of osteoblast activity, the TRAP assay could lead to ambiguity.

      We have shown that intermittent fasting significantly increases the number and function of osteoprogenitor cells, the precursors to osteoblasts. While we acknowledge that these data do not directly measure osteoblast numbers or activity, they strongly suggest an increased capacity for osteoblast differentiation and bone formation. This aligns with the microCT findings of robust bone structure and healing.

      After careful consideration and given that the microCT and histology findings already provide robust and comprehensive evidence for bone structure and healing, we have decided to remove the TRAP staining data from the manuscript. We believe this change simplifies the manuscript and strengthens its focus on the most impactful data.

      (8) In the discussion, the authors make a number of links between aging and IF. However, one of the exciting conclusions of this manuscript is that IF aids in healing in aged animals. In this context, IF has not impacted the aging process itself because the animals have not experienced an IF protocol across their lifespan, but rather only after injury. In this context, perhaps the authors should also be focussing their discussion on evidence of the short-term response to IF rather than its effects on aging, which are longer-term.

      We appreciate the reviewer's comment and agree that emphasizing the short-term effects of intermittent fasting is crucial. Our study is the first to examine this protocol in Aged animals.

      To address this, we have revised the discussion and highlighted how short-term IF enhances metabolic health, promotes osteoprogenitor functionality, and supports bone remodeling, as observed in our study.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors should clarify details on intermittent fasting protocols, especially regarding caloric intake differences between fasting and non-fasting days, to aid reproducibility.

      We appreciate the reviewer's suggestions and have incorporated them by clarifying the relevant details. The new data are presented in Figure 3 - figure supplement 1a-c.

      Methods:

      “Caloric intake calculation

      To assess the caloric intake of mice, the food was weighed when made available to the mice (Win), and when removed (Wout). The daily consumed food was calculated based on the weight difference (Win - Wout), then converted to kcal (1 g = 3.02 kcal, LabDiet, 5053), and expressed as kcal/mouse/day for each cage (n cages ≥ 3 with 1 to 5 mice/cage).”
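
      The calculation described in this Methods paragraph can be sketched as follows. The conversion factor of 3.02 kcal/g comes from the text; dividing by the number of mice in the cage and by the number of days is an assumption about how "kcal/mouse/day" was derived, and the function name and example weights are hypothetical.

```python
KCAL_PER_GRAM = 3.02  # conversion factor quoted in the Methods (LabDiet 5053)


def kcal_per_mouse_per_day(w_in_g, w_out_g, n_mice, n_days=1.0):
    """Convert a cage-level food weight difference to kcal/mouse/day.

    Assumes consumption is shared equally among the mice in the cage.
    """
    if n_mice < 1 or n_days <= 0:
        raise ValueError("n_mice must be >= 1 and n_days must be > 0")
    consumed_g = w_in_g - w_out_g  # Win - Wout
    if consumed_g < 0:
        raise ValueError("Wout cannot exceed Win")
    return consumed_g * KCAL_PER_GRAM / (n_mice * n_days)


# Hypothetical cage: 100 g offered, 80 g left after one day, 4 mice.
print(round(kcal_per_mouse_per_day(100.0, 80.0, n_mice=4), 2))
```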

      (2) Did the authors evaluate the effect of their intermittent fasting protocol on fasting blood glucose levels?

      Following the reviewer’s comment, we included two measurements: 1) fasting blood glucose, which reflects the ability to maintain glucose homeostasis during fasting, and 2) the area under the curve (AUC) of the intraperitoneal glucose tolerance test (IPGTT), which measures glucose clearance efficiency after a glucose challenge. Lower values for both typically indicate improved insulin sensitivity, glucose metabolism, and metabolic control.

      Our findings demonstrate that intermittent fasting significantly reduced both fasting blood glucose and IPGTT AUC, suggesting enhanced metabolic flexibility, likely through improved insulin sensitivity and glucose homeostasis. Lower fasting blood glucose with IF indicates reduced reliance on gluconeogenesis during fasting, while a reduced AUC suggests more efficient postprandial glucose clearance, consistent with enhanced insulin action and reduced blood glucose fluctuations. This new data is included in Figure 3 - figure supplement 1.

      Generally, the improved metabolic environment supports tissue repair by ensuring adequate energy for cell proliferation and migration, reducing inflammation, and promoting the function of stem cells involved in tissue regeneration. Thus, this outcome of intermittent fasting may create a more favorable environment for tissue repair, potentially accelerating the healing of damaged tissues and improving overall regenerative capacity.

      (3) In Figure 1E-F, the nuclei have an interesting shape and the authors quantified F-actin. Given the role of lamin B in nuclear integrity, an analysis of lamin B expression and its structural integrity in aged osteoprogenitors could provide valuable insights into cellular aging mechanisms and their potential reversal with intermittent fasting.

      In response to the reviewer's comment, we performed Lamin B1 immunostaining on samples from Young, Adult, Aged, and Aged + IF groups. We observed no significant differences in Lamin B1 levels across these groups. This suggests that age-related changes in Lamin B1 are not evident in osteoprogenitors and may be tissue- or cell-type specific. The new data was added in Figure 1 - figure supplement 2i-j.

      (4) The authors should explain, in the main text or the methods section, why are they only using females in this study.

      We appreciate the reviewer's comment regarding the use of female mice. Female mice were chosen for this study due to their delayed healing and higher fracture risk (PMID: 37508423, PMID: 34434120), presenting a more challenging model for evaluating bone repair strategies and providing a stringent test of our rejuvenation approaches. This justification has been added to the revised manuscript. The animal methods section has also been updated to comply with ARRIVE guidelines.

      (5) This story stands alone and has an incredible amount of data. However, for a follow-up study, I would like to suggest consideration of including a broader analysis of immune cell involvement within the osteogenic compartments to strengthen the mechanistic understanding of IF's impact.

      We thank the reviewer for this insightful suggestion. We agree that investigating the role of immune cells within the osteogenic compartments could provide valuable mechanistic insights into how intermittent fasting influences tissue regeneration. Immune cells are key mediators of inflammation and repair, and their interactions with osteoprogenitors and other cells in the bone healing environment likely contribute to IF's effects.

      While our study focuses on IF's impact on osteoprogenitor function and tissue repair, we acknowledge the importance of future research exploring immune cell involvement. Techniques like single-cell RNA sequencing or flow cytometry could characterize immune cell populations and their functional states within osteogenic niches, allowing for a deeper understanding of immune-skeletal interactions during IF-mediated bone healing. We appreciate the reviewer highlighting this promising avenue for future research.

      Minor corrections to the text and figures:

      (1) References formatting should be revised (eg. line 41).

      The reference formatting was corrected.

      (2) Line 144 - what do the authors mean by p2 in the references?

      Thank you for your comment. We corrected the error and removed p2 from the reference.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Nitric oxide (NO) has been implicated as a neuromodulator in the retina. Specific types of amacrine cells (ACs) produce and release NO in a light-dependent manner. NO diffuses freely through the retina and can modulate intracellular levels of cGMP, or directly modify and modulate proteins via S-nitrosylation, leading to changes in gap-junction coupling, synaptic gain, and adaptation. Although these system-wide effects have been documented, it is not well understood how the physiological function of specific neuronal types is affected by NO. This study aims to address this gap in our knowledge. 

      There are two major findings. 1) About a third of the retinal ganglion cells display cell-type specific adaptation to prolonged stimulus protocols. 2) Application of NO specifically affected Off-suppressed ganglion cells designated as G32 cells. The G32 cluster likely contains 3 ganglion cell types that are differentially affected. 

      This is the first comprehensive analysis of the functional effects of NO on ganglion cells in the retina. The cell-type specificity of the effects is surprising and provides the field with valuable new information. 

      Strengths: 

      NO was expected to produce small effects, and considerable effort was expended in validating the system to ensure that changes in the state of the preparation would not confound any effects of NO. The authors used a sequential stimulus protocol to control for changes in the sensitivity of the retina during the extended recording periods. The approach potentially increases the sensitivity of the measurements and allows more subtle effects to be observed. 

      Neural activity was measured by Ca-imaging. Responsive ganglion cells were grouped into 32 types using a clustering analysis. Initial control experiments demonstrated that the cell-types revealed by the analysis largely recapitulate those from their earlier landmark study using a similar approach.

      Application of NO to the retina modulated responses of a single cluster of cells, labeled G32, while having little effect on the remaining 31 clusters. In separate experiments, ganglion cell spiking activity was recorded on a multi-electrode array (MEA). Together the Ca-imaging and MEA recordings provide complementary approaches and demonstrate that NO modulates the temporal but not spatial properties of affected cell-types.

      Weaknesses: 

      The concentration of NO used in these experiments was ~0.25µM, which is 5- to 10-fold lower than the endogenous concentration previously measured in rodent retina. It is perhaps surprising that this relatively low NO concentration produced significant effects. However, the endogenous measurements were done in an eye-cup preparation, while the current experiments were performed in a bare (no choroid) preparation. Perhaps the resting NO level is lower in this preparation. It is also possible that the low concentration of NO promoted more selective effects.

      Reviewer #2 (Public review): 

      Neuromodulators are important for circuit function, but their roles in the retinal circuitry are poorly understood. This study by Gonschorek and colleagues aims to determine the modulatory effect of nitric oxide on the response properties of retinal ganglion cells. The authors used two photon calcium imaging and multi-electrode arrays to classify and compare cell responses before and after applying a NO donor DETA-NO. The authors found that DETA-NO selectively increases activity in a subset of contrast-suppressed RGC types. In addition, the authors found cell-type specific changes in light response in the absence of pharmacological manipulation in their calcium imaging paradigm. This study focuses on an important question and the results are interesting. The limitations of the method and data interpretation are adequately discussed in the revised manuscript. 

      The authors have addressed my previous comments, included additional discussions on the limitations of the method, and provided a more careful interpretation of their data. 

      Recommendations for the authors: 

      Please correct the citation that reviewer #1 mentioned. In addition, a little more discussion of the NO concentration issue would be helpful. The low NO concentration is not a weakness in the data; it simply raises questions regarding the interpretation.

      Thank you for these recommendations.

      Regarding the citation error, we are not sure if Reviewer #1 refers to a citation formatting error or incorrect placement. In any case, we modified the text: We specified the extracted information regarding the NO concentrations and put the applied concentration into that context (Lines 621-635). In addition, we made clear that the citation of Guthrie (2014) refers to the dissertation, which can be easily retrieved via Google Scholar. We also cited the mentioned ARVO abstract by Guthrie and Mieler (2014).

      We hope that these modifications solve the above-mentioned issues. 


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):  

      Summary: 

      Nitric oxide (NO) has been implicated as a neuromodulator in the retina. Specific types of amacrine cells (ACs) produce and release NO in a light-dependent manner. NO diffuses freely through the retina and can modulate intracellular levels of cGMP, or directly modify and modulate proteins via S-nitrosylation, leading to changes in gap-junction coupling, synaptic gain, and adaptation. Although these system-wide effects have been documented, it is not well understood how the physiological function of specific neuronal types is affected by NO. This study aims to address this gap in our knowledge. 

      Strengths: 

      NO was expected to produce small effects, and considerable effort was expended in validating the system to ensure that any effects of NO would not be confounded by changes in the state of the preparation. The authors used a paired stimulus protocol to control for changes in the sensitivity of the retina during the extended recording periods. The approach potentially increases the sensitivity of the measurements and allows more subtle effects to be observed. 

      Neural activity was initially measured by Ca-imaging. Responsive ganglion cells were grouped into 32 types using a clustering analysis. Initial control experiments demonstrated that the cell-types revealed here largely recapitulate those from their earlier landmark study using the same approach (Fig. 2). 

      Application of NO to the retina strongly modulated responses of a single cluster of cells, labeled G32, while having little effect on the remaining 31 clusters. This result is evident in Fig. 3e. 

      Separate experiments measured ganglion cell spiking activity on a multi-electrode array (MEA). Clustering analysis of the peri-stimulus spike-time histograms (PSTHs) obtained from the MEA data also revealed 32 clusters. The PSTHs for each cluster were aligned to the Ca-imaging data using a convolution approach. The higher temporal resolution of the MEA recordings indicated that NO increased the speed of sub-cluster 2 responses but had no effect on receptive field size. The physiological significance of the small change in kinetics remains unclear. 

      We thank the reviewer for their detailed and constructive comments.

      Weaknesses: 

      The G32 cluster was further divided into three sub-types using Bayesian Information Criterion (BIC) based on the temporal properties of the Ca-responses. This sub-clustering result seems questionable due to the small difference in the BIC parameter between 2 and 3 clusters. Three sub-clusters of the G32 cluster were also revealed for the PSTH data, however, the BIC analysis was not applied to further validate this result. 

      (1.1) We agree with the reviewer that this is an important point to be clarified. To this end, we repeated the analysis with n=2 clusters (see Author response image 1 below). In brief, we found that the overall interpretation did not change: Both clusters in the Ctrl1-dataset showed barely any type-specific adaptational effects, whereas under NO application, temporal contrast responses decreased (see Author response image 1 below). If requested, we would be happy to add this image to the supplementary material. 

      Author response image 1.

      In an additional analysis, we evaluated if n=2 or n=3 was the “better” choice for the number of clusters. In the new Supplementary Fig. S4, we compared the clusters with n=2 (top) and n=3 (bottom). For n=2, the two clusters are relatively strongly correlated for both visual stimuli, whereas for n=3, the clusters become more distinct, especially with respect to differences in the correlations for the two stimuli (Fig. S4b). For n=2, the low intra-cluster correlation (ICC) strongly suggests that cluster 2 contains multiple response types (ICC(C2) = 0.5 ± 0.48, mean ± s.d.; Fig. S4c). For n=3, the mean ICC values are high for all three clusters (ICC(C1) = 0.81 ± 0.16; ICC(C2) = 0.86 ± 0.07; ICC(C3) = 0.83 ± 0.1; mean ± s.d.). Together, this suggests that n=3 clusters capture the response diversity in G32 better than n=2 clusters.
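
      To make the intra-cluster correlation argument concrete, the sketch below computes a mean ± s.d. of pairwise Pearson correlations between the response traces assigned to one cluster. This definition of ICC is an assumption for illustration; the response does not specify how the values were computed. The function names and the toy traces are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length traces."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant trace has no defined correlation")
    return sxy / sqrt(sxx * syy)


def intra_cluster_correlation(traces):
    """Mean and s.d. of all pairwise correlations within one cluster."""
    if len(traces) < 2:
        raise ValueError("need at least two traces")
    r = [pearson(a, b) for a, b in combinations(traces, 2)]
    return mean(r), pstdev(r)


base = [0, 1, 2, 1, 0, -1, -2, -1]            # toy response shape
homogeneous = [base, [2 * v for v in base], [v + 1 for v in base]]
mixed = [base, base[:], [-v for v in base]]   # one inverted response type

print(intra_cluster_correlation(homogeneous))  # high mean, small spread
print(intra_cluster_correlation(mixed))        # lower mean, large spread
```

      Under this definition, a cluster that merges distinct response types shows a low mean and a large spread, which matches the pattern reported for cluster 2 at n=2.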

      Finally, we performed a BIC analysis for the MEA dataset and found the optimal number of clusters to be also n=3 (see new Suppl. Fig. S5).

      The alignment of sub-clusters 1, 2, and 3 identified in the Ca-imaging and the MEA recordings seemed questionable, because the temporal properties of clusters did not align well, nor did the effects of NO. 

      (1.2) To address this important point, we analyzed the correlations between the control responses of the three clusters from the Ca<sup>2+</sup>-dataset with the ones from the MEA-dataset (see new Suppl. Fig. S7). To avoid confusion, we named the clusters in the MEA-dataset i,ii,iii (see Fig. 8). We found two of the three clusters to be highly correlated (Ca<sup>2+</sup> clusters 2,3 and MEA clusters iii, ii), whereas one cluster was much less so (cluster 1 vs. cluster i), likely due to differences in response kinetics. In clusters i and ii NO application led to a release of suppression for temporal contrasts – similar to what we observed in the Ca<sup>2+</sup> data (see also our new analysis of the MEA data in Suppl. Fig. S6, as discussed further below).

      We agree that the cell types underlying the Ca<sup>2+</sup> and MEA G32 clusters may not be the same. Aligning functional types between those two methods is challenging due to several factors: while Ca<sup>2+</sup> is a proxy for spiking activity, other Ca<sup>2+</sup> sources as well as sub-threshold membrane potential changes affect the intracellular Ca<sup>2+</sup>, potentially in a cell type-specific way. We now explain this better in the text.

      In any case, our main point was not to unambiguously align the cell types but to show that in both datasets, we find three subclusters of G<sub>32</sub>, which are affected by NO in a differential manner, particularly their suppression to temporal contrasts.

      The title of the paper indicates that nitric oxide modulates contrast suppression in a subset of mouse retinal ganglion cells, however, this result appears to be inferred from previous results showing that G32 is identified as a "suppressed-by-contrast" cell. The present study does not explicitly evaluate the amount of contrast-suppression in G32 cells. 

      (1.3) The reviewer is correct in that we did not quantify contrast-suppression in G<sub>32</sub> in detail but focused on the responses to temporal contrast (chirp and moving bar) and its modulation by NO (Fig. 5). In this context, please note that G<sub>32</sub>’s responses to the moving bar stimulus suggest that the cells are also suppressed by spatial contrast (i.e., an edge appearing in their RF). The functional RGC type G<sub>32</sub> (“Off suppressed 2”) was defined in an earlier study (Baden et al. 2016); it was assigned to the “Suppressed-by-Contrast” (SbC) category mainly because temporal contrast suppresses its responses. Already then, coverage analysis indicated that G<sub>32</sub> may indeed contain several RGC types – in line with our clustering analysis. It is still unclear if G<sub>32</sub> contains one (or more) of the SbC cells described by Jacoby & Schwartz (2018); in their recent study, Wienbar and Schwartz (2022) introduced the novel bursty-SbC RGC, which Goetz et al. (2022) speculated to potentially align with G<sub>32</sub>. We now discuss the relationship between G<sub>32</sub> and the SbC RGCs defined in other studies in the revised manuscript.

      In its current form, the work is likely to have limited impact, since the morphological and functional properties of the affected sub-cluster remain unknown. The finding that there can be cell-specific adaptation effects during experiments on in vitro retina is important new information for the field.

      (1.4) Again, we thank the reviewer for the detailed and helpful feedback. We hope that the reviewer finds our revised manuscript improved.

      Reviewer #1 (Recommendations For The Authors):  

      Most of the calcium activity traces (dF/F) throughout the paper have neither vertical nor horizontal calibration bars. Presumably, most values are positive, but this is unclear as a zero level is not indicated anywhere. Without knowing where zero dF/F is, it is not possible to determine whether the NO increased the Ca-signal or blocked a decrease in the Ca-signal. 

      Both ∆F/F and z-scoring, as we used here, are ways to normalize Ca<sup>2+</sup> traces. We decided against using ∆F/F<sub>0</sub> because this typically assumes that F<sub>0</sub> represents the cell’s Ca<sup>2+</sup> resting level (without activity). However, in our measurements, the “resting” Ca<sup>2+</sup> levels (i.e. before presenting a stimulus) may indeed reflect no spiking activity (e.g., in an ON RGC) but may also reflect baseline spiking activity (e.g., in a G<sub>32</sub> cell, which has a baseline firing rate of ~10 Hz; see Fig. S6). Hence, we used z-scoring, which carries no assumption of resting Ca<sup>2+</sup> level equal to no activity. In practice, we normalized all traces to the Ca<sup>2+</sup> level prior to the light stimulus and defined this as zero (as described in the Methods).
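
      The difference between the two normalizations can be illustrated with a short sketch. It is not the analysis code used for the manuscript: it assumes that the pre-stimulus mean is subtracted and the result is divided by the s.d. of the whole trace, and it uses a hypothetical fluorescence trace and hypothetical function names.

```python
from statistics import mean, pstdev


def baseline_zscore(trace, n_baseline):
    """Shift a trace so the pre-stimulus mean is zero, then scale by its s.d.

    Assumed variant of z-scoring; the result is unit-free.
    """
    if not 0 < n_baseline <= len(trace):
        raise ValueError("n_baseline must lie within the trace")
    sd = pstdev(trace)
    if sd == 0:
        raise ValueError("flat trace cannot be scaled")
    f0 = mean(trace[:n_baseline])
    return [(f - f0) / sd for f in trace]


def delta_f_over_f0(trace, n_baseline):
    """Classic dF/F0, which treats the pre-stimulus level as 'no activity'."""
    f0 = mean(trace[:n_baseline])
    if f0 == 0:
        raise ValueError("F0 must be non-zero")
    return [(f - f0) / f0 for f in trace]


# Hypothetical raw fluorescence: 4 pre-stimulus samples, then a response.
raw = [10, 10, 10, 10, 14, 18, 14, 10]
print(baseline_zscore(raw, n_baseline=4))
print(delta_f_over_f0(raw, n_baseline=4))
```

      Both outputs are zero before the stimulus, but only ∆F/F<sub>0</sub> scales the response by the baseline level itself, which is the step that presupposes the baseline corresponds to an inactive cell.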

      We considered the reviewer’s suggestion of adding zero lines to every trace but felt that this would hamper the overall readability of the figures.

      Regarding calibration bars: We made sure that horizontal bars (indicating time) are present in all figures. We decided to leave out vertical bars in Ca<sup>2+</sup> responses, because as explained above, the traces are normalized (and unit-free), and within a figure all traces are scaled the same.

      Points of clarification for the Methods: 

      (1) The stimulus field was 800 x 600 µm. Presumably, both scan fields were contained within this region when scanning either Field 1 or Field 2 so that the adaptation level of the preparation at both locations was maintained? 

      Yes, the stimulation field is always kept centered on the respective recording (scan) field and the adaptation level for each recording field was maintained.

      (2) There appeared to be an indeterminate amount of time between the initial 10-minute adaptation period and Ctrl1, whereas there were no such gaps between subsequent scans. Is this likely to produce differences in adaptation state and thus represent a systematic error? 

      At this time point, recording (scan) fields were selected to make sure that the cells in the field were uniformly labelled with the Ca<sup>2+</sup> indicator and responsive to light stimuli. This typically happened already at the end of the light adaptation phase and/or right after. When selecting the fields, light stimuli were presented (to test responsiveness) and thereby the adaptation level was maintained independent of the duration of this procedure, minimizing systematic errors.

      (3) Was the dense white noise stimulus applied during the wash-in period to maintain the adaptation state of the preparation prior to the subsequent scan? 

      The dense noise was not applied throughout the wash-in period but at least 5-10 min before the field was recorded with a drug (e.g., NO). 

      Fig. 1d illustrates very nicely how the stimuli align with the responses. It would have been helpful to have this format continue throughout the paper but unfortunately, the vertical lines are dropped in Fig. 2a and then the stimulus waveform is omitted in Fig. 2e onwards. 

      Thanks, good idea. We added the vertical lines and the stimulus waveform to the figures where they were missing to improve the readability. 

      What was the rationale for selecting the concentration of the NO donor used? Is it likely to mimic natural levels? 

      A DETA/NO concentration of 100 µM is commonly used in studies investigating NO-induced effects. DETA/NO has a half-life (t<sub>0.5</sub>) of 20 hours, which makes it more suitable for application in tissues (like our whole-mount preparation), because the donor can penetrate into the tissue before releasing NO. In turn, this long t<sub>0.5</sub> means that only a fraction of the bound NO is released per time unit.

      Based on t<sub>0.5</sub> for DETA/NO and NO, one can roughly estimate the NO range as follows: t<sub>0.5</sub> of NO strongly depends on the tissue and is estimated to be in the second-to-minute range (Beckman & Koppenol, 1996). Assuming a t<sub>0.5</sub> for NO of 2 minutes, a freshly prepared 100 µM DETA/NO solution is expected to yield, within the first hour, an NO concentration of approx. 0.25 µM (taking into account that 1 mole of DETA/NO releases 1.5 moles of NO; see Ramamurthi & Lewis 1997).
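
      A value of this order can be reproduced with a quasi-steady-state balance in which NO release from the donor equals NO loss. The sketch below assumes first-order kinetics for both the donor and free NO and neglects donor depletion (only about 3% within the first hour at a 20-hour half-life); the function name and this kinetic simplification are illustrative assumptions and not necessarily the exact calculation behind the estimate.

```python
import math


def steady_state_no_um(donor_um, donor_half_life_min, no_half_life_min,
                       no_per_donor=1.5):
    """Quasi-steady-state NO concentration (in µM).

    Assumes first-order donor decay and first-order NO loss (a
    simplification); no_per_donor is the number of moles of NO
    released per mole of donor.
    """
    k_release = math.log(2) / donor_half_life_min  # donor decay rate (1/min)
    k_decay = math.log(2) / no_half_life_min       # NO loss rate (1/min)
    release_rate = no_per_donor * k_release * donor_um  # µM NO per minute
    return release_rate / k_decay


# 100 µM DETA/NO, donor half-life 20 h, assumed NO half-life 2 min
estimate = steady_state_no_um(100.0, 20 * 60, 2.0)
print(f"{estimate:.2f} µM")
```

      Under these assumptions the result scales linearly with the assumed NO half-life, so a half-life in the range of seconds would give a proportionally lower concentration.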

      In general, it is difficult to determine the physiological concentration of NO in the retina. Different measurements point at peaks of a few 100 nM (e.g., frog retina, ganglion cells: 0.25 µM, Kalamkarov et al. 2016; rodent inner retina, 0.1 to 0.4 µM, Micah et al. 2014). Hence, the NO concentrations we apply should be within the measured physiological range.

      Fig. 3e: what are the diamond symbols? If these are the individual cells, it might be better to plot them on top of the box plots so all are visible. 

      Indeed, the diamond symbols represent individual cells, yet outliers only. We decided not to plot all cells as a dot plot on top of the box plots since the readability will suffer as there are too many individual dots to show, e.g., n=251 for G<sub>32</sub> Ctrl and n=135 for G<sub>32</sub> DETA/NO.

      Fig. 3: please explain more clearly the x-axis units in a-d and the y-axis units in e. 

      To estimate potential response differences between the first and the second scan (i.e. either Ctrl 2 or NO), the traces were subtracted cell-pairwise (∆ Ctrl: Ctrl 2 – Ctrl 1; ∆ DETA/NO: NO – Ctrl 1). As all Ca<sup>2+</sup> traces were normalized, they are unit-free. Therefore, the x-axes in Fig. 3a-d represent the mean differences of each cell per cell type, e.g., a value of zero would mean that the traces of Ctrl 1 and Ctrl 2 for a cell are identical. The y-axis in Fig. 3e is also unit-free, because technically, it is the same measure as Fig. 3a-d. But since it summarizes the control- and NO-data, we refer to this as “delta mean trace.” We tried to make this clearer in the revised manuscript and a detailed description can be found in the Methods.
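
      The cell-pairwise subtraction described above can be sketched as follows. This is a minimal illustration that assumes the difference trace of each cell is averaged over time to obtain a single value; the function name and the toy traces are hypothetical and are not taken from the analysis code.

```python
from statistics import mean


def delta_mean_trace(first_scan, second_scan):
    """Mean of the point-wise difference (second - first) for one cell.

    Both traces are assumed to be normalized (unit-free) and sampled at
    the same time points.
    """
    if len(first_scan) != len(second_scan):
        raise ValueError("traces must have the same number of samples")
    return mean(b - a for a, b in zip(first_scan, second_scan))


# Toy normalized traces for a single cell (hypothetical values)
ctrl1 = [0.0, 0.5, 1.0, 0.5]
ctrl2 = [0.0, 0.5, 1.0, 0.5]   # identical to Ctrl 1
no = [0.0, 0.7, 1.0, 0.9]      # altered response under DETA/NO

delta_ctrl = delta_mean_trace(ctrl1, ctrl2)  # ∆ Ctrl: Ctrl 2 - Ctrl 1
delta_no = delta_mean_trace(ctrl1, no)       # ∆ DETA/NO: NO - Ctrl 1
```

      Identical Ctrl 1 and Ctrl 2 traces give a value of zero, matching the interpretation of the x-axes in Fig. 3a-d.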

      Fig. 3: "...a substantial number of RGC types (34%) changed their responses to chirp and/or moving bar stimuli in the absence of any pharmacological perturbation in a highly reproducible manner...". How many of the cell types showed a significant difference? Two cell-types with p<0.001 are highlighted with 3 asterisks. It would be helpful to indicate on this plot which of the other cells showed significant differences. 

      Yes, this is a good idea. Thank you. We tried to add this information to the figure, but it became rather crowded. Therefore, we added a new Suppl. Fig. S3 (same style as Fig. 3) where we exclusively summarized the control-dataset. 

      Fig. 7: To illustrate the transform from PSTH to Ca-imaging, why not use G32 data as an example?

      Fair point. We modified the figure and added G<sub>32</sub> as an example.

      It would be clearer if the cells were labeled consistently throughout the paper using their Baden cluster numbers rather than switching to the older nomenclature (JAM-B, local edge, alpha, etc), e.g. Fig. 7a,b. 

      In the revised manuscript, we have now changed the nomenclature to the Ca<sup>2+</sup> imaging-based terminology of Baden et al. (2016). We had used the alternative cell type names here because, at the point where Fig. 7a is discussed in the manuscript, the cell type matching had not yet been performed. But we agree that a consistent nomenclature is helpful.

      The evidence supporting the sub-clustering of the G32 cells for the two recording methods could have been stronger. In Fig. 5, the BIC difference between 2 and 3 clusters is rather small. Is this result robust enough to justify 3 rather than 2 clusters? The BIC analysis should also be performed on the PSTH data-set to support the notion that the MEA G32 cluster also contains 3 rather than 2 sub-clusters. 

      Regarding the sub-clustering of G<sub>32</sub> into n=2 or n=3 clusters for both datasets, please see our detailed reply #1.1 in our response to the public comments above.

      The alignment of the three sub-clusters across the Ca-imaging and MEA data looked questionable. For example, the cluster 2 and cluster 3 traces in Fig. 5e,f look similar, with cluster 1 being more different. In Fig. 8c on the other hand, cluster 1 and 3 look similar with cluster 2 being more different. The pharmacological results also did not align well. For the Ca-imaging, NO appeared to have a large effect on cluster 1, a more modest effect on cluster 2 and less effect on cluster 3 (Fig. 5f). In comparison, the MEA results diverged, with NO producing the largest effect on cluster 2 and very modest if any effects on clusters 1 and 3 (Fig. 8c). Moreover, the temporal properties of cluster 1 and cluster 3 look very different between the Ca-imaging and MEA data. Without further comment, these differences raise concerns about the reliability of the clustering and the validity of comparisons made across the two sets of experiments. 

      We agree that this is a critical point. Please see our reply #1.2 in our response to the public comments above.

      Fig. 8: Transforming the PSTHs into Ca-traces is important to align the MEA recordings with the Ca-imaging data. It would also be very informative to see a more detailed overall presentation of the PSTH data since it provides a much higher temporal resolution of the responses. For example, illustrating the average PSTHs for the G32 cells under all the experimental conditions could be quite illuminating. 

      To address this point, we added a new Supplementary Fig. S6, which shows the pseudo-Ca<sup>2+</sup> traces for each cluster and condition next to the PSTHs. In addition, we quantified the cumulative firing rate for response features (time windows) where temporal suppression was observed in the Ca<sup>2+</sup> data. This new analysis shows that during NO-application, we can see an increase in firing rate in all clusters. Nevertheless, the effect of NO on the PSTHs is admittedly small and is more clearly visible in the pseudo-Ca<sup>2+</sup> transformed traces. One possible explanation for this difference may be that the overall firing rates are quite dynamic in G<sub>32</sub> such that a significant increase in “suppression” phases relative to the peak firing appears small.

      Reviewer #2 (Public Review):  

      Neuromodulators are important for circuit function, but their roles in the retinal circuitry are poorly understood. This study by Gonschorek and colleagues aims to determine the modulatory effect of nitric oxide on the response properties of retinal ganglion cells. The authors used two photon calcium imaging and multi-electrode arrays to classify and compare cell responses before and after applying a NO donor DETA-NO. The authors found that DETA-NO selectively increases activity in a subset of contrast-suppressed RGC types.

      In addition, the authors found cell-type specific changes in light response in the absence of pharmacological manipulation in their calcium imaging paradigm. While this study focuses on an important question and the results are interesting, the following issues need further clarification for better interpretation of the data. 

      We thank the reviewer for her/his detailed and constructive comments.

      (1) Design of the calcium imaging experiments: the control-control pair has a different time course from the control-drug pair (Fig 1e). First, the control-control pair has a 10 minute interval while the control-drug pair has a 25 minute interval. Second, Control 1 Field 2 was imaged 10 min later than Control 1 Field 1 since the start of the calcium imaging paradigm. 

      Given that the control dataset is used to control for time-dependent adaptational changes throughout the experiment, I wonder why the authors did not use the same absolute starting time of imaging and the same interval between the first and second round of imaging for both the control-control and the control-drug pairs. This can be readily done in one of the two ways: 1. In a set of experiment, add DETA/NO between "Control 1 Field 1 and "Control 2 Field 1" in Fig. 1e as the drug group; or 2. Omit DETA/NO in the Fig. 1e protocol as the control group to monitor the time course of adaptational changes. 

      Thank you for raising this point. We hope that in the following we can clarify the reasoning behind our protocol and the analysis approach.

      (2.1) Initially, we performed these experiments in different ways (also in the sequence suggested by the reviewer), before homing in on the paradigm illustrated in Fig. 1. We chose this paradigm for two reasons: First, we wanted to have for each retina both Ctrl1/Ctrl2 and Ctrl1/NO data sets, to be sure that the time-dependent (adaptational) effects were not related to the general condition of an individual retina preparation. Second, we did not see obvious differences in time-dependent or NO-induced effects between paradigms. Therefore, while we cannot exclude that the absolute time between recordings can affect the observed changes, we do not think that such effects are substantial enough to change our conclusions.

      In the revised manuscript, we now explicitly point at the different intervals. 

      Related to the concern above, to determine NO-specific effect, the authors used the criterion that "the response changes observed for control (ΔR(Ctrl2−Ctrl1)) and NO (ΔR(NO−Ctrl1)) were significantly different". This criterion assumes that without DETA-NO, imaging data obtained at the time points of "Control 1 Field 2" and "DETA/NO Field 2" would give the same value of ΔR as ΔR(Ctrl2−Ctrl1) for all RGC types. It is not obvious to me why this should be the case, because of the unknown time-dependent trajectory of the adaptational change for each RGC type. For example, a RGC type could show stable response in the first 30 min and then change significantly in the following 30 min. DETA/NO may counteract this adaptational change, leading to the same ΔR as the control condition (false negative). Alternatively, DETA/NO may have no effect, but the nonlinear time-dependent response drift can give false positive results. 

      (2.2) Initially, we assumed that after adapting the retina to a certain light level, RGCs exhibit stable responses over time, such that when adding a pharmacological agent, we can identify drug-induced response changes (e.g., by calculating the response difference). To our surprise, we found that for some RGC types the responses changed between the first and the second recording (referred to as cell type-specific adaptational effects), which is why we devised the Ctrl1/Ctrl2 vs. Ctrl1/NO analysis. 

      The reviewer is correct in that we assume in our analysis that the adaptational- and NO-induced effects are independent and sum linearly. Further, we agree with the reviewer that there may be other possibilities, two of which are highlighted by the reviewer:

      (a) Interaction: for instance, if NO compensates for the adaptational effect, we would not be able to measure this; or, if this compensation was partial, underestimate both effects. 

      (b) More complex time-dependency: for example, if an RGC shows a pronounced adaptational effect with a longer delay (i.e. only after the second scan), or if a very transient NO effect has already disappeared when we perform the second scan. On the one hand, as we only can take snapshots of the RGC responses, we cannot exclude these possibilities. On the other hand, both effects (adaptational- and NO-dependent) were type-specific and reproducible between experiments (also with varying timing, see reply #2.1), which makes complex time dependencies less likely.

      The revised manuscript now reflects these limitations of our recording paradigm and points out which effects can be detected, and which likely not.

      I also wonder why washing-out, a standard protocol for pharmacological experiments, was not done for the calcium protocol since it was done in the MEA experiments. A reversible effect by washing in and out DETA/NO in the calcium protocol would provide a much stronger support that the observed NO modulation is due to NO and not to other adaptive changes. 

      (2.3) We agree that a clear wash-out would strengthen our findings. Indeed, in the beginning of our experiments, we tried to wash-out the agent in the Ca<sup>2+</sup> recordings, as we did in the MEA recordings. We soon stopped doing this in the Ca<sup>2+</sup> experiments, because response quality decreased for the third scan of the same field, likely due to bleaching of fluorescent indicator and photopigment. This is why we typically restrict the total recording time of the same field of RGCs to about 30 min (~ two scans with all light stimuli). Moreover, our MEA data showed that DETA/NO can largely be washed-out, which supports that we observed NO-specific effects. Therefore, we decided against further attempts to establish the wash-out also in the Ca<sup>2+</sup> experiments (e.g., shortening the recording time by presenting fewer light stimuli).

      (2) Effects of Strychnine: In lines 215-219, " In the light-adapted retina, On-cone BCs boost light-Off responses in Off-cone BCs through cross-over inhibition (83, 84) and hence, strychnine affects Off-response components in RGCs - in line with our observations (Fig. S2)" However, Fig. S2 doesn't seem to show a difference in the Off-response components. Rather, the On response is enhanced with strychnine. In addition, suppressed-by-contrast cells are known to receive glycinergic inhibition from VGluT3 amacrine cells (Tien et al., 2016). However, the G32 cluster in Fig. S2 doesn't seem to show a change with strychnine. More explanation on these discrepancies will be helpful.

      (2.4) We thank the reviewer for this comment. Regarding the first part, we agree that the figure does not support differences in the Off-response components. We therefore rephrased the corresponding text accordingly. Additionally, we now show all RGC types with n>3 cells per recording condition in the revised Suppl. Fig. S2 and added statistics.

      Regarding the second part, there are several possible explanations for these discrepancies:

      (a) The SbC (transient Off SbC) studied in Tien et al. (2016) likely corresponds to the RGC type G<sub>28</sub> (see Höfling et al. 2024). As mentioned above (see reply #1.2), it is unclear if G<sub>32</sub> corresponds to a previously described SbC, and if so, to which. Goetz et al. (2022) proposed that G<sub>32</sub> may align with the bursty-SbC (bSbC) type (their Supplemental Table 3), as described also by Wienbar and Schwartz (2022). An important feature of the bSbC type is that its contrast response function is mainly driven by intrinsic properties rather than synaptic input. If G<sub>32</sub> indeed included the bSbC, this may explain why strychnine does not interfere with the suppression of temporal contrast.

      (b) In Tien et al. (2016), the authors genetically removed the VG3-ACs (see their Fig. 3) and show that this ablation reduces the inhibition of tSbC cells in a stimulus size-dependent manner. Specifically, larger light stimuli (600 µm) only show marginal effects on the IPSCs and inhibitory synaptic conductance (see their Figs. 3c,d and 3e,f, respectively). In our study, the full-field chirp had a size of 800 x 600 µm. Therefore – and assuming that G<sub>32</sub> indeed included tSbCs – our observation that strychnine did not affect temporal suppression in the full-field chirp responses would be in line with Tien et al. (2016).   

      (3) This study uses DETA-NO as an NO donor for enhancing NO release. However, a previous study by Thompson et al., Br J Pharmacol. 2009 reported that DETA-NO can rapidly and reversibly induce a cation current independent of NO release at the 100 µM used in the current study, which could potentially cause the observed effect in G32 cluster such as reduced contrast suppression and increased activity. This potential caveat should at least be discussed, and ideally excluded by showing the absence of DETA-NO effects in nNOS knockout mice, and/or by using another pharmacological reagent such as the NO donor SNAP or the nNOS inhibitor l-NAME. 

      Thank you for pointing out this potential caveat. We certainly cannot exclude such side effects. However, we think that this explanation of our observations is unlikely, because Thompson et al. barely see effects at 100 µM DETA/NO; in fact, their data suggest that clear NO-independent effects on the cation-selective channel occur at much higher DETA/NO concentrations, such as 3 mM. 

      In any case, in the revised manuscript, we refer to this paper in the Discussion.

      (4) Clarification of methods: In the Methods, lines 1119-1127, the authors describe the detrending, baseline subtraction, and averaging. Then, line 1129, "the mean activity r(t) was computed and then traces were normalized such that: max<sub>t</sub>(|r(t)|) = 1." How is the normalization done? Is it over the entire recording (control and wash in) for each ROI? Or is it normalized based on the mean trace under each imaging session (i.e. twice for each imaging field)? 

      The normalization (z-scoring) was done for each ROI individually per stimulus and condition (Ctrl 1, Ctrl 2, DETA/NO). We normalized the traces, because the absolute Ca<sup>2+</sup> signal depends on factors, such as “resting” state of the cell (e.g., silent vs. baseline spiking activity in the absence of a light stimulus) and its fluorescent dye concentration. This also means that absolute response amplitudes are difficult to interpret. Hence, we focused on analyzing relative changes per ROI and condition, which still allowed us to investigate adaptational and drug-induced effects. In the revised manuscript, we changed the corresponding paragraph for clarification.
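
      As an illustration of per-ROI, per-condition normalization, the sketch below follows the Methods sentence quoted by the reviewer: the pre-stimulus level is set to zero and the trace is scaled so that its peak absolute value equals one. It is a simplified stand-in rather than the analysis code; the function name and the `n_baseline` parameter (number of pre-stimulus samples) are hypothetical.

```python
def normalize_trace(trace, n_baseline):
    """Baseline-subtract one ROI trace and scale it so that max |r(t)| = 1.

    trace: samples for one ROI, one stimulus and one condition.
    n_baseline: number of samples recorded before stimulus onset
                (hypothetical parameter for this sketch).
    """
    if not 0 < n_baseline <= len(trace):
        raise ValueError("n_baseline must be between 1 and len(trace)")
    baseline = sum(trace[:n_baseline]) / n_baseline
    shifted = [value - baseline for value in trace]
    peak = max(abs(value) for value in shifted)
    if peak == 0:
        return shifted  # flat trace: nothing to scale
    return [value / peak for value in shifted]


# Each condition (Ctrl 1, Ctrl 2, DETA/NO) is normalized separately.
normalized = normalize_trace([2.0, 2.0, 4.0, 1.0], n_baseline=2)
print(normalized)
```

      Because every ROI and condition is scaled independently, only relative changes in response shape are preserved, which is why absolute amplitudes are not compared.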

      As for the clustering of RGC types, I assume that each ROI's cluster identity remains unchanged through the comparison. If so, it may be helpful to emphasize this in the text.

      Yes, this is correct. We identified G<sub>32</sub> RGCs based on their Ctrl1 responses and then compared these responses with those for Ctrl2 or NO. We now clarified this in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):  

      The manuscript would benefit from a discussion of how the findings in this study relate to known mechanisms of NO modulation and previously reported effects of NO manipulations on RGC activity. 

      Thank you for the recommendation. We already refer to known mechanisms of NO within the retina in the Introduction. In the revised manuscript, we now added information to the Discussion.

      In the abstract, "a paired-recording paradigm" could be misleading because paired recording generally refers to the simultaneous recording of two neurons. However, the paradigm in this study is essentially imaging experiments done at two time points. 

      We agree with the reviewer. To avoid any confusion with paired electrophysiological recordings, we changed the term “paired-recording paradigm” to “sequential recording paradigm” and replaced the term “pair-/ed” with “sequentially recorded”.

    1. Author response:

      Reviewer 1:

      Summary:

      This paper describes molecular dynamics simulations (MDS) of the dynamics of two T-cell receptors (TCRs) bound to the same major histocompatibility complex molecule loaded with the same peptide (pMHC). The two TCRs (A6 and B7) bind to the pMHC with similar affinity and kinetics, but employ different residue contacts. The main purpose of the study is to quantify via MDS the differences in the inter- and intra-molecular motions of these complexes, with a specific focus on what the authors describe as catch-bond behavior between the TCRs and pMHC, which could explain how T-cells can discriminate between different peptides in the presence of weak separating force.

      Strengths:

      The authors present extensive simulation data that indicates that, in both complexes, the number of high-occupancy interdomain contacts initially increases with applied load, which is generally consistent with the authors’ conclusion that both complexes exhibit catch-bond behavior, although to different extents. In this way, the paper somewhat expands our understanding of peptide discrimination by T-cells.

      The reviewer makes thoughtful assessments of our manuscript. While our manuscript is meant to be a “short” contribution, our significant new finding is that even for TCRs targeting the same pMHC, having similar structures, and leading to similar functional outcomes in conventional assays, their response to applied load can be different. This supports our recent experimental work where TCRs targeting the same pMHC differed in their catch bond characteristics, and importantly, in their response to limiting copy numbers of pMHCs on the antigen-presenting cell (Akitsu et al., Sci. Adv., 2024; cited in our manuscript). Our present manuscript provides the physical basis where two similar TCRs respond to applied load differently. In the revised manuscript, we will make this point clearer.

      Weaknesses:

      While generally well supported by data, the conclusions would nevertheless benefit from a more concise presentation of information in the figures, as well as from suggesting experimentally testable predictions.

      Following the reviewers’ suggestions, we will update figures and use Figure Supplements to make the main figures more concise and to simplify the overall presentation.

      Regarding testable predictions, one prediction would be that B7 TCR will exhibit weaker catch bond behavior than A6. This is an important prediction because the two TCRs targeting the same pMHC have similar structures and are functionally similar in conventional assays. This prediction can be tested by single-molecule optical tweezers experiments. We also predict the A6 TCR may perform better when the number of pMHC molecules presented are limited, analogous to our recent experiments on different TCRs, Akitsu et al., Sci. Adv. (2024).

      Another testable prediction, concerning the conservation of the basic allostery mechanism, involves the Cβ FG-loop deletion mutant: the FG-loop is located at the hinge region of the β chain, yet its deletion severely impairs catch bond formation. These predictions will be mentioned and discussed in the updated manuscript.

      Reviewer 2:

      In this work, Chang-Gonzalez and coworkers follow up on an earlier study on the force-dependence of peptide recognition by a T-cell receptor using all-atom molecular dynamics simulations. In this study, they compare the results of pulling on a TCR-pMHC complex between two different TCRs with the same peptide. A goal of the paper is to determine whether the newly studied B7 TCR has the same load-dependent behavior mechanism shown in the earlier study for A6 TCR. The primary result is that while the unloaded interaction strength is similar, A6 exhibits more force stabilization.

      This is a detailed study, and establishing the difference between these two systems with and without applied force may establish them as a good reference setup for others who want to study mechanobiological processes if the data were made available, and could give additional molecular details for T-Cell-specialists. As written, the paper contains an overwhelming amount of details and it is difficult (for me) to ascertain which parts to focus on and which results point to the overall take-away messages they wish to convey.

      As mentioned above and as the reviewer correctly pointed out, the condensed appearance of this manuscript arose largely because we intended it to be a Research Advances article as a short follow-up study of our previous paper on A6 TCR published in eLife. Most of the analysis scripts for the A6 TCR study are already available on GitHub. We will additionally deposit sample structures and simulation scripts for the B7 TCR. Trajectories will be provided upon request given their large size.

      Regarding the focus issue, it is in part due to the complex nature of the problem, which required simulations under different conditions and multi-faceted analyses. Concisely presenting the complex analyses also has been a challenge in our previous papers on TCR simulations (Hwang et al., PNAS 2020; Chang-Gonzalez et al., eLife, 2024 – both are cited in our manuscript). With updated figures and texts, we expect that the presentation will be a lot clearer. But even in the present form, the reviewer points out the main take-away message well: “The primary result is that while the unloaded interaction strength is similar, A6 exhibits more force stabilization.”

      Detailed comments:

      (1) In Table 1 - are the values of the extension column the deviation from the average length at zero force (that is what I would term extension) or is it the distance between anchor points (which is what I would assume based on the large values)? If the latter, I suggest changing the heading, and then also reporting the average extension with an asterisk indicating no extensional restraints were applied for B7-0, or just listing 0 load in the load column. Standard deviation in this value can also be reported. If it is an extension as I would define it, then I think B7-0 should indicate extension = 0+/- something.

      The distance between anchor points could also be labeled in Figure 1A.

      “Extension” is the distance between anchor points (blue spheres at the ends of the added strands in Fig. 1A). While its meaning should be clear in the section “Laddered extensions” in MD simulation protocol, at first glance it may lead to confusion. In a strict sense, use of “extension” for the distance is a misnomer, but we have used it in our previous two papers (Hwang et al., PNAS 2020; Chang-Gonzalez et al., eLife, 2024), so we prefer to keep it for consistency. Instead, in the caption of Table 1, we will explain its meaning, and also explicitly label it in Fig. 1A, as the reviewer suggested.

      Please also note that the no-load case B7<sup>0</sup> does not have a particular extension that yields zero load on average. It would in fact be very difficult to find such an extension (distance between two anchor points). To simulate the system without load, we separately built a TCR-pMHC complex without added linkers, and held the distal part of pMHC with weak harmonic restraints (explained in sections “Structure preparation” and “Systems without load”). In this way, no external force is applied to TCR as it moves relative to pMHC. We will clarify this when introducing B7<sup>0</sup> in the Results section.

      (2) As in the previous paper, the authors apply “constant force” by scanning to find a particular bond distance at which a desired force is selected, rather than simply applying a constant force. I find this approach less desirable unless there is experimental evidence suggesting the pMHC and TCR were forced to be a particular distance apart when forces are applied. It is relatively trivial to apply constant forces, so in general, I would suggest this would have been a reasonable comparison. Line 243-245 speculates that there is a difference in catch bonding behavior that could be inferred because lower force occurs at larger extensions, but I do not believe this hypothesis can be fully justified and could be due to other differences in the complex.

      There is indeed experimental evidence that the TCR-pMHC complex operates under constant separation. The spacing between a T-cell and an antigen-presenting cell is maintained by adhesion molecules such as the CD2-CD58 pair, as explained in our paper on the A6 TCR (Chang-Gonzalez et al., eLife, 2024; please see the bottom paragraph on page 4 of the paper). In in vitro single-molecule experiments, pulling to a fixed separation and holding is also commonly done. Detailed comparison between constant extension vs. constant force simulations is definitely a subject of our future study. We will clarify these points when explaining the constant extension (or separation).

      Regarding line 243–245, we agree with the reviewer that without further tests, lower forces at larger extensions per se cannot be an indicator that B7 forms a weaker catch bond. But with additional insight, it does have an indirect relevance. In addition to fewer TCR-pMHC contacts (Fig. 1C of our manuscript), the intra-TCR contacts are also reduced compared to those of A6 (Fig. 1D vs. Chang-Gonzalez et al., eLife, 2024, Fig. 8A,B, first column; reproduced in the figure in our response to reviewer 3 below). This shows that the B7 TCR forms a looser complex with pMHC compared to A6. With its higher compliance, the B7 TCR-pMHC complex needs to be under a greater extension than A6 to apply comparable levels of force, and it would be more difficult to achieve load-induced stabilization of the TCR-pMHC interface, hence a weaker catch bond. We will add this point when explaining the weaker catch bond behavior of B7.

      (3) On a related note, the authors do not refer to or consider other works using MD to study force-stabilized interactions (e.g. for catch bonding systems), e.g. these cases where constant force is applied and enhanced sampling techniques are used to assess the impact of that applied force: https://www.cell.com/biophysj/fulltext/S0006-3495(23)00341-7, https://www.biorxiv.org/content/10.1101/2024.10.10.617580v1. I was also surprised not to see this paper on catch bonding in pMHC-TCR referred to, which also includes some MD simulations: https://www.nature.com/articles/s41467-023-38267-1

      We thank the reviewer for bringing the three papers to our attention, which are:

      (1) Languin-Cattoën, Sterpone, and Stirnemann, Biophys. J. 122:2744 (2023): About bacterial adhesion protein FimH.

      (2) Peña Ccoa, et al., bioRxiv (2024): About actin binding protein vinculin.

      (3) Choi et al., Nat. Comm. 14:2616 (2023): About a mathematical model of the TCR catch bond.

      Catch bond mechanisms of FimH and vinculin are different from that of TCR in that FimH and vinculin have relatively well-defined weak- and strong-binding states for which there are corresponding crystal structures. Availability of the end-state structures enables simulation approaches such as enhanced sampling of individual states and studying the transition between the two states. In contrast, TCR does not have any structurally well-defined weak- or strong-binding states, which requires a different approach. As demonstrated in our current manuscript as well as in our previous two papers (Hwang et al., PNAS 2020; Chang-Gonzalez et al., eLife, 2024), our microsecond-long simulations of the complex under realistic pN-level loads and a combination of analysis methods are effective for elucidating the catch bond mechanism of TCR. In the revised manuscript, we will cite the two papers to compare the TCR catch bond mechanism with those of FimH and vinculin, which will offer a broader perspective.

      The third paper (Choi, 2023) proposes a mathematical model to analyze extensive sets of data, and also performs new experiments and additional simulations. Of note, their model assumptions are based mainly on the steered MD (SMD) simulation in their previous paper (Wu, et al., Mol. Cell. 73:1015, 2019). In their model, formation of a catch bond (called catch-slip bond in Choi’s paper) requires partial unfolding of MHC and tilting of the TCR-pMHC interface. While further studies are needed to determine whether those changes are indeed required, even so, the question remains regarding how the complex in the fully folded state can bear load and enter such a state in the first place. Our current and previous simulation studies suggest a mechanism by which ligand- and load-dependent responses occur as the first obligatory step of catch bond formation, after which partial unfolding and/or extensive conformational transitions may occur, as described in our recent paper (Akitsu et al., Sci. Adv., 2024). In the revised manuscript, we will cite Wu’s paper and briefly explain the above.

      (4) The authors should make at least the input files for their system available in a public place (github, zenodo) so that the systems are a more useful reference system as mentioned above. The authors do not have a data availability statement, which I believe is required.

      As mentioned above, we will make sample input files and coordinates available on GitHub. A data availability statement will be added.

      Reviewer 3:

      Summary:

      The paper by Chang-Gonzalez et al. is a molecular dynamics (MD) simulation study of the dynamic recognition (load-induced catch bond) by the T cell receptor (TCR) of the complex of peptide antigen (p) and the major histocompatibility complex (pMHC) protein. The methods and simulation protocols are essentially identical to those employed in a previous study by the same group (Chang-Gonzalez et al., eLife 2024). In the current manuscript, the authors compare the binding of the same pMHC to two different TCRs, B7 and A6 which was investigated in the previous paper. While the binding is more stable for both TCRs under load (of about 10-15 pN) than in the absence of load, the main difference is that, with the current MD sampling, B7 shows a smaller amount of stable contacts with the pMHC than A6.

      Strengths:

      The topic is interesting because of the (potential) relevance of mechanosensing in biological processes including cellular immunology.

      Weaknesses:

      The study is incomplete because the claims are based on a single 1000-ns simulation at each value of the load and thus some of the results might be marred by insufficient sampling, i.e., statistical error. After the first 600 ns, the higher load of B7high than B7low is due mainly to the simulation segment from about 900 ns to 1000 ns (Figure 1D). Thus, the difference in the average value of the load is within their standard deviation (9 +/- 4 pN for B7low and 14.5 +/- 7.2 for B7high, Table 1). Even more strikingly, Figure 3E shows a lack of convergence in the time series of the distance between the V-module and pMHC, particularly for B70 (left panel, yellow) and B7low (right panel, orange). More and longer simulations are required to obtain a statistically relevant sampling of the relative position and orientation of the V-module and pMHC.

      The reviewer uses data points during the last 100 ns to raise an issue with sampling. But since we are using realistic pN range forces, force fluctuates more slowly. In fact, in our simulation of B7<sup>high</sup>, while the force peaks near 35 pN at 500 ns (Fig. 1D of our manuscript; reproduced as panels C and D below), the contact heat map shows no noticeable changes around 500 ns (Fig. 2C of our manuscript). Thus, a wider time window must be considered rather than focusing on instantaneous force.

      We believe the reviewer’s concern about sampling arose also due to a lack of clear explanation. Author response image 1 below contains panels from our earlier eLife paper on the A6 TCR. Panels A and B are from Fig. 8 of the A6 paper, and panels C and D are from Fig. 1D of our present manuscript. The high-load simulations in both cases (outlined circles) fluctuate widely in force so that one might argue that sampling was insufficient. However, unless one is interested in finding the precise value of force for a given extension, sampling in our simulations was reasonable enough to distinguish between high- and low-force behaviors. To support this, we show panel E below, which is from Appendix 3–Fig. 1 of our A6 paper. Added to this panel are the average forces and standard deviations of B7<sup>low</sup> and B7<sup>high</sup> from Table 1 of our manuscript (red squares). Please note that all of the data were measured after 500 ns. Except for Y8A<sup>low</sup> and dFG<sup>low</sup> of A6 (explained below), all of the data points lie on nearly a straight line.

      Author response image 1.

      Thermodynamically, the force and position of the restraint (blue spheres in Fig. 1A of our manuscript) form a pair of generalized force and the corresponding spatial variable in equilibrium at temperature 300 K, which is akin to the pressure P and volume V of an ideal gas. If V is fixed, P fluctuates. Denoting the average and standard deviation of pressure as ⟨P⟩ and ∆P, respectively, Burgess showed that ∆P/⟨P⟩ is a constant (Eq. 5 of Burgess, Phys. Lett. A, 44:37; 1973). In the case of the TCRαβ-pMHC system, although individual atoms are not ideal gases, since their motion leads to the fluctuation in force on the restraints, the situation is analogous to the case where pressure arises from individual ideal gas molecules hitting the confining wall as the restraint. Thus, the near-linear behavior in panel E above is a consequence of the system being many-bodied and at constant temperature. The linearity is also an indirect indicator that sampling of force was reasonable. The fact that A6 and B7 data show a common linear profile further demonstrates the consistency in our force measurement. That said, the B7 data points (red in panel E) are elevated slightly above nearby A6 data points. This is consistent with B7 forming an overall weaker complex, both at the TCR-pMHC interface (panels A vs. C) and within intra-TCR interfaces (panels B vs. D), which can be seen by the wider ranges of color bars in panels A and B for A6 compared to panels C and D for B7.
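
      To make the argument concrete, the following is a minimal Python sketch of how the mean and standard deviation of the restraint force could be computed over the post-500-ns window, and how the slope of standard deviation versus mean could be estimated across simulations. It is an illustration under stated assumptions rather than our actual analysis script: the function names are hypothetical, and fitting a line through the origin is our reading of the statement that ∆P/⟨P⟩ is constant. The only numbers used are the two B7 entries from Table 1.

```python
from statistics import mean, stdev


def force_stats(times_ns, forces_pn, start_ns=500.0):
    """Mean and sample standard deviation of the force for frames at or after start_ns.

    The 500-ns cut-off mirrors the window described in the response; the
    function name and signature are illustrative assumptions.
    """
    window = [f for t, f in zip(times_ns, forces_pn) if t >= start_ns]
    if len(window) < 2:
        raise ValueError("need at least two frames after the cut-off")
    return mean(window), stdev(window)


def slope_through_origin(means, stds):
    """Least-squares slope k of std = k * mean (line forced through the origin).

    A constant relative fluctuation implies such a line; this is an
    assumption about the fit, not a description of how panel E was drawn.
    """
    return sum(m * s for m, s in zip(means, stds)) / sum(m * m for m in means)


# Average force and standard deviation (pN) for B7low and B7high, from Table 1.
b7_means = [9.0, 14.5]
b7_stds = [4.0, 7.2]
relative_fluctuation = [s / m for m, s in zip(b7_means, b7_stds)]
k = slope_through_origin(b7_means, b7_stds)
```

      With only two points the slope is not a meaningful fit by itself; the point of the sketch is that each simulation contributes one (mean, standard deviation) pair, and the pairs from different systems can be compared on a single line.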

      About the two outliers of A6, Y8A<sup>low</sup> is for an antagonist peptide and dFG<sup>low</sup> is the Cβ FG-loop deletion mutant. Interestingly, both cases had reduced numbers of contacts with pMHC, which likely caused a wider conformational motion, hence greater fluctuation in force.

      A similar argument applies to Fig. 3E of our manuscript. If precise values of the V-module to pMHC distance were needed, longer or duplicate simulations would be necessary. However, Fig. 3E as it currently stands clearly shows that B7<sup>high</sup> maintains a more stable interface compared to B7<sup>low</sup>, which is consistent with all other measures we used, such as Fig. 3B (Hamming distance), Fig. 3C (buried surface area), and Fig. 4A–E (Vα-Vβ motion and CDR3 distance). They are also consistent with our simulations of A6.

      Thus, rather than relying on peculiarities of individual trajectories, we analyze data in multiple ways and draw conclusions based on features that are consistent across different simulations. Please also note that reviewer 1 mentioned that our conclusions are “generally well supported by data.”

      We will update our manuscript to concisely explain the above and also will add Panel E above as a supplement of Fig. 1.

      It is not clear why “a 10 Å distance restraint between alphaT218 and betaA259 was applied” (section MD simulation protocol, page 9).

      αT218 and βA259 are the residues attached to a leucine-zipper handle in in vitro optical trap experiments (Das, et al., PNAS 2015). In T cells, those residues also connect to transmembrane helices. Author response image 2 is a model of N15 TCR used in experiments in Das’ paper, constructed based on PDB 1NFD. Blue spheres represent Cα atoms corresponding to αT218 and βA259 of B7 TCR. Their distance is 6.7 Å. The 10-Å distance restraint in simulation was applied to mimic the presence of the leucine zipper that prevents excessive separation of the added strands. The distance restraint is a flat-bottom harmonic potential which is activated only when the distance between the two atoms exceeds 10 Å, which we did not clarify in our original manuscript. The same restraint was used in our previous studies on JM22 and A6 TCRs.
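
      The behavior of such a restraint can be sketched in a few lines of Python. This is only an illustration of a generic flat-bottom harmonic potential, not the input used in our simulations: the force constant `k` and the 1/2 prefactor are assumptions (conventions differ between MD packages), and only the 10-Å threshold and the 6.7-Å model distance come from the text above.

```python
def flat_bottom_restraint(distance, threshold=10.0, k=1.0):
    """Energy and force of a one-sided flat-bottom harmonic distance restraint.

    distance  : separation between the two restrained atoms (Angstrom)
    threshold : separation below which the restraint is inactive (10 Angstrom in the text)
    k         : force constant; the value 1.0 is a placeholder assumption

    Returns (energy, force), where force is -dE/d(distance); a negative
    value pulls the two atoms back toward each other.
    """
    excess = distance - threshold
    if excess <= 0.0:
        # Inside the flat region: the restraint contributes nothing.
        return 0.0, 0.0
    energy = 0.5 * k * excess ** 2
    force = -k * excess
    return energy, force


# At the 6.7-Angstrom separation of the model, the restraint is inactive.
inactive = flat_bottom_restraint(6.7)
# At a hypothetical 12-Angstrom separation, it pulls the atoms back together.
active = flat_bottom_restraint(12.0)
```

      Because the potential is flat below the threshold, the restraint does not perturb the two residues unless the added strands begin to separate excessively.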

      We will add the figure as a supplement of Fig. 1, cite Das’ paper, and also update the description of the distance restraint in the MD simulation protocol section.

      Author response image 2.

    1. Author response:

      Reviewer #1 (Public review):

      The significance of the target molecule and mechanisms may help in understanding the molecular mechanisms of metformin.

      We greatly appreciate the reviewer’s insightful comment regarding the significance of the target molecule and its mechanisms in understanding the molecular actions of metformin. ATP5I is responsible for the dimerization of the F<sub>1</sub>F<sub>0</sub>-ATPase(1-3). Hence, we propose conducting BN-PAGE followed by a western blot for the β-subunit of the F<sub>1</sub> domain of F<sub>1</sub>F<sub>0</sub>-ATP synthase to investigate whether metformin affects its dimerization. This will provide more direct evidence of the on-target action of metformin on ATP5I. Due to the high abundance of F<sub>1</sub>F<sub>0</sub>-ATP synthase in cells and the slow entry of metformin into mitochondria, we plan to perform long-term treatments (3 and 6 days) with high concentrations of metformin (10 mM) to enhance the likelihood of detecting subtle yet biologically relevant shifts in the monomer and dimer populations. Prolonged exposure is expected to reveal the cumulative effects of metformin on the F<sub>1</sub>F<sub>0</sub>-ATP synthase dimer/monomer ratio. We do not expect metformin to fully mimic the effect on dimerization seen in ATP5I KO cells, but we think it will be important to report to what extent this ratio is affected.

      Reviewer #2 (Public review):

      (1) The interpretation of the cellular co-localization of the biotin-biguanide conjugate with TOMM20 (Figure 1-D) as mitochondrial "accumulation" of the conjugate is overstated because it cannot exclude binding of the conjugate to the mitochondrial membrane. It would have been more convincing if additional incubations with the biotin-biguanide conjugate in combination with metformin had shown that metformin is competitive with the biotin-conjugate.

      We appreciate the reviewer’s insightful comment and agree that the resolution provided by fluorescence microscopy makes it challenging to pinpoint the specific mitochondrial compartment where the biotin-biguanide conjugate localizes, even with additional markers such as TOMM20 antibodies for the inner mitochondrial membrane. While it remains a possibility that the conjugate binds to the mitochondrial surface, another plausible explanation is that the biotin moiety may facilitate entry into mitochondria through a biotin-specific transporter, adding further mechanistic intricacies. Furthermore, while a competition assay with metformin might help investigate interactions with mitochondrial targets and transporters (OCT family), it would not compete for biotin-mediated transport. Thus, while we acknowledge the reviewer’s suggestion, we believe such an experiment may not provide conclusive evidence regarding the conjugate’s mitochondrial localization or mechanism of entry. Instead, we will revise the manuscript to more accurately describe the findings as "mitochondrial association" rather than "mitochondrial accumulation," ensuring that our interpretation remains consistent with the resolution and limitations of the data presented.

      (2) The manuscript reports the identification of 69 proteins by mass spectrometry of the pull-down assay of which 31 proteins were eluted by metformin. However, no Mass Spectrometry data is presented of the peptides identified. The methodology does not state the minimum number of peptides (1, 2?) that were used for the identification of the 31/69 proteins.

      Concerning the mass spectrometry results, our intention was to provide a comprehensive table summarizing these findings in a separate data sheet, as part of the data availability section. To address the reviewer’s comment and ensure full transparency, we will include this table as supplementary material in the revised manuscript. Additionally, we will update the methodology section to explicitly state these criteria and ensure clarity regarding the identification process.

      (3) The validation of ATP5I was based on the use of recombinant protein (which was 90% pure) for the SPR and the use of a single antibody to ATP5I. The validity of the immunoblotting rests on the assumption that there is no "non-specific" immunoactivity in the relevant mol wt range. Information on the validation of the antibody would be helpful.

      Regarding the recombinant protein used for SPR, its purity was evaluated using a Coomassie-stained gel. For the antibody used in immunoblotting, its specificity was validated through knockout cell lines, ensuring minimal concerns about non-specific immunoactivity within the relevant molecular weight range. Unfortunately, the KO data appear in the paper after the first immunoblots are presented. In the revised manuscript, we will clearly outline these validation steps in the methods section and provide additional manufacturer documentation for the antibody we used.

      (4) Knock-out of ATP5I markedly compromised the NAD/NADH ratio (Fig.3A) and cell proliferation (Figure 3D). These effects may be associated with decreased mitochondrial membrane potential which could explain the low efficacy of metformin (and most of the data in Figures 3-5). This possibility should be discussed. Effects of [metformin] on the NAD/NADH ratio in control cells and ATP5I-KO would have been helpful because the metformin data on cell growth is normalized as fold change relative to control, whereas the NAD/NADH ratio would represent a direct absolute measurement enabling comparison of the absolute effect in control cells with ATP5I KO.

      The mitochondrial membrane potential depends on a functional electron transport chain, which drives proton pumping from the matrix to the intermembrane space. Metformin can decrease the mitochondrial membrane potential, and this is usually explained as a consequence of complex I inhibition(4). It has been reported that metformin requires this membrane potential to accumulate in mitochondria, so the actions of metformin are self-limiting due to this requirement. The reviewer is right that ATP5I KO cells could be resistant to metformin because they may have a lower membrane potential. We do not believe this to be the case because the response to phenformin, another biguanide that can enter mitochondria through the membrane without the need of the OCT transporters(5), is also affected in ATP5I KO cells. Of note, compensatory mechanisms such as enhanced glycolysis, as observed in ATP5I-KO cells (elevated ECAR and increased sensitivity to 2-deoxy-D-glucose), and the ATPase activity of F<sub>1</sub>F<sub>0</sub>-ATP synthase could potentially help maintain membrane potential, suggesting that this might not be an issue in the ATP5I KO cells. We will discuss these possibilities in the revised manuscript.

      Nevertheless, to experimentally address this point, we propose measuring mitochondrial membrane potential using tetramethylrhodamine methyl ester (TMRE) and ATP levels using luciferase-based assays (CellTiter-Glo) in ATP5I-KO cells.

      Measuring the NAD+/NADH ratio in both control and KO cells may not be very helpful because this ratio can be corrected by LDH, which is induced as part of the glycolytic adaptation that occurs after inhibition of respiration. Since our KO cells have already been propagated for several passages, the extent of this adaptation is likely different from that in metformin-treated cells. As we mentioned in answering Reviewer 1, we will provide a more direct measurement of metformin acting on ATP5I: the levels of F<sub>1</sub>F<sub>0</sub>-ATPase dimers and monomers.

      (5) Figure-6 CRISPR/Cas9 KO at 16mM metformin in comparison with 70nM rotenone and 2 micromolar oligomycin (in serum-containing medium). The rationale for the use of such a high concentration of metformin has not been explained. In liver cells metformin concentrations above 1mM cause severe ATP depletion, whereas therapeutic (micromolar) concentrations have minimal effects on cellular ATP status. The 16mM concentration is ~2 orders of magnitude higher than therapeutic concentrations and likely linked to compromised energy status. The stronger inhibition of cell proliferation by 16mM metformin compared with rotenone or oligomycin raises the issue of whether the changes in gene expression may be linked to the greater inhibition of mitochondrial metabolism. Validation of the cellular ATP status and NAD/NADH with metformin as compared with the two inhibitors could help the interpretation of this data.

      To address the reviewer’s final comment, we would like to clarify the rationale behind our experimental approach. NALM-6 cells are very glycolytic, have low respiration rates, and weak dependence on ATP5I (DepMap score: -0.47)(6). The concentration of 16 mM metformin was chosen based on the IC50 for this cell line. This approach aligns with our focus on the anticancer mechanism of action rather than the antidiabetic effects of metformin. Both ATP status and NAD+/NADH ratios will depend on the extent of the compensatory glycolysis. On the other hand, our genetic screening evaluates cell proliferation as an integration of all metabolic activities required for the process. This unbiased screening revealed a common pathway affected by metformin and oligomycin that is different from the pathway affected by rotenone, which is consistent with the finding that metformin acts on the F<sub>1</sub>F<sub>0</sub>-ATPase.

      Reviewer #3 (Public review):

      (1) Most of the data are based on measurements of the oxygen consumption rate (OCR) and extracellular acidification rate (ECAR) measured by the Seahorse analyser in control and ATP5I KO cells. However, these measurements are conducted by a single injection of a biguanide, followed over time and presented as fold change. By doing so, the individual information on the effect of metformin and its derivatives on control and KO cells is lost. In addition, the usual measurement of OCR is coupled with certain inhibitors and uncouplers, such as oligomycin, FCCP, and Antimycin A/rotenone, to understand the contribution of individual complexes to respiration. Since biguanides and ATP5I KO affect protein levels of components of complex I and IV, it would be informative to measure their individual contributions/effects in the Seahorse. To further strengthen the data, it would be helpful to obtain measurements of actual ATP levels in these cells, as this would explain the activation of AMPK.

      We appreciate the reviewer’s observations regarding the Seahorse measurements and acknowledge the potential limitations of presenting the data as fold change. Due to experimental challenges in maintaining KP-4 and ATP5I-KO cells with sufficient nutrients, caused by their rapid glucose uptake and subsequent lactate production, it was more practical to present the Seahorse results in this format. Using inhibitors at each time point during the Seahorse experiment was not feasible, as the delay between inhibitor injections and the corresponding changes in oxygen consumption rate (OCR) and extracellular acidification rate (ECAR) would introduce variability and complicate the interpretation of dynamic responses. Nevertheless, we recognize the importance of understanding the contributions of specific respiratory complexes to OCR and ECAR. To address this, we will include a representative figure showcasing a typical Seahorse analysis, highlighting ATP turnover and proton leak after oligomycin addition, maximal respiration with FCCP, and disruption with rotenone and antimycin A. While these experiments are inherently complex due to the metabolic demands of ATP5I-KO cells, this approach will provide a clearer breakdown of mitochondrial activity. Furthermore, as mentioned in our response to Reviewer 2, we will measure ATP levels using a luciferase-based assay (CellTiter-Glo) in both control and ATP5I-KO cells to better explain AMPK activation. This will provide additional context to strengthen the interpretation of mitochondrial function and metabolic compensation mechanisms in these cells.

      (2) The authors report on alterations in mitochondrial morphology upon ATP5I KO, which is measured by subjective quantifications of filamentous versus puncta structures. Fiji offers great tools to quantify the mitochondrial network unbiasedly and with more accuracy using deconvolution and skeletonization of the mitochondria, providing the opportunity to measure length, shape, and number quantitatively. This will help to understand better whether mitochondria are really fragmented upon ATP5I KO and rescued by its re-introduction.

      Concerning the analysis of mitochondrial morphology, we acknowledge the potential benefits of using Fiji and additional plugins such as MiNA for more accurate and unbiased quantification. Indeed, this approach could provide stronger evidence for mitochondrial fragmentation upon ATP5I-KO and its potential rescue by ATP5I reintroduction. We will consider integrating this methodology into our analysis to enhance the precision and robustness of our findings.

      (3) Finally, the authors report in the last part of the paper a genetic CRISPR/Cas9 KO screen in NALM-6 cells cultured with high amounts of metformin to identify potential new mediators of metformin action. It is difficult to connect that to the rest of the paper because a) different concentrations of metformin are used and b) the metabolic effects on energy consumption are not defined. They argue about the molecular function of the obtained hits based on literature and on a comparison of the pattern of genetic alterations based on treatments with known inhibitors such as oligomycin and rotenone. However, a direct connection is not provided, thus the interpretation at the end of the results that "the OMA1-DEL1-HRI pathway mediates the antiproliferative activity of both biguanides and the F1ATPase inhibitor oligomycin" while increasing glycolysis, needs to be toned down. This is an interesting observation, but no causality is provided. In general, this part stands alone and needs to be better connected to the rest of the paper.

      NALM-6 cells are very glycolytic, have low respiration rates, and weak dependence on ATP5I(6), forcing us to use higher concentrations of metformin to inhibit their growth. Recent results show that metformin targets PEN2 in the cytosol to increase AMPK activity, controlling both the glucose-lowering and the life span extension abilities of metformin(7). This work raises the question of whether the antiproliferative and anticancer effects of metformin are due to a mitochondrial activity or are controlled by this new pathway of AMPK activation. Hence, the genetic screening was performed to find, in an unbiased manner, how metformin works. The results provide compelling evidence for mitochondria and in particular the ATP synthase as potential targets of metformin and a foundation for future studies. We will revise the text and abstract to better reflect the exploratory nature of this finding and ensure clarity.

      (1) Paumard, P. et al. Two ATP synthases can be linked through subunits i in the inner mitochondrial membrane of Saccharomyces cerevisiae. Biochemistry 41, 10390-10396 (2002). https://doi.org/10.1021/bi025923g

      (2) Paumard, P. et al. The ATP synthase is involved in generating mitochondrial cristae morphology. EMBO J 21, 221-230 (2002). https://doi.org/10.1093/emboj/21.3.221

      (3) Habersetzer, J. et al. ATP synthase oligomerization: from the enzyme models to the mitochondrial morphology. Int J Biochem Cell Biol 45, 99-105 (2013). https://doi.org/10.1016/j.biocel.2012.05.017

      (4) Xian, H. et al. Metformin inhibition of mitochondrial ATP and DNA synthesis abrogates NLRP3 inflammasome activation and pulmonary inflammation. Immunity 54, 1463-1477 e1411 (2021). https://doi.org/10.1016/j.immuni.2021.05.004

      (5) Hawley, S. A. et al. Use of cells expressing gamma subunit variants to identify diverse mechanisms of AMPK activation. Cell metabolism 11, 554-565 (2010). https://doi.org/10.1016/j.cmet.2010.04.001

      (6) Hlozkova, K. et al. Metabolic profile of leukemia cells influences treatment efficacy of L-asparaginase. BMC Cancer 20, 526 (2020). https://doi.org/10.1186/s12885-020-07020-y

      (7) Ma, T. et al. Low-dose metformin targets the lysosomal AMPK pathway through PEN2. Nature 603, 159-165 (2022). https://doi.org/10.1038/s41586-022-04431-8

    1. Author response:

      We thank the reviewers for taking the time to read and critically assess our manuscript.

      We agree with the main points and they will be addressed in both writing and in additional experiments in a revised version of the paper.

      The shared and major point of criticism is the non-conclusive metabolomic data that indicate the bc1-complex in T. gondii as a MMV1028806 target in tachyzoites and bradyzoites. Regarding the former, our conclusion was mainly based on both metabolite abundance changes that are observed after treatment with one bona fide bc1-complex inhibitor, atovaquone, and also steady-state stable isotope incorporation patterns. While it is true that secondary effects of metabolic inhibition occur and are often dominant, isotope labelling equilibria take more time to establish and may more accurately reflect blocked metabolic reactions, i.e., the primary target.

      Regardless, we will follow the excellent suggestions to functionally assay particular mitochondrial electron transfer reactions to corroborate or revise our conclusions regarding the primary MMV1028806 target.

      For more details, please refer to the full author responses that will accompany the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The reviewers suggest a number of experiments and re-analyses to strengthen their claims and enhance the impact of the study. While a number of these are longer term, below is a summary of experiments and analyses recommended by the reviewers that can be accomplished in the shorter term:

      (1) Clarification of statistical approaches, quantification, data presentation and description of cerebellar anatomical nomenclature (e.g., detailed statistical methods for the GEO dataset analysis, FDR correction, quantification in Figs 2-4)

      The revised manuscript will provide detailed statistical methods including FDR correction for GEO dataset analyses and quantification. Please see specific responses to GEO dataset analyses below.

      (2) Improved quality of images for select immunostains and in situ hybridization

      The revised manuscript will address the quality of the images as indicated by the reviewers.

      (3) Include a control group of hGFAP-Cre mice with loxP sites but without Sufu deletion to assess the effects of Cre-induced double-strand breaks on phosphorylated H2AX-DSB signaling.

      The breeding scheme we used to generate homozygous SUFU conditional mutants will not generate pups carrying only hGFAP-Cre. Thus, we are unable to compare gH2AX expression in littermates that do not carry loxP sites. The reviewer is correct in pointing out the possibility of Cre recombinase activity inducing double-strand breaks on its own. However, it is unlikely that any hGFAP-Cre-induced double-strand breaks are sufficient to cause the phenotypes we observed in homozygous mutant (Sufu-cKO) mice, because the cerebella of mice carrying heterozygous SUFU mutations (hGFAP-Cre;Sufu-fl/+) do not display the profound cerebellar phenotypes observed in Sufu-cKO mice. We cannot rule out, however, any undetectable abnormalities that could be present, which may require further analyses.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      SUFU modulates Sonic hedgehog (SHH) signaling and is frequently mutated in the B-subtype of SHH-driven medulloblastoma. The B-subtype occurs mostly in infants, is often metastatic, and lacks specific treatment. Yabut et al. found that Fgf5 was highly expressed in the B-subtype of SHH-driven medulloblastoma by examining a published microarray expression dataset. They then investigated how Fgf5 functions in the cerebellum of mice that have embryonic Sufu loss of function. This loss was induced using the hGFAP-cre transgene, which is expressed in multiple cell types in the developing cerebellum, including granule neuron precursors (GNPs) derived from the rhombic lip. By measuring the area of Pax6+ cells in the external granule cell layer (EGL) of Sufu-cKO mice at postnatal day 0, they find Pax6+ cells occupy a larger area in the posterior lobe adjacent to the secondary fissure, which is poorly defined. They show that Fgf5 RNA and phosphoErk1/2 immunostaining are also higher in the same disrupted region. Some of the phosphoErk1/2+ cells are proliferative in the Sufu-cKO. Western blot analysis of Gli proteins that modulate SHH signaling found reduced expression and absence of Gli1 activity in the region of cerebellar dysgenesis in Sufu-cKO mice. This suggests the GNP expansion in this region is independent of SHH signaling. Amazingly, intraventricular injection of the FGFR1-2 antagonist AZD4547 from P0-4 and examined histologically at P7 found the treatment restored cytoarchitecture in the cerebella of Sufu-cKO mice. This is further supported by NeuN immunostaining in the internal granule cell layer, which labels mature, non-dividing neurons, and KI67 immunostaining, indicating dividing cells, and primarily found in the EGL. The mice were treated beginning at a timepoint when cerebellar cytoarchitecture was shown to be disrupted and it is indistinguishable from control following treatment. Figure 3 presents the most convincing and exciting data in this manuscript.

      Sufu-cKO mice do not readily develop cerebellar tumors. The authors detected phosphorylated H2AX immunostaining, which labels double-strand breaks, in some cells in the EGL in regions of cerebellar dysgenesis in the Sufu-cKO, as well as cleaved Caspase 3, a marker of apoptosis. P53 protein, which acts downstream of the double-strand break pathway, was reduced in Sufu-cKO cerebellum. Genetically removing p53 from the Sufu-cKO cerebellum resulted in cerebellar tumors in 2-month-old mice. The Sufu;p53-dKO cerebella at P0 lacked clear foliation and a secondary fissure, even more so than the Sufu-cKO. Fgf5 RNA and signaling (pERK1/2) were also expressed ectopically.

      The conclusions of the paper are largely supported by the data, but some data analyses need to be clarified and extended.

      (1) The rationale for examining Fgf5 in medulloblastoma is not sufficiently convincing. The authors previously reported that Fgf15 was upregulated in neocortical progenitors of mice with conditional loss of Sufu (PMID: 32737167). In Figure 1, the authors report FGF5 expression is higher in SHH-type medulloblastoma, especially the beta and gamma subtypes mostly found in infants. These data were derived from a genome-wide dataset and are shown without correction for multiple testing, including other Fgfs. Showing the expression of other Fgfs with FDR correction would better substantiate their choice; alternatively, moving this figure to later in the manuscript, as support for their mouse investigations, would be more convincing.

      To assess FGF5 (ENSG00000138675) expression in MB tissues, we used Geo2R (Barrett et al., 2013) to analyze published human MB subtype expression arrays from accession no. GSE85217 (Cavalli et al., 2017). GEO2R is an interactive web tool that compares expression levels of genes of interest (GOI) between sample groups in the GEO series using original submitter-supplied processed data tables. We entered the GOI Ensembl ID and organized data sets according to age and MB subgroup or MB<sup>SHH</sup> subtype classifications. GEO2R results presented gene expression levels as a table ordered by FDR-adjusted (Benjamini & Hochberg) p-values, with significance level cut-off at 0.05, processed by GEO2R’s built-in limma statistical test. Resulting data were subsequently exported into Prism (GraphPad). We generated scatter plots presenting FGF5 expression levels across all MB subgroups (Figure 1A) and MB<sup>SHH</sup> subtypes (Figure 1D). We performed additional statistical analyses to compare FGF5 expression levels between MB subgroups and MB<sup>SHH</sup> subtypes and graphed these data as violin plots (Figure 1B, 1C, and 1E). For these analyses, we used one-way ANOVA with Holm-Sidak’s multiple comparisons test, single pooled variance. P value ≤0.05 was considered statistically significant. Graphs display the mean ± standard error of the mean (SEM).
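
      As a rough illustration of the Benjamini & Hochberg adjustment mentioned above, the following minimal Python sketch computes FDR-adjusted p-values from a list of raw p-values. It is not the GEO2R or limma implementation; the function name `benjamini_hochberg` and the example p-values are hypothetical and serve only to show the arithmetic.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone (a smaller raw p never gets a larger adjusted p).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (illustrative only).
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
significant = [p <= 0.05 for p in adj_p]
```

      With a cut-off of 0.05 applied to the adjusted values, a gene is called significant only if it survives the correction across all genes tested, which is the concern the reviewer raised about reporting FGF5 alongside the other FGF ligands.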

      Author response image 1.

      Comparative expression of FGF ligands, FGF5, FGF10, FGF12, and FGF19, across all MB subgroups. FGF12 expression is not significantly different, while FGF5, FGF10, and FGF19 show distinct upregulation in the MB<sup>SHH</sup> subgroup (MB<sup>WNT</sup> n=70, MB<sup>SHH</sup> n=224, MB<sup>GR3</sup> n=143, MB<sup>GR4</sup> n=326).

      Expression of the 21 known FGF ligands was also analyzed. Many FGFs, such as FGF12 (Author response image 1), did not exhibit differential expression levels in MB<sup>SHH</sup> compared to other MB subgroups. FGF5, FGF10, and FGF19 (the human orthologue of mouse FGF15) all showed specific upregulation in MB<sup>SHH</sup> compared to other MB subgroups (Author response image 1), supporting our previous observations that FGF15 is a downstream target of SHH signaling (Yabut et al., 2020), as the reviewer pointed out. However, further stratification of MB<sup>SHH</sup> patient data revealed that only FGF5 specifically showed upregulation in infants with MB<sup>SHH</sup> (MB<sup>SHHβ</sup> and MB<sup>SHHγ</sup>; Author response image 2), indicating a more prominent role for FGF5 in the developing cerebellum and as a driver of MB<sup>SHH</sup> tumorigenesis in this dynamic environment.

      Author response image 2.

      Comparative expression of FGF5, FGF10, and FGF19 in different MB<sup>SHH</sup> subtypes. FGF5 specifically shows relative mRNA levels above 6 in 81% of MB<sup>SHH</sup> infant patient tumors (n=80 MB<sup>SHHβ</sup> and MB<sup>SHHγ</sup> tumors), unlike 35% of MB<sup>SHHα</sup> (n=65) or 0% of MB<sup>SHHδ</sup> (n=75) tumors.

      (2) The Sufu-cKO cerebellum lacks a clear anchor point at the secondary fissure and foliation is disrupted in the central and posterior lobes. It would be helpful for the authors to review Sudarov & Joyner (PMID: 18053187) for nomenclature specific to the developing cerebellum.

      The reviewers are correct that cerebellar foliation is severely disrupted in the central and posterior lobes, as per Sudarov and Joyner (Neural Development 2007). This nomenclature can be used to describe the regions referred to in this manuscript.

      (3) The metrics used to quantify cerebellar perimeter and immunostaining are not sufficiently described. It is unclear whether the individual points in the bar graphs (for example, in Figures 2B-D) represent a single section from independent mice, or multiple sections from the same mice. This also applies to Figure 3C-D.

      All quantifications were performed on 2-3 cerebellar sections (20 µm thick) from each of 3-6 independent mice per genotype. Individual points in the bar graphs represent the average cell number (quantified from 2-3 sections) for each mouse. Figure 2B shows data points from n=4 mice per genotype. Figure 2C shows data from n=3 mice per genotype. Figure 2D shows data from n=6 mice per genotype. Figure 3C-D shows data from n=3 mice per genotype.
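
      The averaging scheme described above (one data point per mouse, each the mean of that mouse's 2-3 sections) can be sketched as follows. This is an illustrative sketch only: the function name `per_mouse_means`, the mouse labels, and the counts are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def per_mouse_means(section_counts):
    """Collapse section-level cell counts to one averaged value per mouse."""
    return {mouse: mean(counts) for mouse, counts in section_counts.items()}


# Hypothetical cell counts from 2-3 sections per mouse (illustrative only).
example_counts = {
    "mouse_1": [10, 12],
    "mouse_2": [9, 11, 13],
    "mouse_3": [20, 22],
}

# One point per mouse, so n equals the number of mice, not of sections.
points = per_mouse_means(example_counts)
n_mice = len(points)
```

      Treating the mouse rather than the section as the unit of analysis keeps n equal to the number of independent animals, which is what the reviewer asked to have clarified.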

      (4) The data on Fgf5 RNA expression presented in Figure 2E are not sufficiently convincing. The perimeter and cytoarchitecture of the cerebellum are difficult to see and the higher magnification shown in 2F should be indicated in 2E.

      The lack of foliation in the Sufu-cKO cerebellum is clear, particularly when visualizing the perimeter via DAPI labeling (Figure 2E). The expression area of FGF5 is also visibly larger, given that all images in Figure 2E are presented at the same scale (scale bars = 500 µm).

      (5) The data presented in Figure 3 are not sufficiently convincing. The number of cells double positive for pErk and KI67 (Figure 3B) are difficult to see and appear to be few, suggesting the quantification may be unreliable.

      We used KI67+ expression to provide a molecular marker of regions to be quantified in both WT and Sufu-cKO sections. Quantification of labeled cells was performed on images obtained by confocal microscopy, enabling imaging of 1-2 µm optical slices, since Ki67 and pERK expression might not localize within the same cellular compartments. We relied on continuous DAPI nuclear staining to distinguish individual cells in each optical slice and to assess the colocalization of Ki67 and pERK. All quantifications were performed on 2-3 cerebellar sections (20 µm thick) from each of 3-6 independent mice per genotype. Individual points in the bar graphs represent the average cell number (quantified from 2-3 sections) for each mouse.

      (6) The data presented in Figure 4F-J would be more convincing with quantification. The Sufu;p53-dKO appears to have a thickened EGL across the entire vermis perimeter, and very little foliation, relative to control and single cKO cerebella. This is a more widespread effect than the more localized foliation disruption in the Sufu-cKO. 

      We agree with the reviewers that quantification of these phenotypes would provide a solid measure of the defects. The phenotypes of the Sufu;p53-dKO cerebellum are so profound that they require in-depth characterization, which will be the focus of future studies.

      (7) Figure 5 does not convincingly summarize the results. Blue and purple cells in sagittal cartoon are not defined. Which cells express Fgf5 (or other Fgfs) has not been determined. The yellow cells are not defined in relation to the initial cartoon on the left.

      The revised manuscript will address this confusion by clearly labeling the cells and their roles in the schematic diagram.

      Reviewer #2 (Public Review):

      Summary:

      Mutations in SUFU are implicated in SHH medulloblastoma (MB). SUFU modulates Shh signaling in a context-dependent manner, making its role in MB pathology complex and not fully understood. This study reports that elevated FGF5 levels are associated with a specific subtype of SHH MB, particularly in pediatric cases. The authors demonstrate that Sufu deletion in a mouse model leads to abnormal proliferation of granule cell precursors (GCPs) at the secondary fissure (region B), correlating with increased Fgf5 expression. Notably, pharmacological inhibition of FGFR restores normal cerebellar development in Sufu mutant mice.

      Strengths:

      The identification of increased FGF5 in subsets of MB is novel and a key strength of the paper.

      Weaknesses:

      The study appears incomplete despite the potential significance of these findings. The current paper does not fully establish the causal relationship between Fgf5 and abnormal cerebellar development, nor does it clarify its connection to SUFU-related MB. Some conclusions seem overstated, and the central question of whether FGFR inhibition can prevent tumor formation remains untested.

      Reviewer #3 (Public Review):

      Summary:

      The interaction between FGF signaling and SHH-mediated GNP expansion in MB, particularly in the context of Sufu LoF, has just begun to be understood. The manuscript by Yabut et al. establishes a connection between ectopic FGF5 expression and GNP over-expansion in a late-stage embryonic Sufu LoF model. The data provided link region-specific aberrant FGF5 signaling with the SHH subtype of medulloblastoma. New data from Yabut et al. suggest that ectopic FGF5 expression correlates with GNP expansion near the secondary fissure in Sufu LoF cerebella. Furthermore, pharmacological blockade of FGF signaling inhibits GNP proliferation. Interestingly, the data indicate that the timing of conditional Sufu deletion (E13.5 using the hGFAP-Cre line) results in different outcomes compared to later deletion (using Math1-cre line, Jiwani et al., 2020). This study provides significant insights into the molecular mechanisms driving GNP expansion in SHH subgroup MB, particularly in the context of Sufu LoF. It highlights the potential of targeting FGF5 signaling as a therapeutic strategy. Additionally, the research offers a model for better understanding MB subtypes and developing targeted treatments.

      Strengths:

      One notable strength of this study is the extraction and analysis of ectopic FGF5 expression from a subset of MB patient tumor samples. This translational aspect of the study enhances its relevance to human disease. By correlating findings from mouse models with patient data, the authors strengthen the validity of their conclusions and highlight the potential clinical implications of targeting FGF5 in MB therapy.

      The data convincingly show that FGFR signaling activation drives GNP proliferation in Sufu conditional knockout models. This finding is supported by robust experimental evidence, including pharmacological blockade of FGF signaling, which effectively inhibits GNP proliferation. The clear demonstration of a functional link between FGFR signaling and GNP expansion underscores the potential of FGFR as a therapeutic target in SHH subgroup medulloblastoma.

      Previous studies have demonstrated the inhibitory effect of FGF2 on tumor cell proliferation in certain MB types, such as the ptc mutant (Fogarty et al., 2006)(Yaguchi et al., 2009). Findings in this manuscript provide additional support suggesting multiple roles for FGF signaling in cerebellar patterning and development.

      Weaknesses:

      In the GEO dataset analysis, where FGF5 expression is extracted, the reporting of the P-value lacks detail on the statistical methods used, such as whether an ANOVA or t-test was employed. Providing comprehensive statistical methodologies is crucial for assessing the rigor and reproducibility of the results. The absence of this information raises concerns about the robustness of the statistical analysis.

      The revised manuscript will include the following detailed explanation of the statistical analyses of the GEO dataset:

      For the analysis of expression values of FGF5 (ENSG00000138675), we obtained these values using GEO2R (Barrett et al., 2013), which directly analyzes published human MB subtype expression arrays from accession no. GSE85217 (Cavalli et al., 2017). GEO2R is an interactive web tool that compares expression levels of genes of interest (GOI) between sample groups in the GEO series using original submitter-supplied processed data tables. We simply entered the GOI Ensembl ID and organized data sets according to age and MB subgroup or MB<sup>SHH</sup> subtype classifications. GEO2R results presented gene expression levels as a table ordered by FDR-adjusted (Benjamini & Hochberg) p-values, with significance level cut-off at 0.05, processed by GEO2R’s built-in limma statistical test. Resulting data were subsequently exported into Prism (GraphPad). We generated scatter plots presenting FGF5 expression levels across all MB subgroups (Figure 1A) and MB<sup>SHH</sup> subtypes (Figure 1D). We performed additional statistical analyses to compare FGF5 expression levels between MB subgroups and MB<sup>SHH</sup> subtypes and graphed these data as violin plots (Figure 1B, 1C, and 1E). For these analyses, we used one-way ANOVA with Holm-Sidak’s multiple comparisons test, single pooled variance. P value ≤0.05 was considered statistically significant. Graphs display the mean ± standard error of the mean (SEM). Sample sizes were:

      Author response table 1.

      Another concern is related to the controls used in the study. Cre recombinase induces double-strand DNA breaks within the loxP sites, and the control mice did not carry the Cre transgene (as stated in the Method section), while Sufu-cKO mice did. This discrepancy necessitates an additional control group to evaluate the effects of Cre-induced double-strand breaks on phosphorylated H2AX-DSB signaling. Including this control would strengthen the validity of the findings by ensuring that observed effects are not artifacts of Cre recombinase activity.

      The breeding scheme we used to generate homozygous SUFU conditional mutants will not generate pups carrying only hGFAP-Cre. Thus, we are unable to compare γH2AX expression in littermates that do not carry loxP sites. The reviewer is correct in pointing out the possibility of Cre recombinase activity inducing double-strand breaks on its own. However, it is unlikely that any hGFAP-Cre-induced double-strand breaks are sufficient to cause the phenotypes we observed in homozygous mutant (Sufu-cKO) mice, because the cerebella of mice carrying heterozygous SUFU mutations (hGFAP-Cre;Sufu-fl/+) do not display the profound cerebellar phenotypes observed in Sufu-cKO mice. We cannot rule out, however, undetected abnormalities that may require further analyses.

      Although the use of the hGFAP-Cre line allows genetic access to the late embryonic stage, this also targets multiple cell types, including both GNPs and cerebellar glial cells. However, the authors focus primarily on GNPs without fully addressing the potential contributions of neuron-glial interaction. This oversight could limit the understanding of the broader cellular context in which FGF signaling influences tumor development.

      The reviewer is correct in that hGFAP-Cre also targets other cell types, such as cerebellar glial cells, which are generated when Cre expression has begun. It is possible that cerebellar glial cell development is also compromised in Sufu-cKO mice and may disrupt neuron-glial interaction, due to or independently of FGF signaling. In-depth studies are required to interrogate how loss of SUFU specifically affects the development of cerebellar glial cells and influences their cellular interactions in the developing cerebellum.

      Recommendations for the authors:

      Editorial Comments:

      The reviewers suggest a number of steps to improve the manuscript that include additional experiments and deeper analyses and re-evaluation of existing data. Short of significant new experiments, there appear to be a number of straightforward analyses that can improve the study:

      (1) Reanalyses of statistical and quantitative approaches used (e.g., FDR correction, cerebellar deficits, GEO analyses).

      The revised manuscript will include detailed information on the statistical and quantitative approaches as addressed in our response to the reviewer’s comments.

      (2) More clear presentation of qualitative labeling approaches (immunohistochemistry and in situ hybridization).

      A detailed description of the labeling protocols used will be included in the Methods section of the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      AZD4547 treatment of the dKO mice would provide more convincing evidence that FGF-targeted treatments could curtail tumor growth in these mice or refute the suggestion that FGF-targeted treatment could prevent tumor growth.

      We agree that performing AZD4547 treatment on Sufu-dKO mice would strengthen these studies. However, we are unable to address this because these mice are now unavailable. We hope that future studies will address this point.

      Atoh1 is referred to as Math1 (older nomenclature) and should be corrected.

      The revised manuscript will include this change in nomenclature.

      Check verb tense throughout the manuscript.

      We will edit the manuscript further to check verb tenses prior to submission of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Specific Comments:

      (1) The identification of increased FGF5 in subsets of MB is novel and a key strength of the paper. However, the causal relationship between FGF5 and MB remains unestablished. Based on the current data, FGF5 can only be considered a biomarker for stratifying MB.

      We agree with the reviewer that our studies do not provide direct evidence that FGF5 causes MB. Future investigations determining whether FGF5 inhibition leads to phenotypic rescue will more firmly establish the relationship between FGF5 and MB. The reviewer is also correct that our studies reveal that FGF5 acts as a potential biomarker, as we mentioned in the Discussion section.

      (2) The upregulation of Fgf5 in Sufu-deficient cerebella is crucial to this study, yet the presented data are unconvincing to support this conclusion. In comparing Fgf5 expression between WT and Sufu mutants (Figures 2E, F and 4I), the cerebellar sections differ significantly, with mutant sections seemingly from a more lateral position. The authors should provide images of mutant sections from more comparable positions to accurately assess the effect of Sufu deficiency on Fgf5 expression. Additionally, the signals in Figure 2F resemble non-specific backgrounds rather than specific RNAscope signals.

      The WT and mutant sections analyzed were carefully selected from comparable levels. The abnormal foliation in Sufu-cKO makes the mutant sections look like they are from the lateral cerebellum.

      Figure 2F (enlarged regions) points to punctate RNAscope signals, which are characteristic of this labeling method (see RBFOX3 or GFAP labeling in DAPI-labeled cells in the mouse brain at https://acdbio.com/science/applications/research-areas/neuroscience). The higher number of punctate signals in some, but not all, DAPI-labeled cells in Figure 2F indicates that the FGF5 RNAscope signal is specific.

      (3) Jiwani et al. (2020) reported that Fgf8, also expressed in region B of the EGL, is upregulated in Sufu-deficient cerebella and is necessary and sufficient for Sufu mutant GCP proliferation. The current study does not distinguish whether the FGFR inhibitor AZD4547 blocks Fgf5 or Fgf8 function in restoring cerebellar histology in Sufu mutants.

      AZD4547 potently inhibits FGFR1, FGFR2, and FGFR3 autophosphorylation (Gavine et al., Cancer Research, 2012). FGF8 is reported to bind to these receptors (Ornitz and Itoh, 2015). Thus, the reviewer is correct that the studies will not distinguish between FGF5 and FGF8 activity. Further investigation of FGF8 expression and the effects of its inhibition in the Sufu-cKO neonatal cerebellum will determine whether tumorigenic processes are driven by FGF5 or FGF8. Nevertheless, we postulate that FGF5 exerts a greater effect in activating FGF signaling in the developing cerebellum, given that it is highly expressed along the external granule layers of the developing cerebellum (Author response image 3).

      Author response image 3.

      Expression of FGF5 and FGF8 in the P4 mouse cerebellum (Allen Brain Atlas, https://developingmouse.brain-map.org).

      (4) The authors should show whether AZD4547 treatment restores normal Fgf5 expression. Importantly, they need to test whether AZD4547 rescues the proliferation defect observed in Sufu;p53 double mutants.

      We agree that performing AZD4547 treatment on Sufu-dKO mice would strengthen these studies. However, we are unable to address this because these mice are now unavailable. We hope that future studies will address this point.

      (5) Jiwani et al. (2020) showed that deleting Sufu with Atoh1-Cre promotes Gli3R and suppresses Gli2 levels, leading to increased cell proliferation and delayed cell cycle exit in the central lobe. The findings of the current study (Supplementary Figure 1) seem to differ from this previous report, yet both studies conclude that Sufu-KO disrupts differentiation. The authors should provide an explanation for this discrepancy.

      Our results align completely with the findings by Jiwani et al. (2020). Both studies showed reduced levels of Gli3R, showing nearly 50% reduction, when Sufu is deleted (see Figure 4A-4D in Jiwani et al., 2020).

      (6) The hGFAP-Cre mouse line is used to delete Sufu from the cerebellum, but it is not commonly used for GCP-specific deletion. The authors need to provide a reference or more details on the temporal and spatial activity of the Cre line, as the cited paper describes its generation but offers little information on its activity in the developing cerebellum.

      We appreciate the reviewer’s reminder to include the reference for the Schuller et al. 2008 paper. This study characterized the hGFAP-Cre temporal and spatial expression in the developing cerebellum, including granule cell precursors. We will include this reference in the revised manuscript.

      (7) Based on the provided data, it is difficult to determine which cell types express Fgf5. Given that hGFAP-cre may delete Sufu in other cerebellar cell types, the authors should demonstrate that Fgf5 is expressed in granule cells or granule cell precursors.

      Future studies will focus on further characterization of the role of FGF5 in cerebellar development, including the identity of the cells expressing FGF5. The reviewer is correct that hGFAP-Cre also targets other cell types and that Sufu deletion in these cells could have induced ectopic FGF5 expression.

      (8) The provided data show an increase in pERK+ cells in GCPs at the secondary fissure. This increase may simply reflect an accumulation of GCPs. It is unconvincing that there is an increase in pERK due to the loss of Sufu.

      The reviewer is correct that the increase in GCPs will also increase the number of pERK+ cells. To control for this, our quantification reflects the number of cells per unit area within regions containing Ki67+ cells. With these parameters, we found that there is an increased density of pERK+ cells in a given Ki67+ region. All quantifications were performed on 2-3 cerebellar sections (20 µm thick) from each of 3-6 independent mice per genotype. Individual points in the bar graphs represent the average cell number (quantified from 2-3 sections) for each mouse.
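
      To make the normalization argument concrete, the following minimal sketch shows why reporting cells per unit area separates a true increase in pERK+ density from a simple expansion of the Ki67+ region. The function name `cell_density`, the counts, and the areas are hypothetical and are not data from the manuscript.

```python
def cell_density(positive_cells, region_area):
    """Return the number of labeled cells per unit area of the measured region."""
    if region_area <= 0:
        raise ValueError("region_area must be positive")
    return positive_cells / region_area


# Hypothetical values: counts of pERK+ cells and Ki67+ region areas (arbitrary units).
control = cell_density(positive_cells=20, region_area=10_000.0)

# A region twice as large with twice as many cells: same density,
# i.e. the extra cells only reflect accumulation of precursors.
mutant_same_density = cell_density(positive_cells=40, region_area=20_000.0)

# A region twice as large with three times as many cells: higher density,
# i.e. an increase beyond what region expansion alone explains.
mutant_higher_density = cell_density(positive_cells=60, region_area=20_000.0)
```

      Under this normalization, only the second mutant scenario would register as an increase relative to control.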

      (9) No data are provided on MB formation in Sufu-cKO; p53- mutants, and it is unknown whether FGFR inhibitors block tumor formation.

      We agree that performing AZD4547 treatment on Sufu-dKO mice would strengthen these studies. However, we are unable to address this because these mice are now unavailable. We hope that future studies will address this point.

      (10) The authors frequently mention "preneoplastic lesions" of GCPs in Sufu mutant mice. What evidence supports this claim?

      Preneoplastic lesions are defined as cells carrying genetic and phenotypic alterations that show higher risk of malignancy (such as MB) but lack the capacity to grow autonomously in the absence of a secondary factor (Feo et al., 2011). In Sufu-cKO mice, we see abnormally proliferating and behaving granule precursor cells that do not grow autonomously, in the absence of a p53 LOF. The combined deletion of Sufu and p53 transforms these cells to become neoplastic.

      (11) Fgf5 is normally expressed in region B. What is its potential function? Does AZD4547 affect normal development? 

      Future studies will focus on further characterization of the role of FGF5 in cerebellar development, including the identity of the cells expressing FGF5. Regarding AZD4547, we did not observe any obvious difference between AZD4547-treated and vehicle-treated cerebella. This indicates that AZD4547 inhibition of FGFRs under physiologic conditions does not significantly disrupt normal cerebellar development.

      (12) Figure 3G: It is unclear which specimens were treated with AZD4547. The authors mention treatment in line 281 but contradict themselves in the figure legend.

      We thank the reviewer for pointing out this typo. Cerebellar tissues shown in Figure 3G were all treated with AZD4547. The figure legend will be corrected in the revised manuscript.

      (13) Figure 4J: The higher magnification images of the pERK/Ki67 staining appear identical in the control and Sufu;p53-dKO. The authors need to correct the mistake.

      We thank the reviewer for pointing this out. We will correct this figure in the revised manuscript.

      Minor Comments:

      (1) Whenever possible, images comparing WT and mutants should be presented at the same scale within a figure. For example, readers might easily conclude that mutant brains are smaller than controls in Figure 4G.

      Unfortunately, because the cerebella of Sufu;p53-dKO mice are significantly bigger, we are unable to show the whole cerebellum at the same scale in Figure 4G. We wanted to emphasize the significant and abnormal cerebellar growth in this figure.

      (2) The figure legend for Supplementary Figure 2 is missing.

      Thank you for pointing this out. We will add a legend for this supplementary figure in the revised manuscript.

      (3) The authors state that the expansion of Pax6+ GNPs in the newborn Sufu-cKO cerebellum (Figure 2) occurs in similar anatomical subregions where infantile MB tumors typically arise (Tan et al., 2018). The cited paper describes more abundant SHH MB in the cerebellar hemisphere. The authors need to elaborate on their statement to clarify this point.

      The reviewer is correct in that Tan et al., 2018 observed tumors arising from the cerebellar hemisphere. More specifically, these tumors arise in the posterior/ventral regions of the cerebellar hemispheres (Figure 2 in Tan et al., 2018). Similarly, Sufu-cKO mice have more severe defects in the posterior/ventral regions of the cerebellar hemisphere (Figures 2A and 3F) and therefore corroborate the findings by Tan et al., that abnormal SHH signaling in these regions results in increased sensitivity to MB formation.

      Reviewer #3 (Recommendations For The Authors):

      Figure 1 [Upregulated FGF5 expression in MB<sup>SHH</sup> tumors]

      - Statistical analysis from the Geo expression dataset does not provide enough detail. At least, the authors should mention whether they have made any adjustments from the default settings and how they extracted/plotted the FGF5 expression (Figure 1BCE).

      For the analysis of expression values of FGF5 (ENSG00000138675), we obtained these values using GEO2R (Barrett et al., 2013), which directly analyzes published human MB subtype expression arrays from accession no. GSE85217 (Cavalli et al., 2017). GEO2R is an interactive web tool that compares expression levels of genes of interest (GOI) between sample groups in the GEO series using original submitter-supplied processed data tables. We simply entered the GOI Ensembl ID and organized data sets according to age and MB subgroup or MB<sup>SHH</sup> subtype classifications. GEO2R results presented gene expression levels as a table ordered by FDR-adjusted (Benjamini & Hochberg) p-values, with significance level cut-off at 0.05, processed by GEO2R’s built-in limma statistical test. Resulting data were subsequently exported into Prism (GraphPad). We generated scatter plots presenting FGF5 expression levels across all MB subgroups (Figure 1A) and MB<sup>SHH</sup> subtypes (Figure 1D). We performed additional statistical analyses to compare FGF5 expression levels between MB subgroups and MB<sup>SHH</sup> subtypes and graphed these data as violin plots (Figure 1B, 1C, and 1E). For these analyses, we used one-way ANOVA with Holm-Sidak’s multiple comparisons test, single pooled variance. P value ≤0.05 was considered statistically significant. Graphs display the mean ± standard error of the mean (SEM). See Author response table 1 for sample sizes.
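
      For readers unfamiliar with the Holm-Sidak correction named above, the step-down adjustment of a family of pairwise p-values can be sketched as follows. This is a minimal illustration of the general procedure, not Prism's implementation; the function name `holm_sidak` and the example p-values are hypothetical.

```python
def holm_sidak(pvalues):
    """Return Holm-Sidak step-down adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Number of hypotheses not yet processed, including this one.
        remaining = m - rank
        candidate = 1.0 - (1.0 - pvalues[idx]) ** remaining
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons (illustrative only).
raw_p = [0.01, 0.04, 0.03]
adj_p = holm_sidak(raw_p)
significant = [p <= 0.05 for p in adj_p]
```

      In this made-up example only the smallest p-value remains below 0.05 after adjustment, which shows how the correction can change which comparisons are reported as significant.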

      Figure 3 [Ectopic activation of FGF signaling in the EGL of P0 Sufu-cKO cerebellum]

      - The Gli1-lz mice reference is wrong. The correct reference is Bai CB, et al. 2002.

      - Generation of Sufu-cKO;Gli1-LacZ triple transgenic mice not described 

      - Veh vs. treated not labelled (Figure 3F)

      We will address these minor text changes in the revised manuscript. A more detailed description of the generation of Sufu-cKO;Gli1-LacZ triple transgenic will also be included in the Methods section.

      Figure 5 [Proposed model]

      - In the text, Figure 5 is mistaken for Figure 8. 

      We will address these minor text changes in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Sun et al. are interested in how experience can shape the brain and specifically investigate the plasticity of the Toll-6 receptor-expressing dopaminergic neurons (DANs). To learn more about the role of Toll-6 in the DANs, the authors examine the expression of the Toll-6 receptor ligand, DNT-2. They show that DNT-2 expressing cells connect with DANs and that loss of function of DNT-2 in these cells reduces the number of PAM DANs, while overexpression causes alterations in dendrite complexity. Finally, the authors show that alterations in the levels of DNT-2 and Toll-6 can impact DAN-driven behaviors such as climbing, arena locomotion, and learning and long-term memory.

      Strengths:

      The authors methodically test which neurotransmitters are expressed by the 4 prominent DNT-2 expressing neurons and show that they are glutamatergic. They also use Trans-Tango and Bac-TRACE to examine the connectivity of the DNT-2 neurons to the dopaminergic circuit and show that DNT-2 neurons receive dopaminergic inputs and output to a variety of neurons including MB Kenyon cells, DAL neurons, and possibly DANs.

      We are very pleased that Reviewer 1 found our connectivity analysis a strength.

      Weaknesses:

      (1) To identify the DNT-2 neurons, the authors use CRISPR to generate a new DNT-2-GAL4. They note that they identified at least 12 DNT-2 plus neurons. In Supplementary Figure 1A, the DNT-2-GAL4 driver was used to express a UAS-histoneYFP nuclear marker. From these figures, it looks like DNT-2-GAL4 is labeling more than 12 neurons. Is there glial expression?

      Indeed, we claimed that DNT-2 is expressed in at least 12 neurons (see line 141, page 6 of original manuscript), which means more than 12 could be found. The membrane-tethered reporters we used (UAS-FlyBow1.1, UAS-mCD8-RFP, UAS-MCFO, as well as UAS-DenMark:UASsyd-1GFP) gave a consistent and reproducible pattern. However, with DNT-2GAL4>UAS-Histone-YFP more nuclei were detected that were not revealed by the other reporters. We have also found with other GAL4 lines that the patterns produced by different reporters can vary. This could be due to the signal strength (e.g. His-YFP is very strong) and perdurance of the reporter (e.g. the turnover of His-YFP may be slower than that of the other fusion proteins).

      We did not test for glial expression, as it was not directly related to the question addressed in this work.

      (2) In Figure 2C the authors show that DNT-2 upregulation leads to an increase in TH levels using q-RT-PCR from whole heads. However, in Figure 3H they also show that DNT-2 overexpression causes an increase in the number of TH neurons. It is unclear whether TH RNA increases due to expression/cell or the number of TH neurons in the head.

      Figure 3H shows that over-expression of DNT-2FL increased the number of Dcp1+ apoptotic cells in the brain, but not significantly (p=0.0939). The ability of full-length neurotrophins to induce apoptosis, and of cleaved neurotrophins to promote cell survival, is well documented in mammals. We had previously shown that DNT-2 is naturally cleaved, and that over-expression of DNT-2 does not induce apoptosis in the various contexts tested before (McIlroy et al 2013 Nature Neuroscience; Foldi et al 2017 J Cell Biol; Ulian-Benitez et al 2017 PLoS Genetics). Similarly, throughout this work we did not find DNT-2FL to induce apoptosis.

      Instead, in Figure 3G we show that over-expression of DNT-2FL causes a statistically significant increase in the number of TH+ cells. This is an important finding that supports the plastic regulation of PAM cell number. We thank the Reviewer for highlighting this point, as we had forgotten to add the significance star in the graph. In this context, we cannot rule out the possibility that the increase in TH mRNA observed when we over-express DNT-2FL could be due to an increase in cell number rather than an increase in expression per cell. Unfortunately, it is not possible for us to separate these two processes at this time. Either way, the result would still be the same: an increase in dopamine production when DNT-2 levels rise.

      We have now edited the abstract (lines 38-39), adding that “By contrast, over-expressed DNT-2 increased DAN cell number,…”, as well as the main text in Results (page 10, lines 259-265) and in the Discussion section (page 15, lines 391 and 393-396).

      (3) DNT-2 is also known as Spz5 and has been shown to activate Toll-6 receptors in glia (McLaughlin et al., 2019), resulting in the phagocytosis of apoptotic neurons. In addition, the knockdown of DNT-2/Spz5 throughout development causes an increase in apoptotic debris in the brain, which can lead to neurodegeneration. Indeed Figure 3H shows that an adult specific knockdown of DNT-2 using DNT2-GAL4 causes an increase in Dcp1 signal in many neurons and not just TH neurons.

      Indeed, we did find Dcp1+ TH-negative cells too (although not widely throughout the brain). This is not shown in the images of Figure 3H, where we showed only TH+ Dcp1+ cells.

      That is not surprising, as DNT-2 neurons have large arborisations that can reach a wide range of targets; DNT-2 is secreted, and could reach beyond its immediate targets; Toll-6 is expressed in a vast number of cells in the brain; and DNT-2 can also bind promiscuously to at least Toll-7 and other Keks, which are also expressed in the adult brain (Foldi et al 2017 J Cell Biology; Ulian-Benitez et al 2017 PLoS Genetics; Li et al 2020 eLife). Together with the findings by McLaughlin et al 2019, our findings further support the notion that DNT-2 is a neuroprotective factor in the adult brain. It will be interesting to find out what other neuron types DNT-2 maintains.

      We have made some edits on these points in page 10 lines 259-265.

      We would like to thank Reviewer 1 for their positive comments on our work and their interesting and valuable feedback.

      Reviewer #2 (Public review):

      This paper examines how structural plasticity in neural circuits, particularly in dopaminergic systems, is regulated by Drosophila neurotrophin-2 (DNT-2) and its receptors, Toll-6 and Kek-6. The authors show that these molecules are critical for modulating circuit structure and dopaminergic neuron survival, synaptogenesis, and connectivity. They show that loss of DNT-2 or Toll-6 function leads to loss of dopaminergic neurons, dendritic arborization, and synaptic impairment, whereas overexpression of DNT-2 increases dendritic complexity and synaptogenesis. In addition, DNT-2 and Toll-6 modulate dopamine-dependent behaviors, including locomotion and long-term memory, suggesting a link between DNT-2 signaling, structural plasticity, and behavior.

      A major strength of this study is the impressive cellular resolution achieved. By focusing on specific dopaminergic neurons, such as the PAM and PPL1 clusters, and using a range of molecular markers, the authors were able to clearly visualize intricate details of synapse formation, dendritic complexity, and axonal targeting within defined circuits. Given the critical role of dopaminergic pathways in learning and memory, this approach provides a good opportunity to explore the role of DNT-2, Toll-6, and Kek-6 in experience-dependent structural plasticity. However, despite the promise in the abstract and introduction of the paper, the study falls short of establishing a direct causal link between neurotrophin signaling and experience-induced plasticity.

      Simply put, this study does not provide strong evidence that experience-induced structural plasticity requires DNT-2 signaling. To support this idea, it would be necessary to observe experience-induced structural changes and demonstrate that downregulation of DNT-2 signaling prevents these changes. The closest attempt to address this in this study was the artificial activation of DNT-2 neurons using TrpA1, which resulted in overgrowth of axonal arbors and an increase in synaptic sites in both DNT-2 and PAM neurons. However, this activation method is quite artificial, and the authors did not test whether the observed structural changes were dependent on DNT-2 signaling. Although they also showed that overexpression of DNT-2FL in DNT-2 neurons promotes synaptogenesis, this phenotype was not fully consistent with the TrpA1 activation results (Figures 5C and D).

      In conclusion, this study demonstrates that DNT-2 and its receptors play a role in regulating the structure of dopaminergic circuits in the adult fly brain. However, it does not provide convincing evidence for a causal link between DNT-2 signaling and experience-dependent structural plasticity within these circuits.

      We would like to thank Reviewer 2 for their very positive assessment of our approach to investigate structural circuit plasticity. We are delighted that this Reviewer found our cellular resolution impressive. We are also very pleased that Reviewer 2 found that our work demonstrates that DNT-2 and its receptors regulate the structure of dopaminergic circuits in the adult fly brain. This is already a very important finding that contributes to demonstrating that, rather than being hardwired, the adult fly brain is plastic, like the mammalian brain. Furthermore, it is remarkable that this involves a neurotrophin functioning via Toll and kinase-less Trks, opening an opportunity to explore whether such a mechanism could also operate in the human brain.

      We are very pleased that this Reviewer acknowledges that this work provides a good opportunity to explore the role of DNT-2, Toll-6, and Kek-6 in experience-dependent structural plasticity. We provide a molecular mechanism and proof of principle, and we demonstrate a direct link between the function of DNT-2 and its receptors in circuit plasticity. We also showed a link of DNT-2 to neuronal activity, as neuronal activity increased the production of DNT-2GFP, induced the cleavage of DNT-2 and a feedback loop between DNT-2 and dopamine, and both neuronal activity and increased DNT-2 levels promoted synaptogenesis.

      As the Reviewer acknowledges, this approach provides a good opportunity to explore the role of DNT-2, Toll-6, and Kek-6 in experience-dependent structural plasticity. Establishing the direct link in response to lived experience is a big task, beyond the scope of this manuscript, and we will be testing this in future projects. Nevertheless, it is important to place our findings within this context together with the link to mammalian neurotrophins (as explained in the discussion), as it is here that the findings have deep and impactful implications.

      To accommodate the criticism of this Reviewer, we have now toned down our narrative. This does not diminish the importance of the findings; it makes the argument more stringent. Please see edits in: Abstract page 2 lines 42-44; and Discussion page 22 line 586, which were the only points where a direct claim had been made.

      We would like to thank Reviewer 2 for the positive and thoughtful evaluation of our work, and for their feedback.

      Reviewer #3 (Public review):

      Summary:

      The authors used the model organism Drosophila melanogaster to show that the neurotrophin Toll-6 and its ligands, DNT-2 and kek-6, play a role in maintaining the number of dopaminergic neurons and modulating their synaptic connectivity. This supports previous findings on the structural plasticity of dopaminergic neurons and suggests a molecular mechanism underlying this plasticity.

      Strengths:

      The experiments are overall very well designed and conclusive. Methods are in general state-of-the-art, the sample sizes are sufficient, the statistical analyses are sound, and all necessary controls are in place. The data interpretation is straightforward, and the relevant literature is taken into consideration. Overall, the manuscript is solid and presents novel, interesting, and important findings.

      We are delighted that Reviewer 3 found our work solid, novel, interesting and with important findings. We are also very pleased that this Reviewer found that all necessary controls have been carried out.

      Weaknesses:

      There are three technical weaknesses that could perhaps be improved.

      First, the model of reciprocal, inhibitory feedback loops (Figure 2F) is speculative. On the one hand, glutamate can act in flies as an excitatory or inhibitory transmitter (line 157), and either situation can be the case here. On the other hand, it is not clear how an increase or decrease in cAMP level translates into transmitter release. One can only conclude that two types of neurons potentially influence each other.

      Thank you for pointing out that glutamate can be inhibitory. In response, we have removed the word ‘excitatory’ from the only point it had been used in the text: page 7 line 167.

      In mammals, the neurotrophin BDNF has an important function in glutamatergic synapses, thus we were intrigued by a potential evolutionary conservation. Our evidence that DNT-2A neurons could be excitatory is indirect, yet supportive: exciting DNT-2 neurons with optogenetics resulted in an increase in GCaMP in PAMs (data not shown); over-expression of DNT-2 in DNT-2 neurons increased TH mRNA levels; optogenetic activation of DNT-2 neurons results in the Dop2R-dependent downregulation of cAMP levels in DNT-2 neurons. Dop2R signals in response to dopamine, which would be released only if dopaminergic neurons had been excited. Accordingly, glutamate released from DNT-2 neurons would have been rather unlikely to inhibit DANs.

      cAMP is a second messenger that enables the activation of PKA. PKA phosphorylates many target proteins, amongst which are various channels. This includes the voltage gated calcium channels located at the synapse, whose phosphorylation increases their opening probability. Other targets regulate synaptic vesicle release. Thus, a rise in cAMP could facilitate neurotransmitter release, and a downregulation would have the opposite effect. Other targets of PKA include CREB, leading to changes in gene expression. Conceivably, a decrease in PKA activity could result in the downregulation of DNT-2 expression in DNT-2 neurons. This negative feedback loop would restore the homeostatic relationship between DNT-2 and dopamine levels.

      We agree with this Reviewer that whereas our qRT-PCR data show that over-expression of DNT-2 increases TH mRNA levels, this does not demonstrate that the increase originates from PAM neurons. Similarly, although our EPAC data imply that dopamine must be released from DANs and received by DNT-2 neurons to explain those data, the evidence did not include direct visualisation of dopamine release in response to DNT-2 neuron activation. To accommodate these criticisms, we have edited the summary in Figure 2E, adding question marks to indicate inference points, and the text on page 9 line 221.

      Our data indeed demonstrate that DNT-2 and PAM neurons do influence each other, not merely potentially. We have provided data showing that: DNT-2 and PAM neurons are connected through circuitry; the DNT-2 receptors Toll-6 and Kek-6 are expressed in DANs, including in PAMs; alterations in the levels of DNT-2 (both loss and gain of function) and loss of function of the DNT-2 receptors Toll-6 and Kek-6 alter PAM cell number, PAM dendritic complexity and synaptogenesis in PAMs; and alterations in the levels of DNT-2, Toll-6 and Kek-6 in adult flies alter the dopamine-dependent behaviours of climbing, locomotion in an arena, and learning and long-term memory. These data firmly demonstrate that the two neuron types, DNT-2 and PAM neurons, influence each other.

      We have also shown that over-expression of DNT-2 in DNT-2 neurons increases TH mRNA levels, whereas activation of DNT-2 neurons decreases cAMP levels in DNT-2 neurons in a dopamine/Dop2R-dependent manner. These data show a functional interaction between DNT-2 and PAM neurons.

      Second, the quantification of bouton volumes (no y-axis label in Figure 5 C and D!) and dendrite complexity are not convincingly laid out. Here, the reader expects fine-grained anatomical characterizations of the structures under investigation, and a method to precisely quantify the lengths and branching patterns of individual dendritic arborizations as well as the volume of individual axonal boutons.

      Figures 5C and D do contain Y-axis labels; all our graphs in the main manuscript and in the supplementary files contain Y-axis labels.

      In fact, we did use a method to precisely quantify the lengths and branching patterns of individual dendritic arborisations, the volume of individual boutons and the number of boutons. These analyses were carried out using Imaris software. For dendritic branching patterns, the “Filament Autodetect” function was used. Here, dendrites were analysed by semi-automatically tracing each dendrite branch (i.e. with manual correction of segmentation errors) to reconstruct the segmented dendrite in volume. From this segmented dendrite, Imaris provides measurements of total dendrite volume, number and length of dendrite branches, terminal points, etc. For bouton size and number, we used the Imaris “Spot” function. Here, a threshold is set to exclude small dots (e.g. of background) that do not correspond to synapses/boutons. All samples and genotypes are treated with the same threshold, thus the analysis is objective and large sample sizes can be analysed effectively. We had already provided a description of the use of Imaris in the methods section.

      We have now expanded the protocol on how we use Imaris to analyse dendrites and synapses, in: Materials and Methods section, page 28 lines 756-768 and page 29 lines 778-799.
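      As an illustration only, and not a reproduction of the Imaris implementation, the principle of applying one fixed size threshold to every sample regardless of genotype can be sketched as below. The function name `count_spots`, the spot volumes and the threshold value are all hypothetical and are not data or parameters from the study.

```python
def count_spots(spot_volumes, min_volume):
    """Return the spots whose volume is at or above a fixed threshold."""
    return [v for v in spot_volumes if v >= min_volume]


# Hypothetical spot volumes (arbitrary units); NOT data from the study.
samples = {
    "control": [0.02, 0.30, 0.45, 0.01, 0.38],
    "experimental": [0.03, 0.52, 0.41, 0.60, 0.02, 0.47],
}

# Assumed threshold: chosen once, then applied unchanged to every sample,
# so small background dots are excluded identically in all genotypes.
MIN_VOLUME = 0.10

counts = {
    name: len(count_spots(volumes, MIN_VOLUME))
    for name, volumes in samples.items()
}
```

      Because the threshold is set before the genotypes are compared and is never adjusted per sample, any difference in the resulting counts cannot arise from the experimenter's selection of spots.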

      Third, Figure 1C shows two neurons with the goal of demonstrating between-neuron variability. It is not convincingly demonstrated that the two neurons are actually of the very same type of neuron in different flies or two completely different neurons.

      We thank Reviewer 3 for raising this interesting point. It is not possible to prove which of the four DNT-2A neurons per hemibrain, which we visualised with DNT-2>MCFO, were the same neurons in every individual brain we looked at. This is because in every brain we have looked at, the somas of the neurons were not located in exactly the same position. Furthermore, the arborisation patterns are also different and unique for each individual brain. Thus, there is natural variability in the position of the somas and in the arborisation patterns. Such variability presumably results from the combination of developmental and activity-dependent plasticity. Importantly, for every staining we carried out using DNT-2GAL4 and various membrane reporters and MCFO clones, we never found two identical DNT-2 neuron profiles.

      To increase the evidence in support of this point, we have now expanded Figure 1, adding one more image of DNT-2>FlyBow (Figure 1A) and two more images of DNT-2>MCFO (Figure 1D). In total, seven images in Figure 1 and two further images in Figure 5A demonstrate the variability of DNT-2 neurons.

      We would like to thank Reviewer 3 for the very positive evaluation of our work and the interesting and valuable feedback.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      In the fly list, several fly lines are missing references and sources. 

      Apologies for this oversight; this has now been corrected.

      We thank Reviewer 1 for their effort and time to scrutinise our work, and for their very positive and helpful feedback.

      Reviewer #2 (Recommendations for the authors):

      (1) Here I provide some more specific comments that I hope will help the authors further improve the study.

      (2) L148: "single neuron clones revealed variability in the DNT-2A". How do the authors know that they are labeling the same subtype of DNT-2A neurons? 

      There are four anterior DNT-2A cells per hemibrain, which project from the SOG area to the SMP. It is not possible to verify that we are looking at exactly the same neuron every time, because the exact position of the somas and the arborisation patterns vary from brain to brain. We know this from two sources of data: (1) when using DNT-2GAL4 to visualise the expression of membrane reporters (e.g. UAS-FlyBow, UAS-mCD8-GFP, UAS-CD8-RFP), no brain ever showed a pattern identical to that of another brain, neither in the exact position of the somas nor in the exact arborisation patterns. (2) When we generated DNT-2>MCFO clones to visualise 1-2 cells at a time, no single-neuron or 2-neuron clones ever showed an identical pattern. The most parsimonious interpretation is that the exact location of the somas and the exact arborisation patterns vary across individual flies. Developmental variability in neuronal patterns has also been reported by Linneweber et al (2020) Science.

      To make our evidence more compelling, and in response to this Reviewer’s query, we have now added further images. Please find in revised Figure 1 A,B three examples of three different brains expressing DNT-2>FlyBow1.1. In Figure 1D, two more examples (altogether 4) of DNT-2>MCFO clones. Here it is clear to see that no neuron shape is identical to that of others, demonstrating variability in individual fly brains. We now show four images in Figure 1 and two more in Figure 5A that demonstrate the variability of DNT-2A neurons.

      (3) Figure 1E: Are all DNT-2A neurons positive for vGlut and Dop2R? This figure shows only two DNT-2A neurons. 

      Yes, all four DNT-2A neurons per hemibrain are vGlut positive and we have now added more images to Supplementary Figure S1A (right), also showing that presynaptic DNT-2A endings at SMP also coincide with a vGlut+ domain (Figure S1A left).

      Yes, all four DNT-2A neurons per hemibrain are Dop2R positive and we have now added more images to Supplementary Figure S1B.

      (4) L156: Glutamate is generally considered to be inhibitory in the adult fly brain. More evidence is needed before the authors can claim that "DNT-2A neurons are excitatory glutamatergic neurons". 

      Thank you for pointing this out. Although our data do not conclusively demonstrate it, they are consistent with DNT-2A neurons being excitatory. BDNF is most commonly released from glutamatergic neurons in mammals, its release is activity-dependent and leads to formation and stabilisation of synapses. The phenotypes we have observed are consistent with this and reveal functional evolutionary conservation: (1) exciting DNT-2 neurons with TrpA1 results in increased production and cleavage of DNT-2GFP and de novo synaptogenesis; (2) over-expression of DNT-2 in the adult induces de novo synaptogenesis; (3) down-regulation or loss of DNT-2 and its receptors Toll-6 and Kek-6 impair synaptogenesis. Furthermore, we show that DNT-2 dependent synaptogenesis is between DNT-2 and dopaminergic neurons, which are involved in the control of locomotion, reward learning and long-term memory, and dopamine itself is required for such behaviour. Consistent with this, we found that: (1) over-expression of DNT-2 increases TH mRNA levels, which would lead to the up-regulation of dopamine production; (2) exciting DNT-2 neurons increases locomotion speed in an arena; (3) knock-down of DNT-2 and its receptors decreases locomotion, whereas over-expression of DNT-2 increases locomotion; (4) over-expression of DNT-2 increases learning and long-term memory. Finally, in a previous version in bioRxiv, we also showed using optogenetics and calcium imaging that exciting DNT-2 neurons induced GCaMP signalling in their output PAM neurons, and in this version we show that exciting DNT-2 neurons regulates cAMP in DNT-2 neurons via dopamine-release dependent feedback. Altogether, the most parsimonious interpretation of these data is that vGlut+ DNT-2 neurons are excitatory.

      In any case, to address this reviewer’s point, we have now removed the word ‘excitatory’ from page 7 line 167.

      (5) Figure 1H, I: A more detailed description of the Toll-6 and Kek-6 expressing neurons will be helpful. Are they expressed in specific types of PAM and PPL1 DANs? The legend in Figure S2 mentions labeling in γ2α′1 zones, but it seems to be more than that.

      This information had already been provided; presumably this Reviewer overlooked it. It was described in great detail by comparing our microscopy data with the single cell RNA-seq data available through Fly Cell Atlas (https://flycellatlas.org) and Scope (https://scope.aertslab.org/#/b77838f4-af3c-4c37-8dd9-cf7a41e4b034/*/welcome).

      Please see our previously submitted Table S1 “Expression of Tolls, keks and Toll downstream adaptors in cells related to DNT-2A neurons”.

      (6) Figure S3 should be controls for Figure 2A. It is incorrectly labeled as controls for Figure 3A. 

      Thank you for pointing out this typo; this has now been corrected.

      (7) L197: The authors state, "This showed that DNT-2 could stimulate dopamine production in neighboring DANs". However, the results do not fully support this conclusion because the experiments measure overall TH levels in the brain, not specifically in neighboring DANs. The observed effect could be indirect via other neurons. 

      Indeed, we have now edited the text to: “This showed that DNT-2 could stimulate dopamine production”: page 8 line 208.

      (8) Figure 3: If Toll-6 is expressed in specific subtypes of PAM DANs, are they the dying cells when Toll-6 was knocked down? I think the paper will be significantly improved if the authors provide a more in-depth analysis of the phenotype. Also, permissive temperature controls are missing for the experiments in (E)-(H). Permissive controls are essential to confirm that the observed effects are due to adult-specific RNAi knockdown.

      Current tools do not enable us to visualise Toll-6+ neurons at the same time as manipulating DNT-2 neurons and at the same time as monitoring Dcp1. Stainings with Dcp1 in the adult brain are not trivial. Thus, we cannot guarantee this. However, Toll-6 is the preferential receptor for DNT-2, and given that apoptosis increases when we knock-down DNT-2, the most parsimonious interpretation is that the dying cells bear the DNT-2 receptor Toll-6. Even if DNT-2 can promiscuously bind other Toll receptors, the simplest way to interpret these data remains that DNT-2 promotes cell survival by signalling via its receptors, as no other possible route is known to date. This would be consistent with all other data in this figure.

      We thank this Reviewer for the feedback on the controls. Unfortunately, these are not trivial experiments; they require considerable time, effort, dedication and skill. This manuscript has already taken 5 years of daily hard work. We no longer have the staff (i.e. the first author left the lab) nor the resources to address this point.

      (9) Figure 4B: This phenotype in DNT-2 mutants is very striking. Did the neurons still survive and did their axonal innervation in the lobes remain intact?

      Homozygous DNT-2 mutants are viable and have impaired climbing, as we had already shown in Figure 7C.

      (10) L261: The authors mention that "PAM-β2β′2 neurons express Toll-6 (Table S1)". However, I cannot find this information in Table S1. 

      Unfortunately, I cannot identify the source of that statement at present and the first author has left the lab. In any case, although the fact that knocking down Toll-6 in these neurons causes a phenotype implies that they must express Toll-6, it does not directly prove it. We have now corrected this to: “PAM-β2β′2 neuron dendrites overlap axonal DNT2 projections”, page 11 line 280.

      (11) Figure 4C, D: What about their synaptogenesis? Do they agree with the result in Figure 4B? 

      This was not tested at the time. Unfortunately, these are not trivial experiments and require considerable time, effort, dedication and skill. Addressing this point experimentally is not possible for us at this point. In any case, given the evidence we already provide, it is highly unlikely they would alter the interpretation of our findings and the value of the discoveries already provided.

      (12) L270: The authors state: "To ask whether DNT-2 might affect axonal terminals, we tested PPL1 axons." However, it is unclear why the focus was shifted to PPL1 neurons when similar analyses could have been performed on PAM DANs for consistency. In addition, it would be beneficial to assess dendritic arbor complexity and synaptogenesis in PPL1-γ1-pedc neurons to provide a more comprehensive comparison between PPL1 and PAM DANs. Performing parallel analyses on both neuron types would strengthen the study by providing insight into the generality and specificity of DNT-2 in different dopaminergic circuits. 

      The question we addressed with Figure 4 was whether DNT-2 and its receptors could modify axons, dendrites and synapses, i.e. all features of neuronal plasticity. The reason we used PPL1-γ1-pedc to analyse axonal terminals was their morphology, which offered a clearer opportunity to visualise axonal endings than PAMs did. An exhaustive analysis of PPL1-γ1-pedc is beyond the scope of this work and not the central focus.

      (13) Figure 4G lacks a permissive temperature control, which is essential to confirm that the observed effects are due to adult-specific RNAi knockdown. 

      We thank this Reviewer for this feedback, which we will bear in mind for future projects.

      (14) Figure 5A requires quantification and statistical comparison.

      We thank this Reviewer for this feedback. We did consider this, but the data are too variable to quantify and we decided it was best to present them simply as an observation, interesting nonetheless. This is also consistent with the data in Figure 1, which we have expanded in this revision and which show the natural variability in DNT-2 neurons.

      (15) Figure 5B: Many green signals in the control image are not labeled as PSDs, raising concerns about the accuracy of the image analysis methods used for synapse identification. While I trust that the authors have validated their analysis approach, it would strengthen the study if they provided a clearer description or evidence of the validation process. 

      This was done using the Imaris “Spot function”, in volume. A threshold is set to exclude spots due to GFP background and select only synaptic spots. The selection of spots and quantification are done automatically by Imaris. All spots below the threshold are excluded, regardless of genotype and experimental conditions, rendering the analysis objective. We have now provided a detailed description of the protocol in the Materials and Methods section: page 29 lines 778-799.

      (16) Figure 5C lacks genotype controls (i.e., DNT2-GAL4-only and UAS-TrpA1-only). These controls are essential because elevated temperatures alone, without activation of DNT2 neurons, could potentially increase Syt-GCaMP production, leading to an increase in the number of Syt+ synapses. Including these controls would help ensure that the observed effects are truly due to the activation of DNT2 neurons and not temperature-related artifacts. 

      We thank this Reviewer for this feedback, which we will bear in mind for future projects.

      (17) L314-316: The authors state, "Here, the coincidence of... revealed that newly formed synapses were stable." I think this statement needs to be toned down because there is no evidence that these pre- and post-synaptic sites are functionally connected. 

      The Reviewer is correct that our data did not visualise together, in the same preparation and specimen, both pre- and post-synaptic sites. Still, given that PAMs have already been proved by others to be required for locomotion, learning and long-term memory, our data strongly suggest that synapses between them at the SMP are functionally connected.

      Nevertheless, as we do not provide direct cellular evidence, we have now edited the text to tone down this claim: “Here, the coincidence of increased pre-synaptic Syt-GFP from PAMs and post-synaptic Homer-GFP from DNT-2 neurons at SMP suggests that newly formed synapses could be stable”, page 13 line 351.

      (18) Figure 5D lacks permissive temperature controls. Also, the DNT-2FL overexpression phenotypes are different from the TrpA1 activation phenotypes. The authors may want to discuss this discrepancy.

      Regarding the controls, these are not appropriate for this data set. These data were all taken at a constant temperature of 25°C, with no temperature shifts, and therefore do not require a permissive temperature control. We thank this Reviewer for drawing our attention to the fact that we made a mistake drawing the diagram, which we have now corrected in Figure 5D.

      Regarding the discrepancy, this had already been discussed in the Discussion section of the previously submitted version, page 19, lines 509-526. Presumably this Reviewer missed this before.

      (19) Figure 6A, B lack permissive temperature controls. These controls are important if the authors want to claim that the behavioral defects are due to adult-specific manipulations. In addition, there is no statistical difference between the PAM-GAL4 control and the RNAi knockdown group. The authors should be careful when stating that climbing was reduced in the RNAi knockdown flies (L341-342). 

      We thank this Reviewer for this feedback, which we will bear in mind for future projects.

      Point taken, but climbing of the tubGAL80ts, PAM>Toll-6RNAi flies was significantly different from that of the UAS-Toll-6RNAi/+ control.

      (20) Figure 6C: It seems that the DAN-GAL4 only control (the second group) also rescued the climbing defect. The authors may want to clarify this point. 

      The phenotype for this genotype was very variable, but certainly very distinct from that of flies over-expressing Toll-6[CY].

      We thank Reviewer 2 for their very thorough analysis of our paper that has helped improve the work.

      Reviewer #3 (Recommendations for the authors): 

      Overall, the manuscript reports highly interesting and mostly very convincing experiments. 

      We are very grateful to this Reviewer for their very positive evaluation of our work.

      Based on my comments under the heading "public review", I would like to suggest three possible improvements. 

      First, the quantification of structural plasticity at the sub-cellular level should be explained in more detail and potentially improved. For example, 3D reconstructions of individual neurons and quantification of the structure of boutons and dendrites could be undertaken. At present, it is not clear how bouton volumes are actually recorded accurately. 

      Thank you for the feedback. The analyses of dendrites and synapses were carried out in 3D-volumes using the Imaris “Filament” module and “Spot function”, respectively. Dendrites were analysed semi-automatically, i.e. correcting potential branching errors of Imaris, and synapses were counted automatically, after setting appropriate thresholds. Details have now been expanded in the Materials and Methods section: page 28, lines 756-768 and page 29, lines 780-799.

      We would also like to thank Imaris for enabling and facilitating our remote working using their software during the Covid-19 pandemic, post-pandemic lockdowns and lab restrictions that spanned over a year.

      Second, the variability between DNT-2A-positive neurons with increasing sample size compared to a control (DNT-2A-negative neurons) should be demonstrated. Figure 2C does currently not present convincing evidence of increased structural variability. 

      It is unclear what data the Reviewer refers to. Figure 2C shows qRT-PCR data, and it does not show structural variability, which instead is shown with microscopy. If it is the BacTrace data in Figure 2B, the controls had been provided and the data were unambiguous. If the Reviewer means Figure 1C, it is unclear why DNT-2GAL4-negative flies are needed when the aim was to visualise normal (not genetically manipulated) DNT-2 neurons. Thus, unfortunately we do not understand what the point is here.

      The observation that DNT-2 neurons are very variable, naturally, is highly interesting, and presumably this is what drew the attention of Reviewer 3. We agree that showing further data in support of this is interesting and valuable. Thus, in response to this Reviewer’s comment we have now increased the number of images that demonstrate variability of DNT-2 neurons:

      (1) We have added an extra image, altogether providing three images in new Figure 1A showing three different individual brains stained with DNT-2GAL4>UAS-FlyBow1.1. These show common morphology and features, but different location of the somas and distinct detailed arborisation patterns. Two more images using DNT-2GAL4 are provided in Figure 5A.

      (2) We have now added two further MCFO images, altogether showing four examples where the somas are not always in the same location and the axons arborise consistently at the SMP, but the detailed projections are not identical: new Figure 1D.

      These data compellingly show natural variability in DNT-2 neuron morphology.

      Third, I propose to simplify the feedback model (Figure 2F) to be less speculative. 

      Indeed, some details in Figure 2F are speculative as we did not measure real dopamine levels. Accordingly, we have now edited this diagram, adding question marks to indicate speculative inference, to distinguish from the arrows that are grounded on the data we provide.

      Accordingly, we have also edited the text in:

      - page 9, line 221: “Altogether, this shows that DNT-2 up-regulated TH levels (Figure 2E), and presumably via dopamine release, this inhibited cAMP in DNT-2A neurons (Figure 2F)”.

      - page 20, line 515: “Importantly, we showed that activating DNT-2 neurons increased the levels and cleavage of DNT-2, up-regulated DNT-2 increased TH expression, and this initial amplification resulted in the inhibition of cAMP signalling via the dopamine receptor Dop2R in DNT-2 neurons.”

      As minor points: 

      (1) Appetitive olfactory learning is based on Tempel et al., (1983); Proc Natl Acad Sci U S A. 1983 Mar;80(5):1482-6. doi: 10.1073/pnas.80.5.1482. This paper should perhaps be cited. 

      Thank you for bringing this to our attention, we have now added this reference to page 14 line 394.

      (2) Line 34: I would add ..."ligand for Toll-6 AND KEK-6,". 

      Indeed, thank you, now corrected.

      (3) Line 39: DNT-2-POSITIVE NEURONS. 

      Now corrected, thank you.

      (4) The levels of TH mRNA were quantified. Why not TH or dopamine directly using antibodies, ELISA, or HPLC? After all, later it is explicitly written that DNT modulates dopamine levels (line 481)! 

      We thank this Reviewer for this suggestion. We did try with HPLC once, but the results were inconclusive and optimising this would have required unaffordable effort by us and our collaborators. Part of this work spanned over the pandemic and subsequent lockdowns and lab restrictions to 30% then 50% lab capacity that continued for one year, making experimental work extremely challenging. Although we were unable to carry out all the ideal experiments, the DNT-2-dependent increase in TH mRNA coupled with the EPAC-Dop2R data provided solid evidence of a DNT-2-dopamine link.

      (5) Line 271: The PPL1-g1-pedc neuron has mainly (but not exclusively) a function in short-term memory! 

      They do, but others have also shown that PPL1-g1-pedc neurons have a gating function in long-term memory (Placais et al 2012; Placais et al 2017; Huang et al 2024) and are required for long-term memory (Adel and Griffith 2020; Boto et al 2020).

      (6) Line 401: Reward learning requires PAM neurons. PPL1 neurons are required for aversive learning. 

      Indeed, PPL1 neurons are required for aversive learning, but they also have a gating function in long-term memory common for both reward and aversive learning (Adel and Griffith, 2020 Neurosci Bull; Placais et al, 2012 Nature Neuroscience; Placais et al 2017 Nature Communications; Huang et al 2024 Nature).

      Overall, the manuscript presents extremely interesting, novel results, and I congratulate the authors on their findings. 

      We would like to thank this Reviewer for taking the time to scrutinise our work, for their helpful feedback that has helped us improve the work, and for their interest and positive and kind words.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The work is important and of potential value to areas other than the bone field because it supports a role and mechanism for beta-catenin that is novel and unusual. The findings are significant in that they support the presence of another anabolic pathway in bone that can be productively targeted for therapeutic goals. The data for the most part are convincing. The work could be strengthened by better characterizing the osteoclast KO of Malat1 related to the Lys cre model and by including biochemical markers of bone turnover from the mice.

      We thank the editors and reviewers for their time and their positive and insightful comments. We are pleased that the editors and reviewers were very enthusiastic, as stated in their Strength comments. We have performed experiments and addressed all of the points raised by the reviewers. We have revised the manuscript accordingly and the reviewers’ points are specifically addressed below. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      The authors were trying to discover a novel bone remodeling network system. They found that a lncRNA Malat1 plays a central role in the remodeling by binding to β-catenin and functioning through the β-catenin-OPG/Jagged1 pathway in osteoblasts and chondrocytes. In addition, Malat1 significantly promotes bone regeneration in fracture healing in vivo. Their findings suggest a new concept of Malat1 function in the skeletal system. One significantly different finding between this manuscript and the competing paper pertains to the role of Malat1 in osteoclast lineage, specifically, whether Malat1 functions intrinsically in osteoclast lineage or not.

      Strengths:

      This study provides strong genetic evidence demonstrating that Malat1 acts intrinsically in osteoblasts while suppressing osteoclastogenesis in a non-autonomous manner, whereas the other group did not utilize relevant conditional knockout mice. As shown in the results, Malat1 knockout mouse exhibited abnormal bone remodeling and turnover. Furthermore, they elucidated molecular function of Malat1, which is sufficient to understand the phenotype in vivo.

      We are grateful to the reviewer for highlighting the novelty, strengths and significance of our work.

      Weaknesses:

      Discussing differences between previous paper and their status would be highly informative and beneficial for the field, as it would elucidate the solid underlying mechanisms.

      These points have been fully addressed in the point-by-point response below.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigated the roles of lncRNA Malat1 in bone homeostasis which was initially believed to be non-functional for physiology. They found that both Malat1 KO and conditional KO in osteoblast lineage exhibit significant osteoporosis due to decreased osteoblast bone formation and increased osteoclast resorption. More interestingly they found that deletion of Malat1 in osteoclast lineage cells does not affect osteoclast differentiation and function. Mechanistically, they found that Malat1 acts as a co-activator of β-catenin directly regulating osteoblast activity and indirectly regulating osteoclast activity via mediating OPG, but not RANKL expression in osteoblasts and chondrocytes. Their discoveries establish a previously unrecognized paradigm model of Malat1 function in the skeletal system, providing novel mechanistic insights into how a lncRNA integrates cellular crosstalk and molecular networks to fine-tune tissue homeostasis and remodeling.

      Strengths:

      The authors generated global and conditional KO mice in osteoblast and osteoclast lineage cells and carefully analyzed the role of Malat1 with both in vivo and in vitro systems. The conclusion of this paper is mostly well supported by data.

      We are grateful to the reviewer for highlighting the novelty, strengths and significance of our work.

      Weaknesses:

      More objective biological and biochemical analyses are required.

      These points have been fully addressed in the point-by-point response below.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Qin and colleagues study the role of Malat1 in bone biology. This topic is interesting given the role of lncRNAs in multiple physiologic processes. A previous study (PMID 38493144) suggested a role for Malat1 in osteoclast maturation. However, the role of this lncRNA in osteoblast biology was previously not explored. Here, the authors note osteopenia with increased bone resorption in mice lacking Malat1 globally and in osteoblast lineage cells. At the mechanistic level, the authors suggest that Malat1 controls beta-catenin activity. These results advance the field regarding the role of this lncRNA in bone biology.

      Strengths:

      The manuscript is well-written and data are presented in a clear and easily understandable manner. The bone phenotype of osteoblast-specific Malat1 knockout mice is of high interest. The role of Malat1 in controlling beta-catenin activity and OPG expression is interesting and novel.

      We are grateful to the reviewer for highlighting the novelty, strengths and significance of our work.

      Weaknesses:

      The lack of a bone phenotype when Malat1 is deleted with LysM-Cre is of interest given the previous report suggesting a role for this lncRNA in osteoclasts. However, to interpret the findings here, the authors should investigate the deletion efficiency of Malat1 in osteoclast lineage cells in their model. The data in the fracture model in Figure 8 seems incomplete in the absence of a more complete characterization of callus histology and a thorough time course. The role of Malat1 and OPG in chondrocytes is unclear since the osteocalcin-Cre mice (which should retain normal Malat1 levels in chondrocytes) have similar bone loss as the global mutants.

      These points have been fully addressed in the point-by-point response below.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      There are several suggestions for improving the manuscript, and we hope that you will review the recommendations carefully and make changes to the paper to address the concerns raised. Suggestions have been made to better characterize the osteoclast KO of Malat1 related to the Lys cre model as well as suggestions to include biochemical markers of bone turnover from your mice.

      These points have been fully addressed in the point-by-point response below.

      Reviewer #1 (Recommendations For The Authors):

      (1) Replicate numbers in Figure 3 should be noted.

      We thank the reviewer for this point. The experiments in Fig. 3 have been replicated three times, which is now noted in the figure legend.

      (2) It is novel to identify OPG expression in chondrocytes. More discussion is expected.

      Yes, a paragraph regarding this point has been added to the Discussion section.  

      Reviewer #2 (Recommendations For The Authors):

      (1) It is better to show serum osteoblast bone formation marker and osteoclast resorption marker, such as P1NP and CTx, in both Malat1 KO and osteoblast conditional KO mice.

      We thank the reviewer for this important point. Since CTx values are often influenced by food intake, we measured serum TRAP levels, which also reflect changes in osteoclastic bone resorption. We have observed that the serum osteoblastic bone formation marker P1NP was decreased, while osteoclastic bone resorption marker TRAP was increased, in both Malat1<sup>-/-</sup> and Malat1<sup>ΔOcn</sup> mice. These changes in serum biochemical markers of bone turnover are consistent with the bone phenotype caused by Malat1 deficiency. The new data are shown in Fig. 1i, Fig. 2e, and Fig. 5b.

      (2) in vitro osteoblast differentiation assay is required to further confirm Malat1 regulates osteoblast differentiation.

      We thank the reviewer for this suggestion. As recommended, we have performed in vitro osteoblast differentiation multiple times using calvarial cells, a commonly used system in the field. However, we observed substantial variability in the culture results across different experimental batches, whether conducted by different scientists or the same individual. This variability is likely due to differences in the purity of the cultured cells, as literature shows that the current culture system in the field contains a mixture of tissue cells, including not only osteoblasts but also other cells, such as stromal and hematopoietic lineage cells (DOI: 10.1002/jbmr.4052). We hope to test osteoblast differentiation using a purer culture system once it becomes available in the field. In contrast, our in vivo data, indicated by multiple parameters, show consistent osteoblast and bone formation phenotypes across a large number of mice. Therefore, the in vivo results in our study strongly support our conclusion regarding Malat1's role in osteoblastic bone formation.

      (3) The authors found that Malat1 regulates osteoclast activity through OPG expression not only in osteoblasts, but also in chondrocytes and concluded that chondrocytes are involved in the crosstalk with osteoclast lineage cells in marrow. This is a very novel finding. Do the authors have any in vivo data to support this point, such as deleting Malat1 in chondrocyte lineage cells with chondrocyte-specific Cre?

      We appreciate the reviewer for highlighting our novel findings and providing valuable suggestions. Given the considerable time required to generate chondrocyte-specific conditional KO mice, we plan to thoroughly investigate the crosstalk between chondrocytes and osteoclasts via Malat1 in vivo in our next project.

      Reviewer #3 (Recommendations For The Authors):

      (1) Ideally would show male and female data side by side in the main text figures

      We thank the reviewer for this suggestion. The male and female data are now displayed side by side in Fig. 1b. 

      (2) The sample size for the in vivo datasets is quite large. A power calculation should be provided to better understand how the authors decided to analyze so many mice.

      Due to staff turnover during the pandemic, the first authors and several co-authors were involved in breeding the mice and collecting and analyzing bone samples. To avoid bias in sample selection, we pooled all the samples, resulting in a highly consistent phenotype across mice. This robust approach further strengthens our conclusion. 

      (3) The candidate gene approach to look at beta-catenin is a bit random, it would be ideal to assess Malat1 binding proteins in osteoblasts in an unbiased way. Also, does Malat1 bind β-catenin in other cell types? The importance of this point is further underscored by ref 47 which indicates that Malat binds TEAD3.

      As β-catenin is a key regulator in osteoblasts, we believe that studying the interaction between β-catenin and Malat1 is not random. Instead, this approach is well-founded and based on established knowledge in the field (as discussed below). In parallel, we are investigating genome-wide Malat1-bound targets beyond β-catenin, which will be reported in future studies. 

      More detailed points have been discussed in the manuscript: 

      Given that we identified Malat1 as a critical regulator in osteoblasts, we sought to investigate the mechanisms underlying the regulation of osteoblastic bone formation by Malat1. β-catenin is a central transcriptional factor in canonical Wnt signaling pathway, and plays an important role in positively regulating osteoblast differentiation and function (28-33). Upon stimulation, most notably from canonical Wnt ligands, β-catenin is stabilized and translocates into the nucleus, where it interacts with coactivators to activate target gene transcription. Previous reports observed a link between Malat1 and β-catenin signaling pathway in cancers (34,35), but the underlying molecular mechanisms in terms of how Malat1 interacts with β-catenin and regulates its nuclear retention and transcriptional activity are unclear. 

      Ref47 tested Malat1 binding to Tead3 in osteoclasts. However, a key difference between our findings and those of Ref47 is that both our in vitro and in vivo data, using myeloid osteoclast-specific conditional Malat1 KO mice, do not support an intrinsically significant role for Malat1 in osteoclasts. 

      (4) The statement on page 6 concluding that Malat acts as a scaffold to tether β-catenin in the nucleus is not supported by data in Fig 3d demonstrating that β-catenin nuclear translocation in response to Wnt3a is similar in control and Malat-deficient cells.

      The experiment in Fig. 3d is not designed to demonstrate Malat1 and β-catenin binding, but it is essential as the result rules out the possibility that Malat1 may affect β-catenin nuclear translocation. Moreover, we have utilized two robust approaches, CHIRP and RIP, to demonstrate that Malat1 acts as a scaffold to tether β-catenin in the nucleus (Fig. 3a, b, c, Supplementary Fig. 3). 

      (5) Figure 4e: can the authors show Malat deletion efficiency in the LysM-Cre model? This is important in light of the negative data in this figure and ref 47 which claims an osteoclast intrinsic role for Malat

      We thank the reviewer for this suggestion. The deletion efficiency of Malat1 in the LysM-Cre mice is very high (>90%). This data is now presented in Fig. 4e. 

      (6) Figure 5: since the magnitude of the effects on osteoclasts at the histology level are mild, it would be nice to also look at serum markers of bone resorption (CTX)

      The magnitude of osteoclast changes at the histological level in Fig. 5 is not mild in our view, as we observe 25-30% changes with statistical significance in the osteoclast parameters of Malat1<sup>ΔOcn</sup> mice. Since CTx values are often influenced by food intake, we measured serum TRAP levels, which reflect changes in osteoclastic bone resorption. As shown in Fig. 5b, serum TRAP levels are significantly elevated in Malat1<sup>ΔOcn</sup> mice compared to control mice.

      (7) Data showing chondrocytic expression of OPG is not as novel as the authors claim. Should think about growth plate versus articular sources of OPG. Growth plate chondrocytes express OPG to regulate osteoclasts in the primary spongiosa which resorb mineralized cartilage.

      In the present study, we do not focus on comparing the sources of OPG from the chondrocytes in the growth plate versus articular cartilage. The novelty of our work lies in the discovery that Malat1 links chondrocyte and osteoclast activities through the β-catenin-OPG/Jagged1 axis. This Malat1-β-catenin-OPG/Jagged1 axis represents a novel mechanism regulating the crosstalk between chondrocytes and osteoclasts. 

      (8) The relevance of the chondrocyte role of Malat is unclear since the bone phenotype in global and osteocalcin-Cre mice is similar.

      Bone mass was decreased by 20% in Malat1<sup>ΔOcn</sup> mice, while a 30% reduction was observed in global KO (Malat1<sup>-/-</sup>) mice. This difference indicates potential contributions from other cell types, such as chondrocytes, and our results in Fig. 6 further support the impact of chondrocytes in Malat1's regulation of bone mass. We plan to thoroughly investigate the crosstalk between chondrocytes and osteoclasts via Malat1 in vivo in our next project.

      (9) Fracture data in Figure 8 seems incomplete, it would be ideal to support micro CT with histology and look at multiple time points.

      We thank the reviewer for this suggestion. We have performed histological analysis of our samples, and found that Malat1 promotes bone healing in the fracture model (Fig. 8f), which is consistent with our μCT data.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      In this revision, the authors significantly improved the manuscript. They now address some of my concerns. Specifically, they show the contribution of end-effects on spreading the inputs between dendrites. This analysis reveals greater applicability of their findings to cortical cells, with long, unbranching dendrites than other neuronal types, such as Purkinje cells in the cerebellum.

      They now explain better the interactions between calcium and voltage signals, which I believe improve the take-away message of their manuscript. They modified and added new figures that helped to provide more information about their simulations.

      However, some of my points remain valid. Figure 6 shows depolarization of ~5mV from -75. This weak depolarization would not effectively recruit nonlinear activation of NMDARs. In their paper, Branco and Hausser (2010) showed depolarizations of ~10-15mV.

      More importantly, the signature of NMDAR activation is the prolonged plateau potential and activation at more depolarized resting membrane potentials (their Figure 4). Thus, despite including NMDARs in the simulation, the authors do not model functional recruitment of these channels. Their simulation is thus equivalent to AMPA only drive, which can indeed summate somewhat nonlinearly.

      In the current study, we used short sequences of 5 inputs, since the convergence of longer sequences is extremely unlikely in the network configurations we have examined. This resulted in smaller EPSP amplitudes of ~5mV (Figure 6 - Supplement 2A, B). Longer sequences containing 9 inputs resulted in larger somatic depolarizations of ~10mV (Figure 6 - Supplement 2E, F). Although we had modified the Branco, Clark, and Häusser (2010) model to remove the jitter in the timing of arrival of inputs and made slight modifications to the location of stimulus delivery on the dendrite, we saw similar amplitudes when we tested a 9-length sequence using the code published by Branco, Clark, and Häusser (2010) (Figure 6 - Supplement 2I, J). In all the cases we tested (5-input sequence, 9-input sequence, and 9-input sequence with the Branco, Clark, and Häusser (2010) code repository), removal of NMDA synapses lowered both the somatic EPSPs (Figure 6 - Supplement 2C, D, G, H, K, L) and the selectivity, measured as the difference between the EPSPs generated for inward and outward stimulus delivery (Figure 6 - Supplement 2M, N, O). Further, monitoring the voltage along the dendrite for a sequence of 5 inputs showed dendritic EPSPs in the range of 20-45 mV (Figure 6 - Supplement 2P, Q), which came down notably (10-25 mV) when NMDA synapses were abolished (Figure 6 - Supplement 2R, S). Thus, even sequences containing as few as 5 inputs were capable of engaging the NMDA-mediated nonlinearity to show sequence selectivity, although the selectivity was not as strong as in the case of 9 inputs.

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      Figure 8, what does the scale in A represent? I assume it is voltage, but there are no units. Figure 8, C, E, G, these are unconventional units for synaptic weights, usually, these are given in nS / per input.

      We have corrected these. The scalebar in 8A represents membrane potential in mV. The units of 8C,E,G are now in nS.

      Reviewer #2 (Public Review):

      Summary:

      If synaptic input is functionally clustered on dendrites, nonlinear integration could increase the computational power of neural networks. But this requires the right synapses to be located in the right places. This paper aims to address the question of whether such synaptic arrangements could arise by chance (i.e. without special rules for axon guidance or structural plasticity), and could therefore be exploited even in randomly connected networks. This is important, particularly for the dendrites and biological computation communities, where there is a pressing need to integrate decades of work at the single-neuron level with contemporary ideas about network function.

      Using an abstract model where ensembles of neurons project randomly to a postsynaptic population, back-of-envelope calculations are presented that predict the probability of finding clustered synapses and spatiotemporal sequences. Using data-constrained parameters, the authors conclude that clustering and sequences are indeed likely to occur by chance (for large enough ensembles), but require strong dendritic nonlinearities and low background noise to be useful.

      Strengths:

      (1) The back-of-envelope reasoning presented can provide fast and valuable intuition. The authors have also made the effort to connect the model parameters with measured values. Even an approximate understanding of cluster probability can direct theory and experiments towards promising directions, or away from lost causes.

      (2) I found the general approach to be refreshingly transparent and objective. Assumptions are stated clearly about the model and statistics of different circuits. Along with some positive results, many of the computed cluster probabilities are vanishingly small, and noise is found to be quite detrimental in several cases. This is important to know, and I was happy to see the authors take a balanced look at conditions that help/hinder clustering, rather than to just focus on a particular regime that works.

      (3) This paper is also a timely reminder that synaptic clusters and sequences can exist on multiple spatial and temporal scales. The authors present results pertaining to the standard ‘electrical’ regime (~50-100 µm, <50 ms), as well as two modes of chemical signaling (~10 µm, 100-1000 ms). The senior author is indeed an authority on the latter, and the simulations in Figure 5, extending those from Bhalla (2017), are unique in this area. In my view, the role of chemical signaling in neural computation is understudied theoretically, but research will be increasingly important as experimental technologies continue to develop.

      Weaknesses:

      (1) The paper is mostly let down by the presentation. In the current form, some patience is needed to grasp the main questions and results, and it is hard to keep track of the many abbreviations and definitions. A paper like this can be impactful, but the writing needs to be crisp, and the logic of the derivation accessible to non-experts. See, for instance, Stepanyants, Hof & Chklovskii (2002) for a relevant example.

      It would be good to see a restructure that communicates the main points clearly and concisely, perhaps leaving other observations to an optional appendix. For the interested but time-pressed reader, I recommend starting with the last paragraph of the introduction, working through the main derivation on page 7, and writing out the full expression with key parameters exposed. Next, look at Table 1 and Figure 2J to see where different circuits and mechanisms fit in this scheme. Beyond this, the sequence derivation on page 15 and biophysical simulations in Figures 5 and 6 are also highlights.

      We appreciate the reviewers' suggestions. We have tightened the flow of the introduction. We understand that the abbreviations and definitions are challenging and have therefore provided intuitions and summaries of the equations discussed in the main text.

      Clusters calculations

      Our approach is to ask how likely it is that a given set of inputs lands on a short segment of dendrite, and then scale it up to all segments on the entire dendritic length of the cell.

      Thus, the probability of occurrence of groups that receive connections from each of the M ensembles (PcFMG) is a function of the connection probability (p) between the two layers, the number of neurons in an ensemble (N), the relative zone-length with respect to the total dendritic arbor (Z/L) and the number of ensembles (M).
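
The dependence on p, N, Z/L and M described above can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the response itself: synapses from each ensemble are treated as landing independently and uniformly along the dendrite (a Poisson approximation), the dendrite is split into L/Z non-overlapping zones, and the function names and parameter values are hypothetical. It should be read as one plausible way to compute such a probability, not as the manuscript's derivation.

```python
import math


def p_group_in_zone(p, n_ensemble, zone_fraction, m_ensembles):
    """Probability that a single zone receives at least one synapse
    from each of M ensembles (assumed Poisson approximation)."""
    # Expected number of synapses from one ensemble in a zone of relative size Z/L.
    expected = p * n_ensemble * zone_fraction
    # Probability of at least one synapse from that ensemble in the zone.
    p_one_ensemble = 1.0 - math.exp(-expected)
    # Ensembles are assumed independent, so the probabilities multiply.
    return p_one_ensemble ** m_ensembles


def p_group_on_neuron(p, n_ensemble, zone_fraction, m_ensembles):
    """Probability that at least one of the L/Z zones on a neuron holds a full group."""
    n_zones = 1.0 / zone_fraction
    p_zone = p_group_in_zone(p, n_ensemble, zone_fraction, m_ensembles)
    return 1.0 - (1.0 - p_zone) ** n_zones


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for m in (2, 3, 4, 5):
        print(m, p_group_on_neuron(p=0.05, n_ensemble=100, zone_fraction=0.001, m_ensembles=m))
```

Under these assumptions the probability rises with ensemble size and connection probability and falls steeply as more ensembles are required to converge on the same zone.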

      Sequence calculations

      Here we estimate the likelihood of the first ensemble input arriving anywhere on the dendrite, and ask how likely it is that succeeding inputs of the sequence would arrive within a set spacing.

      Thus, the probability of occurrence of sequences that receive sequential connections (PcPOSS) from each of the M ensembles is a function of the connection probability (p) between the two layers, the number of neurons in an ensemble (N), the relative window size with respect to the total dendritic arbor (Δ/L) and the number of ensembles (M).
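
The sequence estimate can be sketched in the same spirit. The Python snippet below is an illustration under assumptions that are not stated in the response: synapses are placed independently and uniformly (Poisson approximation), each succeeding ensemble must contribute at least one synapse within a window of relative size Δ/L after the previous one, and all names and parameter values are hypothetical.

```python
import math


def p_sequence_on_neuron(p, n_ensemble, window_fraction, m_ensembles):
    """Probability that a neuron receives at least one ordered sequence of
    inputs from M ensembles (assumed Poisson approximation)."""
    # Expected number of synapses from the first ensemble anywhere on the dendrite.
    expected_first = p * n_ensemble
    # Probability that a succeeding ensemble places at least one synapse
    # inside the window of relative size delta/L.
    p_next = 1.0 - math.exp(-p * n_ensemble * window_fraction)
    # Expected number of complete sequences: each first-ensemble synapse
    # must be followed by M - 1 correctly spaced inputs.
    expected_sequences = expected_first * p_next ** (m_ensembles - 1)
    # Probability of at least one complete sequence on the neuron.
    return 1.0 - math.exp(-expected_sequences)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for m in (3, 4, 5):
        print(m, p_sequence_on_neuron(p=0.05, n_ensemble=1000, window_fraction=0.001, m_ensembles=m))
```

As with clusters, the estimate grows with p, N and Δ/L, and shrinks rapidly as the required sequence length M increases.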

      (2) I wonder if the authors are being overly conservative at times. The result highlighted in the abstract is that 10/100000 postsynaptic neurons are expected to exhibit synaptic clustering. This seems like a very small number, especially if circuits are to rely on such a mechanism. However, this figure assumes the convergence of 3-5 distinct ensembles. Convergence of inputs from just 2 ensembles would be much more prevalent, but still advantageous computationally. There has been excitement in the field about experiments showing the clustering of synapses encoding even a single feature.

      We agree that short clusters of two inputs would be far more likely. We focused our analysis on clusters with three or more ensembles for the following reasons:

      (1) The signal-to-noise ratio for two-input clusters was very poor, as the likelihood of noise clusters is high.

      (2) It is difficult to trigger nonlinearities with very few synaptic inputs.

      (3) At the ensemble sizes we considered (100 for clusters, 1000 for sequences), clusters arising from just two ensembles would result in a high probability of occurrence on all neurons in a network (~50% in cortex; see PcFMG in the figures below). Such dense neural representations are difficult for downstream networks to decode (Foldiak 2003).

      However, in the presence of ensembles containing fewer neurons or when the connection probability between the layers is low, short clusters can result in sparse representations (Figure 2 - Supplement 2). Arguments 1 and 2 hold for short sequences as well.

      (3) The analysis supporting the claim that strong nonlinearities are needed for cluster/sequence detection is unconvincing. In the analysis, different synapse distributions on a single long dendrite are convolved with a sigmoid function and then the sum is taken to reflect the somatic response. In reality, dendritic nonlinearities influence the soma in a complex and dynamic manner. It may be that the abstract approach the authors use captures some of this, but it needs to be validated with simulations to be trusted (in line with previous work, e.g. Poirazi, Brannon & Mel, (2003)).

      We agree that multiple factors might affect the influence of nonlinearities on the soma. The key goal of our study was to understand the role played by random connectivity in giving rise to clustered computation. Since simulating a wide range of connectivity and activity patterns in a detailed biophysical model was computationally expensive, we analyzed the exemplar detailed models for nonlinearity separately (Figures 5, 6, and new Figure 8), and then used our abstract models as a proxy for understanding population dynamics. A complete analysis of the role played by morphology, channel kinetics and the effect of branching requires an in-depth study of its own, and some of these questions have already been tackled by Poirazi, Brannon, and Mel (2003), Branco, Clark, and Häusser (2010), and Bhalla (2017). However, in the revision, we have implemented a single model which incorporates the range of ion-channel, synaptic and biochemical signaling nonlinearities which we discuss in the paper (Figure 8, and Figure 8 - Supplements 1, 2, 3). We use this to demonstrate all three forms of sequence and grouped computation we use in the study, where the only difference is in the stimulus pattern and the separation of time-scales inherent in the stimuli.

      (4) It is unclear whether some of the conclusions would hold in the presence of learning. In the signal-to-noise analysis, all synaptic strengths are assumed equal. But if synapses involved in salient clusters or sequences were potentiated, presumably detection would become easier? Similarly, if presynaptic tuning and/or timing were reorganized through learning, the conditions for synaptic arrangements to be useful could be relaxed. Answering these questions is beyond the scope of the study, but there is a caveat there nonetheless.

      We agree with the reviewer. If synapses receiving connectivity from ensembles had stronger weights, this would make detection easier. Dendritic spikes arising from clustered inputs have been implicated in local cooperative plasticity (Golding, Staff, and Spruston 2002; Losonczy, Makara, and Magee 2008). Further, plasticity-related proteins synthesized at a synapse undergoing L-LTP can diffuse to neighboring weakly co-active synapses, and thereby mediate cooperative plasticity (Harvey et al. 2008; Govindarajan, Kelleher, and Tonegawa 2006; Govindarajan et al. 2011). Thus, if clusters of synapses were likely to be co-active, they could further engage these local plasticity mechanisms, which could potentiate them while not potentiating synapses that are activated by background activity. This would depend on the activity correlation between synapses receiving ensemble inputs within a cluster vs those activated by background activity. We have mentioned some of these ideas in a published opinion paper (Pulikkottil, Somashekar, and Bhalla 2021). In the current study, we wanted to understand whether even in the absence of specialized connection rules, interesting computations could still emerge. Thus, we focused on asking whether clustered or sequential convergence could arise even in a purely randomly connected network, with the most basic set of assumptions. We agree that an analysis of how selectivity evolves with learning would be an interesting topic for further work.

      References

      • Bhalla, Upinder S. 2017. “Synaptic Input Sequence Discrimination on Behavioral Timescales Mediated by Reaction-Diffusion Chemistry in Dendrites.” Edited by Frances K Skinner. eLife 6 (April):e25827. https://doi.org/10.7554/eLife.25827.

      • Branco, Tiago, Beverley A. Clark, and Michael Häusser. 2010. “Dendritic Discrimination of Temporal Input Sequences in Cortical Neurons.” Science (New York, N.Y.) 329 (5999): 1671–75. https://doi.org/10.1126/science.1189664.

      • Foldiak, Peter. 2003. “Sparse Coding in the Primate Cortex.” The Handbook of Brain Theory and Neural Networks. https://research-repository.st-andrews.ac.uk/bitstream/handle/10023/2994/FoldiakSparse HBTNN2e02.pdf?sequence=1.

      • Golding, Nace L., Nathan P. Staff, and Nelson Spruston. 2002. “Dendritic Spikes as a Mechanism for Cooperative Long-Term Potentiation.” Nature 418 (6895): 326–31. https://doi.org/10.1038/nature00854.

      • Govindarajan, Arvind, Inbal Israely, Shu-Ying Huang, and Susumu Tonegawa. 2011. “The Dendritic Branch Is the Preferred Integrative Unit for Protein Synthesis-Dependent LTP.” Neuron 69 (1): 132–46. https://doi.org/10.1016/j.neuron.2010.12.008.

      • Govindarajan, Arvind, Raymond J. Kelleher, and Susumu Tonegawa. 2006. “A Clustered Plasticity Model of Long-Term Memory Engrams.” Nature Reviews Neuroscience 7 (7): 575–83. https://doi.org/10.1038/nrn1937.

      • Harvey, Christopher D., Ryohei Yasuda, Haining Zhong, and Karel Svoboda. 2008. “The Spread of Ras Activity Triggered by Activation of a Single Dendritic Spine.” Science (New York, N.Y.) 321 (5885): 136–40. https://doi.org/10.1126/science.1159675.

      • Losonczy, Attila, Judit K. Makara, and Jeffrey C. Magee. 2008. “Compartmentalized Dendritic Plasticity and Input Feature Storage in Neurons.” Nature 452 (7186): 436–41. https://doi.org/10.1038/nature06725.

      • Poirazi, Panayiota, Terrence Brannon, and Bartlett W. Mel. 2003. “Pyramidal Neuron as Two-Layer Neural Network.” Neuron 37 (6): 989–99. https://doi.org/10.1016/S0896-6273(03)00149-1.

      • Pulikkottil, Vinu Varghese, Bhanu Priya Somashekar, and Upinder S. Bhalla. 2021. “Computation, Wiring, and Plasticity in Synaptic Clusters.” Current Opinion in Neurobiology, Computational Neuroscience, 70 (October):101–12. https://doi.org/10.1016/j.conb.2021.08.001.

    2. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      In this revision, the authors significantly improved the manuscript. They now address some of my concerns. Specifically, they show the contribution of end-effects on spreading the inputs between dendrites. This analysis reveals greater applicability of their findings to cortical cells, with long, unbranching dendrites than other neuronal types, such as Purkinje cells in the cerebellum.

      They now explain better the interactions between calcium and voltage signals, which I believe improve the take-away message of their manuscript. They modified and added new figures that helped to provide more information about their simulations.

      However, some of my points remain valid. Figure 6 shows depolarization of ~5mV from -75. This weak depolarization would not effectively recruit nonlinear activation of NMDARs. In their paper, Branco and Hausser (2010) showed depolarizations of ~10-15mV.

      More importantly, the signature of NMDAR activation is the prolonged plateau potential and activation at more depolarized resting membrane potentials (their Figure 4). Thus, despite including NMDARs in the simulation, the authors do not model functional recruitment of these channels. Their simulation is thus equivalent to AMPA-only drive, which can indeed summate somewhat nonlinearly.

      In the current study, we used short sequences of 5 inputs, since the convergence of longer sequences is extremely unlikely in the network configurations we have examined. This resulted in smaller EPSP amplitudes of ~5 mV (Figure 6 - Supplement 2A, B). Longer sequences containing 9 inputs resulted in larger somatic depolarizations of ~10 mV (Figure 6 - Supplement 2E, F). Although we had modified the Branco, Clark, and Häusser (2010) model to remove the jitter in the timing of arrival of inputs and made slight modifications to the location of stimulus delivery on the dendrite, we saw similar amplitudes when we tested a 9-length sequence using the published code of Branco, Clark, and Häusser (2010) (Figure 6 - Supplement 2I, J). In all the cases we tested (5-input sequence, 9-input sequence, and 9-input sequence using the Branco, Clark, and Häusser (2010) code repository), removal of NMDA synapses lowered both the somatic EPSPs (Figure 6 - Supplement 2C, D, G, H, K, L) and the selectivity (measured as the difference between the EPSPs generated for inward and outward stimulus delivery) (Figure 6 - Supplement 2M, N, O). Further, monitoring the voltage along the dendrite for a sequence of 5 inputs showed dendritic EPSPs in the range of 20-45 mV (Figure 6 - Supplement 2P, Q), which dropped notably (to 10-25 mV) when NMDA synapses were abolished (Figure 6 - Supplement 2R, S). Thus, even sequences containing as few as 5 inputs were capable of engaging the NMDA-mediated nonlinearity to show sequence selectivity, although the selectivity was not as strong as in the case of 9 inputs.

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      Figure 8, what does the scale in A represent? I assume it is voltage, but there are no units. Figure 8, C, E, G, these are unconventional units for synaptic weights, usually, these are given in nS / per input.

      We have corrected these. The scalebar in 8A represents membrane potential in mV. The units of 8C,E,G are now in nS.

      Reviewer #2 (Public Review):

      Summary:

      If synaptic input is functionally clustered on dendrites, nonlinear integration could increase the computational power of neural networks. But this requires the right synapses to be located in the right places. This paper aims to address the question of whether such synaptic arrangements could arise by chance (i.e. without special rules for axon guidance or structural plasticity), and could therefore be exploited even in randomly connected networks. This is important, particularly for the dendrites and biological computation communities, where there is a pressing need to integrate decades of work at the single-neuron level with contemporary ideas about network function.

      Using an abstract model where ensembles of neurons project randomly to a postsynaptic population, back-of-envelope calculations are presented that predict the probability of finding clustered synapses and spatiotemporal sequences. Using data-constrained parameters, the authors conclude that clustering and sequences are indeed likely to occur by chance (for large enough ensembles), but require strong dendritic nonlinearities and low background noise to be useful.

      Strengths:

      (1) The back-of-envelope reasoning presented can provide fast and valuable intuition. The authors have also made the effort to connect the model parameters with measured values. Even an approximate understanding of cluster probability can direct theory and experiments towards promising directions, or away from lost causes.

      (2) I found the general approach to be refreshingly transparent and objective. Assumptions are stated clearly about the model and statistics of different circuits. Along with some positive results, many of the computed cluster probabilities are vanishingly small, and noise is found to be quite detrimental in several cases. This is important to know, and I was happy to see the authors take a balanced look at conditions that help/hinder clustering, rather than to just focus on a particular regime that works.

      (3) This paper is also a timely reminder that synaptic clusters and sequences can exist on multiple spatial and temporal scales. The authors present results pertaining to the standard `electrical' regime (~50-100 µm, <50 ms), as well as two modes of chemical signaling (~10 µm, 100-1000 ms). The senior author is indeed an authority on the latter, and the simulations in Figure 5, extending those from Bhalla (2017), are unique in this area. In my view, the role of chemical signaling in neural computation is understudied theoretically, but research will be increasingly important as experimental technologies continue to develop.

      Weaknesses:

      (1) The paper is mostly let down by the presentation. In the current form, some patience is needed to grasp the main questions and results, and it is hard to keep track of the many abbreviations and definitions. A paper like this can be impactful, but the writing needs to be crisp, and the logic of the derivation accessible to non-experts. See, for instance, Stepanyants, Hof & Chklovskii (2002) for a relevant example.

      It would be good to see a restructure that communicates the main points clearly and concisely, perhaps leaving other observations to an optional appendix. For the interested but time-pressed reader, I recommend starting with the last paragraph of the introduction, working through the main derivation on page 7, and writing out the full expression with key parameters exposed. Next, look at Table 1 and Figure 2J to see where different circuits and mechanisms fit in this scheme. Beyond this, the sequence derivation on page 15 and biophysical simulations in Figures 5 and 6 are also highlights.

      We appreciate the reviewers' suggestions. We have tightened the flow of the introduction. We understand that the abbreviations and definitions are challenging and have therefore provided intuitions and summaries of the equations discussed in the main text.

      Cluster calculations

      Our approach is to ask how likely it is that a given set of inputs lands on a short segment of dendrite, and then scale it up to all segments on the entire dendritic length of the cell.

      Thus, the probability of occurrence of groups that receive connections from each of the M ensembles (PcFMG) is a function of the connection probability (p) between the two layers, the number of neurons in an ensemble (N), the relative zone-length with respect to the total dendritic arbor (Z/L) and the number of ensembles (M).
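
      The dependence on p, N, Z/L and M described above can be illustrated with a minimal numerical sketch. It is not the manuscript's derivation: it assumes that synapses from each ensemble land independently and uniformly along the dendrite (a Poisson approximation), that the dendrite is tiled by L/Z non-overlapping zones, and that a group only requires at least one synapse from each of the M ensembles in a zone. The function names and the example parameter values are hypothetical.

```python
import math


def p_zone_all_ensembles(p, N, Z, L, M):
    """Probability that one zone of length Z receives at least one synapse
    from each of M ensembles (Poisson approximation, an assumption)."""
    lam = p * N * Z / L              # expected synapses from one ensemble in the zone
    p_one = -math.expm1(-lam)        # 1 - exp(-lam): at least one synapse from an ensemble
    return p_one ** M                # ensembles are treated as independent


def p_cell_has_group(p, N, Z, L, M):
    """Probability that at least one of the L/Z non-overlapping zones on a
    neuron receives input from all M ensembles."""
    p_zone = p_zone_all_ensembles(p, N, Z, L, M)
    if p_zone >= 1.0:                # guard against log1p(-1) for very dense input
        return 1.0
    n_zones = L / Z
    # 1 - (1 - p_zone)**n_zones, written to stay accurate for tiny p_zone
    return -math.expm1(n_zones * math.log1p(-p_zone))


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for M in (2, 3, 4, 5):
        print(M, p_cell_has_group(p=0.05, N=100, Z=10.0, L=10000.0, M=M))
```

      Under these assumptions the probability falls steeply with each additional ensemble, which is the qualitative behaviour the summary above describes.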

      Sequence calculations

      Here we estimate the likelihood of the first ensemble input arriving anywhere on the dendrite, and ask how likely it is that succeeding inputs of the sequence would arrive within a set spacing.

      Thus, the probability of occurrence of sequences that receive sequential connections (PcPOSS) from each of the M ensembles is a function of the connection probability (p) between the two layers, the number of neurons in an ensemble (N), the relative window size with respect to the total dendritic arbor (Δ/L) and the number of ensembles (M).
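
      The sequence argument can be sketched in the same spirit. This is an illustration under stated assumptions rather than the manuscript's exact expression: synapse placement is treated as Poisson, each synapse from the first ensemble is taken as a possible start of a sequence, and each succeeding ensemble must contribute at least one synapse within the next window of length Δ. The function names and parameter values are hypothetical.

```python
import math


def expected_sequences(p, N, delta, L, M):
    """Expected number of ordered M-ensemble sequences on one neuron
    (Poisson approximation, an assumption)."""
    n_first = p * N                       # expected synapses from the first ensemble
    lam = p * N * delta / L               # expected synapses from one ensemble per window
    p_next = -math.expm1(-lam)            # at least one synapse lands in the next window
    return n_first * p_next ** (M - 1)    # M - 1 succeeding ensembles must follow


def p_cell_has_sequence(p, N, delta, L, M):
    """Probability that a neuron receives at least one such sequence."""
    return -math.expm1(-expected_sequences(p, N, delta, L, M))


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for M in (3, 4, 5):
        print(M, p_cell_has_sequence(p=0.05, N=1000, delta=10.0, L=10000.0, M=M))
```

      As with clusters, the estimate depends only on p, N, Δ/L and M, and drops by roughly a constant factor for every ensemble added to the sequence.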

      (2) I wonder if the authors are being overly conservative at times. The result highlighted in the abstract is that 10/100000 postsynaptic neurons are expected to exhibit synaptic clustering. This seems like a very small number, especially if circuits are to rely on such a mechanism. However, this figure assumes the convergence of 3-5 distinct ensembles. Convergence of inputs from just 2 ensembles would be much more prevalent, but still advantageous computationally. There has been excitement in the field about experiments showing the clustering of synapses encoding even a single feature.

      We agree that short clusters of two inputs would be far more likely. We focused our analysis on clusters with three or more ensembles for the following reasons:

      (1) The signal-to-noise ratio for two-input clusters was very poor, as the likelihood of noise clusters is high.

      (2) It is difficult to trigger nonlinearities with very few synaptic inputs.

      (3) At the ensemble sizes we considered (100 for clusters, 1000 for sequences), clusters arising from just two ensembles would result in a high probability of occurrence on all neurons in a network (~50% in cortex; see PcFMG in the figures below). Such dense neural representations are difficult for downstream networks to decode (Foldiak 2003).

      However, in the presence of ensembles containing fewer neurons or when the connection probability between the layers is low, short clusters can result in sparse representations (Figure 2 - Supplement 2). Arguments 1 and 2 hold for short sequences as well.

      (3) The analysis supporting the claim that strong nonlinearities are needed for cluster/sequence detection is unconvincing. In the analysis, different synapse distributions on a single long dendrite are convolved with a sigmoid function and then the sum is taken to reflect the somatic response. In reality, dendritic nonlinearities influence the soma in a complex and dynamic manner. It may be that the abstract approach the authors use captures some of this, but it needs to be validated with simulations to be trusted (in line with previous work, e.g. Poirazi, Brannon & Mel, (2003)).

      We agree that multiple factors might affect the influence of nonlinearities on the soma. The key goal of our study was to understand the role played by random connectivity in giving rise to clustered computation. Since simulating a wide range of connectivity and activity patterns in a detailed biophysical model was computationally expensive, we analyzed the exemplar detailed models for nonlinearity separately (Figures 5, 6, and new Figure 8), and then used our abstract models as a proxy for understanding population dynamics. A complete analysis of the role played by morphology, channel kinetics and the effect of branching requires an in-depth study of its own, and some of these questions have already been tackled by Poirazi, Brannon, and Mel (2003), Branco, Clark, and Häusser (2010), and Bhalla (2017). However, in the revision, we have implemented a single model which incorporates the range of ion-channel, synaptic and biochemical signaling nonlinearities which we discuss in the paper (Figure 8, and Figure 8 - Supplements 1, 2, 3). We use this to demonstrate all three forms of sequence and grouped computation we use in the study, where the only difference is in the stimulus pattern and the separation of time-scales inherent in the stimuli.

      (4) It is unclear whether some of the conclusions would hold in the presence of learning. In the signal-to-noise analysis, all synaptic strengths are assumed equal. But if synapses involved in salient clusters or sequences were potentiated, presumably detection would become easier? Similarly, if presynaptic tuning and/or timing were reorganized through learning, the conditions for synaptic arrangements to be useful could be relaxed. Answering these questions is beyond the scope of the study, but there is a caveat there nonetheless.

      We agree with the reviewer. If synapses receiving connectivity from ensembles had stronger weights, this would make detection easier. Dendritic spikes arising from clustered inputs have been implicated in local cooperative plasticity (Golding, Staff, and Spruston 2002; Losonczy, Makara, and Magee 2008). Further, plasticity-related proteins synthesized at a synapse undergoing L-LTP can diffuse to neighboring weakly co-active synapses, and thereby mediate cooperative plasticity (Harvey et al. 2008; Govindarajan, Kelleher, and Tonegawa 2006; Govindarajan et al. 2011). Thus, if clusters of synapses were likely to be co-active, they could further engage these local plasticity mechanisms, which could potentiate them while not potentiating synapses that are activated by background activity. This would depend on the activity correlation between synapses receiving ensemble inputs within a cluster vs those activated by background activity. We have mentioned some of these ideas in a published opinion paper (Pulikkottil, Somashekar, and Bhalla 2021). In the current study, we wanted to understand whether even in the absence of specialized connection rules, interesting computations could still emerge. Thus, we focused on asking whether clustered or sequential convergence could arise even in a purely randomly connected network, with the most basic set of assumptions. We agree that an analysis of how selectivity evolves with learning would be an interesting topic for further work.

      References

      Bhalla, Upinder S. 2017. “Synaptic Input Sequence Discrimination on Behavioral Timescales Mediated by Reaction-Diffusion Chemistry in Dendrites.” Edited by Frances K Skinner. eLife 6 (April):e25827. https://doi.org/10.7554/eLife.25827.

      Branco, Tiago, Beverley A. Clark, and Michael Häusser. 2010. “Dendritic Discrimination of Temporal Input Sequences in Cortical Neurons.” Science (New York, N.Y.) 329 (5999): 1671–75. https://doi.org/10.1126/science.1189664.

      Foldiak, Peter. 2003. “Sparse Coding in the Primate Cortex.” The Handbook of Brain Theory and Neural Networks. https://research-repository.st-andrews.ac.uk/bitstream/handle/10023/2994/FoldiakSparse HBTNN2e02.pdf?sequence=1.

      Golding, Nace L., Nathan P. Staff, and Nelson Spruston. 2002. “Dendritic Spikes as a Mechanism for Cooperative Long-Term Potentiation.” Nature 418 (6895): 326–31. https://doi.org/10.1038/nature00854.

      Govindarajan, Arvind, Inbal Israely, Shu-Ying Huang, and Susumu Tonegawa. 2011. “The Dendritic Branch Is the Preferred Integrative Unit for Protein Synthesis-Dependent LTP.” Neuron 69 (1): 132–46. https://doi.org/10.1016/j.neuron.2010.12.008.

      Govindarajan, Arvind, Raymond J. Kelleher, and Susumu Tonegawa. 2006. “A Clustered Plasticity Model of Long-Term Memory Engrams.” Nature Reviews Neuroscience 7 (7): 575–83. https://doi.org/10.1038/nrn1937.

      Harvey, Christopher D., Ryohei Yasuda, Haining Zhong, and Karel Svoboda. 2008. “The Spread of Ras Activity Triggered by Activation of a Single Dendritic Spine.” Science (New York, N.Y.) 321 (5885): 136–40. https://doi.org/10.1126/science.1159675.

      Losonczy, Attila, Judit K. Makara, and Jeffrey C. Magee. 2008. “Compartmentalized Dendritic Plasticity and Input Feature Storage in Neurons.” Nature 452 (7186): 436–41. https://doi.org/10.1038/nature06725.

      Poirazi, Panayiota, Terrence Brannon, and Bartlett W. Mel. 2003. “Pyramidal Neuron as Two-Layer Neural Network.” Neuron 37 (6): 989–99. https://doi.org/10.1016/S0896-6273(03)00149-1.

      Pulikkottil, Vinu Varghese, Bhanu Priya Somashekar, and Upinder S. Bhalla. 2021. “Computation, Wiring, and Plasticity in Synaptic Clusters.” Current Opinion in Neurobiology, Computational Neuroscience, 70 (October):101–12. https://doi.org/10.1016/j.conb.2021.08.001.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the current manuscript, the authors use theoretical and analytical tools to examine the possibility of neural projections to engage ensembles of synaptic clusters in active dendrites. The analysis is divided into multiple models that differ in the connectivity parameters, speed of interactions, and identity of the signal (electric vs. second messenger). They first show that random connectivity almost ensures the representation of presynaptic ensembles. As expected, this convergence is much more likely for small group sizes and slow processes, such as calcium dynamics. Conversely, fast signals (spikes and postsynaptic potentials) and large groups are much less likely to recruit spatially clustered inputs. Dendritic nonlinearity in the postsynaptic cells was found to play a highly important role in distinguishing these clustered activation patterns, both when activated simultaneously and in sequence. The authors tackled the difficult issue of noise, showing a beneficial effect when noise 'happens' to fill in gaps in a sequential pattern but degraded performance at higher background activity levels. Last, the authors simulated selectivity to chemical and electrical signals. While they find that longer sequences are less perturbed by noise, in more realistic activation conditions, the signals are not well resolved in the soma.

      While I think the premise of the manuscript is worth exploring, I have a number of reservations regarding the results.

      (1) In the analysis, the authors made a simplifying assumption that the chemical and electrical processes are independent. However, this is not the case; excitatory inputs to spines often trigger depolarization combined with pronounced calcium influx; this mixed signaling could have dramatic implications on the analysis, particularly if the dendrites are nonlinear (see below)

      We thank the reviewer for pointing out that we were not entirely clear about the strong basis upon which we had built our analyses of nonlinearity. In the previous version we had relied on published work, notably (Bhalla 2017), which does include these nonlinearities. However, we agree it is preferable to unambiguously demonstrate all the reported selectivity properties in a single model with all the nonlinearities discussed. We have now done so. This is now reported in the paper:

      “A single model exhibits multiple forms of nonlinear dendritic selectivity

      We implemented all three forms of selectivity described above in a single model which included six voltage and calcium-gated ion channels, NMDA, AMPA and GABA receptors, and chemical signaling processes in spines and dendrites. The goal of this was threefold: to show how these nonlinear operations emerge in a mechanistically detailed model, to show that they can coexist, and to show that they are separated in time-scales. We implemented a Y-branched neuron model with additional electrical compartments for the dendritic spines (Methods). This model was closely based on a published detailed chemical-electrical model (Bhalla 2017). We stimulated this model with synaptic input corresponding to the three kinds of spatiotemporal patterns described in Figure 8 - Supplement 1 (sequential synaptic activity triggering electrical sequence selectivity), Figure 8 - Supplement 2 (spatially grouped synaptic stimuli leading to local Ca4_CaM activation), and Figure 8 - Supplement 3 (sequential bursts of synaptic activity triggering chemical sequence selectivity). We found that each of these mechanisms shows nonlinear selectivity with respect to both synaptic spacing and synaptic weights. Further, these forms of selectivity coexist in the composite model (Figure 8 - Supplements 1, 2, 3), separated by the time-scales of the stimulus patterns (~100 ms, ~1 s and ~10 s, respectively). Thus, mixed signaling in active nonlinear dendrites yields selectivity of the same form as we explored in simpler individual models. A more complete analysis of the effect of morphology, branching and channel distributions deserves a separate in-depth analysis, and is outside the scope of the current study.”

      (2) Sequence detection in active dendrites is often simplified to investigating activation in a part of or the entirety of individual branches. However, the authors did not do that for most of their analysis. Instead, they treat the entire dendritic tree as one long branch and count how many inputs form clusters. I fail to see why simplification is required and suspect it can lead to wrong results. For example, two inputs that are mapped to different dendrites in the 'original' morphology but then happen to fall next to each other when the branches are staggered to form the long dendrites would be counted as neighbors.

      We have added the following section to the main text, under the section titled “Grouped Convergence of Inputs”, to address the effect of branching.

      “End-effects limit convergence zones for highly branched neurons

      Neurons exhibit considerable diversity with respect to their morphologies. How synapses extending across dendritic branch points interact in the context of a synaptic cluster/group is a topic that needs detailed examination via experimental and modeling approaches. However, for the sake of analysis, we present calculations under the assumption that selectivity for grouped inputs might be degraded across branch points.

      Zones beginning close to a branch point might get interrupted. Consider a neuron with B branches. The length of the typical branch would be L/B. As a conservative estimate if we exclude a region of length Z for every branch, the expected number of zones that begin too close to a branch point is

      $E_{end} = \frac{Z}{L/B} = \frac{B Z}{L}$      [Equation 3]

      For typical pyramidal neurons B~50, so Eend ~0.05 for values of Z of ~10 µm. Thus pyramidal neurons will not be much affected by branching effects. Profusely branching neurons like Purkinje cells have B~900 for a total L of ~7800 µm (McConnell and Berry, 1978), hence Eend ~1 for values of Z of ~10 µm. Thus almost all groups in Purkinje neurons would run into a branch point or terminal. For the case of electrical groups, this estimate would be scaled by a factor of 5 if we consider a zone length of 50 µm. However, it is important to note that these are very conservative estimates, as for clusters of 4-5 inputs, the number of synapses available within a zone is far greater (~100 synapses within 50 µm).”
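
      As a rough numerical check of the end-effect estimate quoted above, the sketch below assumes that the estimate is the excluded length Z divided by the typical branch length L/B, that is, B·Z/L. This form is inferred from the quoted values and is not copied from the manuscript. The total dendritic length of 10000 µm used for the pyramidal neuron is likewise an assumption, chosen because it reproduces the quoted Eend of ~0.05; the function name is hypothetical.

```python
def end_effect_fraction(B, Z, L):
    """Assumed form of the end-effect estimate: excluded length Z per branch
    divided by the typical branch length L / B."""
    return B * Z / L


if __name__ == "__main__":
    # Pyramidal neuron: B ~ 50, Z ~ 10 um; L = 10000 um is an assumed value.
    print("pyramidal, chemical zone:", end_effect_fraction(B=50, Z=10.0, L=10000.0))
    # Purkinje cell: B ~ 900, L ~ 7800 um, as quoted in the text.
    print("Purkinje, chemical zone:", end_effect_fraction(B=900, Z=10.0, L=7800.0))
    # Electrical groups: zone length of 50 um instead of 10 um.
    print("pyramidal, electrical zone:", end_effect_fraction(B=50, Z=50.0, L=10000.0))
```

      Because the assumed estimate is linear in Z, moving from a 10 µm to a 50 µm zone scales it by a factor of 5, consistent with the statement about electrical groups.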

      (3) The simulations were poorly executed. Figures 5 and 6 show examples but no summary statistics.

      We have included the summary statistics in Figure 5F and Figure 6E. The statistics for both these panels were generated by simulating multiple spatiotemporal combinations of ectopic input in the presence of different stimulus patterns for each sequence length.

      The authors emphasize the importance of nonlinear dendritic interactions, but they do not include them in their analysis of the ectopic signals! I find it to be wholly expected that the effects of dendritic ensembles are not pronounced when the dendrites are linear.

      We would like to clarify that both Figures 5 and 6 already included nonlinearities. In Figure 5, the chemical mechanism involving the bistable switch motif is strongly selective for ordered inputs in a nonlinear manner. A separate panel highlighting this (Panel C) has now been included in Figure 5. This result had been previously shown in Figure 3I of (Bhalla 2017). We have reproduced it in Figure 5C.

      The published electrical model used in Figure 6 also has a nonlinearity which predominantly stems from the interaction of the impedance gradient along the dendrite with the voltage dependence of NMDARs (see Figure 4C, D of Branco, Clark, and Häusser 2010).

      To provide a comprehensive analysis of dendritic integration, the authors could simulate more realistic synaptic conductances and voltage-gated channels. They would find much more complicated interactions between inputs on a single site, a sliding temporal and spatial window of nonlinear integration that depends on dendritic morphology, active and passive parameters, and synaptic properties. At different activation levels, the rules of synaptic integration shift to cooperativity between different dendrites and cellular compartments, further complicated by nonlinear interactions between somatic spikes and dendritic events.

      We would like to clarify two points. First, the key goal of our study was to understand the role played by random connectivity in giving rise to clustered computation. In this revision we provide simulations to show the mechanistic basis for the nonlinearities, and then abstract these out in order to scale the analysis to networks. These nonlinearities were taken as a given, though we elaborated on previous work slightly in order to address the question of ectopic inputs. Second, in our original submission we relied on published work for the estimates of dendritic nonlinearities. Previous work (Poirazi, Brannon, and Mel 2003; Branco, Clark, and Häusser 2010; Bhalla 2017) has already carried out highly detailed realistic simulations, in some cases including the chemical and electrical nonlinearities that the reviewer mentions (Bhalla 2017). Hence we did not feel that this needed to be redone.

      In this resubmission we have addressed the above and two additional concerns, namely whether the different forms of selectivity can coexist in a single model including all these nonlinearities, and whether there is separation of time-scales. The answer is yes to both. The outcome of this is presented in Figure 8 and the associated supplementary figures, and all simulation details are provided on the github repository associated with this paper. A more complete analysis of interaction of multiple nonlinearities in a detailed model is material for further study.

      While it is tempting to extend back-of-the-napkin calculations of how many inputs can recruit nonlinear integration in active dendrites, the biological implementation is very different from this hypothetical. It is important to consider these questions, but I am not convinced that this manuscript adequately addressed the questions it set out to probe, nor does it provide information that was unknown beforehand.

      We developed our analysis systematically, and perhaps the reviewer refers to the first few calculations as back-of-the-napkin. However, the derivation rapidly becomes more complex when we factor in combinatorics and the effect of noise. This derivation is in the supplementary material. Furthermore, the exact form of the combinatorial and noise equations was non-trivial to derive and we worked closely with the connectivity simulations (Figures 2 and 4) to obtain equations which scale across a large parameter space by sampling connectivity for over 100000 neurons and activity over 100 trials for each of these neurons for each network configuration we have tested.

      the biological implementation is very different from this hypothetical.

      We do not quite understand in what respect the reviewer feels that this calculation is very different from the biological implementation. The calculation is about projection patterns. In the discussion we consider at length how our findings of selectivity from random projections may be an effective starting point for more elaborate biological connection rules. We have added the following sentence:

      “We present a first-order analysis of the simplest kind of connectivity rule (random), upon which more elaborate rules such as spatial gradients and activity-dependent wiring may be developed.”

      In case the reviewer was referring to the biological implementation of nonlinear integration, we treat the nonlinear integration in the dendrites as a separate set of simulations, most of which are closely based on published work (Bhalla 2017). We use these in the later sections of the paper to estimate selectivity terms, which inform our final analysis.

      In the revision we have worked to clarify this progression of the analysis. As indicated above, we have also made a composite model of all of the nonlinear dendritic mechanisms, chemical and electrical, which underlie our analysis.

      nor does it provide information that was unknown beforehand.

      We conducted a broad literature survey and, to the best of our knowledge, these calculations and findings have not been obtained previously. If the reviewer has specific examples in mind, we would be pleased to refer to them.

      Reviewer #2 (Public Review):

      Summary:

      If synaptic input is functionally clustered on dendrites, nonlinear integration could increase the computational power of neural networks. But this requires the right synapses to be located in the right places. This paper aims to address the question of whether such synaptic arrangements could arise by chance (i.e. without special rules for axon guidance or structural plasticity), and could therefore be exploited even in randomly connected networks. This is important, particularly for the dendrites and biological computation communities, where there is a pressing need to integrate decades of work at the single-neuron level with contemporary ideas about network function.

      Using an abstract model where ensembles of neurons project randomly to a postsynaptic population, back-of-envelope calculations are presented that predict the probability of finding clustered synapses and spatiotemporal sequences. Using data-constrained parameters, the authors conclude that clustering and sequences are indeed likely to occur by chance (for large enough ensembles), but require strong dendritic nonlinearities and low background noise to be useful.

      Strengths:

      (1) The back-of-envelope reasoning presented can provide fast and valuable intuition. The authors have also made the effort to connect the model parameters with measured values. Even an approximate understanding of cluster probability can direct theory and experiments towards promising directions, or away from lost causes.

      (2) I found the general approach to be refreshingly transparent and objective. Assumptions are stated clearly about the model and statistics of different circuits. Along with some positive results, many of the computed cluster probabilities are vanishingly small, and noise is found to be quite detrimental in several cases. This is important to know, and I was happy to see the authors take a balanced look at conditions that help/hinder clustering, rather than to just focus on a particular regime that works.

      (3) This paper is also a timely reminder that synaptic clusters and sequences can exist on multiple spatial and temporal scales. The authors present results pertaining to the standard `electrical' regime (~50-100 µm, <50 ms), as well as two modes of chemical signaling (~10 µm, 100-1000 ms). The senior author is indeed an authority on the latter, and the simulations in Figure 5, extending those from Bhalla (2017), are unique in this area. In my view, the role of chemical signaling in neural computation is understudied theoretically, but research will be increasingly important as experimental technologies continue to develop.

      Weaknesses:

      (1) The paper is mostly let down by the presentation. In the current form, some patience is needed to grasp the main questions and results, and it is hard to keep track of the many abbreviations and definitions. A paper like this can be impactful, but the writing needs to be crisp, and the logic of the derivation accessible to non-experts. See, for instance, Stepanyants, Hof & Chklovskii (2002) for a relevant example.

      It would be good to see a restructure that communicates the main points clearly and concisely, perhaps leaving other observations to an optional appendix. For the interested but time-pressed reader, I recommend starting with the last paragraph of the introduction, working through the main derivation on page 7, and writing out the full expression with key parameters exposed. Next, look at Table 1 and Figure 2J to see where different circuits and mechanisms fit in this scheme. Beyond this, the sequence derivation on page 15 and biophysical simulations in Figures 5 and 6 are also highlights.

      We appreciate the reviewer's suggestions. We have tightened the flow of the introduction. We understand that the abbreviations and definitions are challenging and have therefore provided intuitions and summaries of the equations discussed in the main text.

      Clusters calculations

      “Our approach is to ask how likely it is that a given set of inputs lands on a short segment of dendrite, and then scale it up to all segments on the entire dendritic length of the cell.

      Thus, the probability of occurrence of groups that receive connections from each of the M ensembles (PcFMG) is a function of the connection probability (p) between the two layers, the number of neurons in an ensemble (N), the relative zone-length with respect to the total dendritic arbor (Z/L) and the number of ensembles (M).”
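
      A minimal sketch of this scaling argument, assuming that synapses from each ensemble land as a Poisson process along the dendrite and that zones can be treated as independent. The function names and the parameter values in the usage lines are illustrative assumptions, and the exact expression in the manuscript may differ.

```python
import math


def p_zone_full_group(p, N, Z, L, M):
    """Probability that one zone of length Z receives >= 1 synapse from each of M ensembles."""
    lam = p * N * Z / L  # expected synapses from one ensemble within one zone
    return (1.0 - math.exp(-lam)) ** M


def p_neuron_full_group(p, N, Z, L, M):
    """Scale the per-zone probability up to the L/Z zones on the whole dendritic arbor."""
    n_zones = L / Z
    return 1.0 - (1.0 - p_zone_full_group(p, N, Z, L, M)) ** n_zones


# Hypothetical parameters, chosen only to exercise the functions.
for M in (2, 3, 4, 5):
    print(M, p_neuron_full_group(p=0.05, N=100, Z=10.0, L=10_000.0, M=M))
```

      In this sketch the probability falls steeply as M grows and rises with the ensemble size N, which is the qualitative dependence described above.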

      Sequence calculations

      “Here we estimate the likelihood of the first ensemble input arriving anywhere on the dendrite, and ask how likely it is that succeeding inputs of the sequence would arrive within a set spacing.

      Thus, the probability of occurrence of sequences that receive sequential connections (PcPOSS) from each of the M ensembles is a function of the connection probability (p) between the two layers, the number of neurons in an ensemble (N), the relative window size with respect to the total dendritic arbor (Δ/L) and the number of ensembles (M).”
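
      The same reasoning can be sketched for sequences, assuming Poisson-distributed synapse placement: the first ensemble may land anywhere on the dendrite, and each of the remaining M-1 ensembles must place at least one synapse within a window Δ of the preceding input. This is a first-order illustration with hypothetical names and parameter values, not the manuscript's exact derivation.

```python
import math


def p_neuron_sequence(p, N, delta, L, M):
    """Approximate probability that a neuron receives at least one ordered M-input sequence."""
    first_inputs = p * N  # expected synapses from the first ensemble, anywhere on the arbor
    p_next = 1.0 - math.exp(-p * N * delta / L)  # next ensemble lands within the window
    expected_sequences = first_inputs * p_next ** (M - 1)
    return 1.0 - math.exp(-expected_sequences)


# Hypothetical parameters, chosen only to exercise the function.
for M in (3, 4, 5):
    print(M, p_neuron_sequence(p=0.05, N=1000, delta=10.0, L=10_000.0, M=M))
```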

      (2) I wonder if the authors are being overly conservative at times. The result highlighted in the abstract is that 10/100000 postsynaptic neurons are expected to exhibit synaptic clustering. This seems like a very small number, especially if circuits are to rely on such a mechanism. However, this figure assumes the convergence of 3-5 distinct ensembles. Convergence of inputs from just 2 ensembles would be much more prevalent, but still advantageous computationally. There has been excitement in the field about experiments showing the clustering of synapses encoding even a single feature.

      We agree that short clusters of two inputs would be far more likely. We focused our analysis on clusters with three or more ensembles for the following reasons:

      (1) The signal-to-noise ratio in two-input clusters was very poor, as the likelihood of noise clusters is high.

      (2) It is difficult to trigger nonlinearities with very few synaptic inputs.

      (3) At the ensemble sizes we considered (100 for clusters, 1000 for sequences), clusters arising from just two ensembles would result in a high probability of occurrence on all neurons in a network (~50% in cortex, see p_CMFG in figures below). Such dense neural representations are difficult for downstream networks to decode (Foldiak 2003).

      However, in the presence of ensembles containing fewer neurons or when the connection probability between the layers is low, short clusters can result in sparse representations (Figure 2 - Supplement 2). Arguments 1 and 2 hold for short sequences as well.

      (3) The analysis supporting the claim that strong nonlinearities are needed for cluster/sequence detection is unconvincing. In the analysis, different synapse distributions on a single long dendrite are convolved with a sigmoid function and then the sum is taken to reflect the somatic response. In reality, dendritic nonlinearities influence the soma in a complex and dynamic manner. It may be that the abstract approach the authors use captures some of this, but it needs to be validated with simulations to be trusted (in line with previous work, e.g. Poirazi, Brannon & Mel, (2003)).

      We agree that multiple factors might affect the influence of nonlinearities on the soma. The key goal of our study was to understand the role played by random connectivity in giving rise to clustered computation. Since simulating a wide range of connectivity and activity patterns in a detailed biophysical model was computationally expensive, we analyzed the exemplar detailed models for nonlinearity separately (Figures 5, 6, and new Figure 8), and then used our abstract models as a proxy for understanding population dynamics. A complete analysis of the role played by morphology, channel kinetics and the effect of branching requires an in-depth study of its own, and some of these questions have already been tackled by (Poirazi, Brannon, and Mel 2003; Branco, Clark, and Häusser 2010; Bhalla 2017). However, in the revision, we have implemented a single model which incorporates the range of ion-channel, synaptic and biochemical signaling nonlinearities which we discuss in the paper (Figure 8, and Figure 8 Supplements 1, 2, 3). We use this to demonstrate all three forms of sequence and grouped computation we use in the study, where the only difference is in the stimulus pattern and the separation of time-scales inherent in the stimuli.

      (4) It is unclear whether some of the conclusions would hold in the presence of learning. In the signal-to-noise analysis, all synaptic strengths are assumed equal. But if synapses involved in salient clusters or sequences were potentiated, presumably detection would become easier? Similarly, if presynaptic tuning and/or timing were reorganized through learning, the conditions for synaptic arrangements to be useful could be relaxed. Answering these questions is beyond the scope of the study, but there is a caveat there nonetheless.

      We agree with the reviewer. If synapses receiving connectivity from ensembles had stronger weights, this would make detection easier. Dendritic spikes arising from clustered inputs have been implicated in local cooperative plasticity (Golding, Staff, and Spruston 2002; Losonczy, Makara, and Magee 2008). Further, plasticity related proteins synthesized at a synapse undergoing L-LTP can diffuse to neighboring weakly co-active synapses, and thereby mediate cooperative plasticity (Harvey et al. 2008; Govindarajan, Kelleher, and Tonegawa 2006; Govindarajan et al. 2011). Thus if clusters of synapses were likely to be co-active, they could further engage these local plasticity mechanisms which could potentiate them while not potentiating synapses that are activated by background activity. This would depend on the activity correlation between synapses receiving ensemble inputs within a cluster vs those activated by background activity. We have mentioned some of these ideas in a published opinion paper (Pulikkottil, Somashekar, and Bhalla 2021). In the current study, we wanted to understand whether even in the absence of specialized connection rules, interesting computations could still emerge. Thus, we focused on asking whether clustered or sequential convergence could arise even in a purely randomly connected network, with the most basic set of assumptions. We agree that an analysis of how selectivity evolves with learning would be an interesting topic for further work.

      References

      Bhalla, Upinder S. 2017. “Synaptic Input Sequence Discrimination on Behavioral Timescales Mediated by Reaction-Diffusion Chemistry in Dendrites.” Edited by Frances K Skinner. eLife 6 (April):e25827. https://doi.org/10.7554/eLife.25827.

      Branco, Tiago, Beverley A. Clark, and Michael Häusser. 2010. “Dendritic Discrimination of Temporal Input Sequences in Cortical Neurons.” Science (New York, N.Y.) 329 (5999): 1671–75. https://doi.org/10.1126/science.1189664.

      Foldiak, Peter. 2003. “Sparse Coding in the Primate Cortex.” The Handbook of Brain Theory and Neural Networks. https://research-repository.st-andrews.ac.uk/bitstream/handle/10023/2994/FoldiakSparseHBTNN2e02.pdf?sequence=1.

      Golding, Nace L., Nathan P. Staff, and Nelson Spruston. 2002. “Dendritic Spikes as a Mechanism for Cooperative Long-Term Potentiation.” Nature 418 (6895): 326–31. https://doi.org/10.1038/nature00854.

      Govindarajan, Arvind, Inbal Israely, Shu-Ying Huang, and Susumu Tonegawa. 2011. “The Dendritic Branch Is the Preferred Integrative Unit for Protein Synthesis-Dependent LTP.” Neuron 69 (1): 132–46. https://doi.org/10.1016/j.neuron.2010.12.008.

      Govindarajan, Arvind, Raymond J. Kelleher, and Susumu Tonegawa. 2006. “A Clustered Plasticity Model of Long-Term Memory Engrams.” Nature Reviews Neuroscience 7 (7): 575–83. https://doi.org/10.1038/nrn1937.

      Harvey, Christopher D., Ryohei Yasuda, Haining Zhong, and Karel Svoboda. 2008. “The Spread of Ras Activity Triggered by Activation of a Single Dendritic Spine.” Science (New York, N.Y.) 321 (5885): 136–40. https://doi.org/10.1126/science.1159675.

      Losonczy, Attila, Judit K. Makara, and Jeffrey C. Magee. 2008. “Compartmentalized Dendritic Plasticity and Input Feature Storage in Neurons.” Nature 452 (7186): 436–41. https://doi.org/10.1038/nature06725.

      Poirazi, Panayiota, Terrence Brannon, and Bartlett W. Mel. 2003. “Pyramidal Neuron as Two-Layer Neural Network.” Neuron 37 (6): 989–99. https://doi.org/10.1016/S0896-6273(03)00149-1.

      Pulikkottil, Vinu Varghese, Bhanu Priya Somashekar, and Upinder S. Bhalla. 2021. “Computation, Wiring, and Plasticity in Synaptic Clusters.” Current Opinion in Neurobiology, Computational Neuroscience, 70 (October):101–12. https://doi.org/10.1016/j.conb.2021.08.001.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Reviewer #2 (Public Review): 

      Regarding the public review by reviewer #2, we update our answers here to reflect the new analyses and modifications made in the manuscript.

      This manuscript is missing a direct phenotypic comparison of control cells to complement that of cells expressing RhoGEF2-DHPH at "low levels" (the cells that would respond to optogenetic stimulation by retracting); and cells expressing RhoGEF2-DHPH at "high levels" (the cells that would respond to optogenetic stimulation by protruding). In other words, the authors should examine cell area, the distribution of actin and myosin, etc in all three groups of cells (akin to the time zero data from figures 3 and 5, with a negative control). For example, does the basal expression meaningfully affect the PRG low-expressing cells before activation e.g. ectopic stress fibers? This need not be an optogenetic experiment, the authors could express RhoGEF2-DHPH without SspB (as in Fig 4G).

      Updated answer: We thank reviewer #2 for this suggestion. PRG-DHPH overexpression is known to affect the phenotype of the cell, as shown in Valon et al., 2017. In our experiments, we could not identify any evidence of a particular phenotype before optogenetic activation apart from the area and spontaneous membrane speed that were already reported in our manuscript (Fig 2E and SuppFig 2). Regarding the distribution of actin and myosin, we did not observe an obvious pattern that would be predictive of the protruding/retracting phenotype. To be more quantitative, we classified (by eye, without knowing the expression level of PRG or the future phenotype) the presence of stress fibers, the amount of cortical actin, the strength of focal adhesions, and the circularity of cells. As shown below, when these classes are binned by levels of expression of PRG (two levels below the threshold and two above) there is no clear determinant. Thus, we concluded that the main driver of the phenotype was the PRG basal expression rather than any particularity of the actin cytoskeleton/cell shape.

      Author response image 1.

      Author response image 2.

      Relatedly, the authors seem to assume ("recruitment of the same DH-PH domain of PRG at the membrane, in the same cell line, which means in the same biochemical environment." supplement) that the only difference between the high and low expressors is the level of expression. Given the chronic overexpression and the fact that the capacity for this phenotypic shift is not recruitment-dependent, this is not necessarily a safe assumption. The expression of this GEF could well induce e.g. gene expression changes.

      Updated answer: We agree with reviewer #2 that there could be changes in gene expression. In the next point of this supplementary note, we had specified it, by saying « that overexpression has an influence on cell state, defined as protein basal activity or concentration before activation. »  We are sorry if it was not clear, and we changed this sentence in the revised manuscript (in red in the supp note). 

      One advantage of the model is that it does not require any change in absolute concentrations, besides that of the GEF. The model is intended to be minimal; it fits and explains the data with very few parameters. We do not show that there is no change in concentration, but we show that it is not necessary to invoke one. We revised a sentence in the new version of the manuscript to include this point.

      Additional answer: During the revision process, we looked for an experimental demonstration that the phenotypic switch is independent of any change in global gene expression pattern caused by the chronic overexpression of PRG. Our idea was to start from a condition of high PRG overexpression, such that cells protrude upon optogenetic activation, and then acutely deplete PRG to see whether cells then retract. To deplete PRG on a timescale that prevents any change in gene expression, we considered the recently developed CATCHFIRE (PMID: 37640938) chemical dimerizer. We designed an experiment in which the PRG DH-PH domain was expressed in fusion with a FIRE-tag, co-expressed with the FIRE-mate fused to TOM20 and with the optoPRG tool. Upon incubation with the MATCH small molecule, we should be able to recruit the overexpressed PRG to the mitochondria within minutes, thereby preventing it from forming a complex with active RhoA in the vicinity of the plasma membrane. Unfortunately, despite numerous trials we never achieved the required conditions: we could not obtain cells with high enough expression of PRG-FIRE-tag (for a protrusive response) and low enough expression of optoPRG (for retraction upon PRG-FIRE-tag depletion). We still think this would be a nice experiment to perform, but it will require the establishment of a stable cell line with finely tuned expression levels of the CATCHFIRE system, which goes beyond the timeline of our present work.

      Concerning the overall model summarizing the authors' observations, they "hypothesized that the activity of RhoA was in competition with the activity of Cdc42"; "At low concentration of the GEF, both RhoA and Cdc42 are activated by optogenetic recruitment of optoPRG, but RhoA takes over. At high GEF concentration, recruitment of optoPRG lead to both activation of Cdc42 and inhibition of already present activated RhoA, which pushes the balance towards Cdc42."

      These descriptions are not precise. What is the nature of the competition between RhoA and Cdc42? Is this competition for activation by the GEFs? Is it a competition between the phenotypic output resulting from the effectors of the GEFs? Is it competition from the optogenetic probe and Rho effectors and the Rho biosensors? In all likelihood, all of these effects are involved, but the authors should more precisely explain the underlying nature of this phenotypic switch. Some of these points are clarified in the supplement, but should also be explicit in the main text. 

      Updated answer: We consider the competition between RhoA and Cdc42 as a competition between retraction due to the protein network triggered by RhoA (through ROCK-Myosin and mDia-bundled actin) and the protrusion triggered by Cdc42 (through PAK-Rac-ARP2/3-branched Actin). We made this point explicit in the main text.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      Major 

      - why this is only possible for such few cells. Can the authors comment on this in the discussion? Does the model provide any hints? 

      As stated in our answer to the public comment of reviewer #1, we think that the low number of cells able to switch can be explained by two different reasons:

      (1) First, we were looking for clear inversions of the phenotype, where we could see clear ruffles in the case of the protrusion, and clear retractions in the other case. Thus, we discarded cells that showed in-between phenotypes, because we had no quantitative parameter to compare how protrusive or retractile they were. This reduced the number of switching cells.

      (2) Second, we were limited by the dynamics of the optogenetic dimer used here. Indeed, the control of the frequency was limited by the unbinding dynamics of the optogenetic dimer. These recruitment dynamics (~20s) are comparable to the dynamics of the deactivation of RhoA and Cdc42. Thus, the differences in frequency are smoothed and we could not vary the frequency enough to increase the number of switches. Thanks to the model, we can predict that increasing the unbinding rate of the optogenetic tool (shorter dimer lifetime) should allow us to increase the number of switching cells.

      We have added a sentence in the discussion to make this second point explicit.

      - I would encourage the authors to discuss this molecular signaling switch in the context of general design principles of switches. How generalizable is this network/mechanism? Is it exclusive to activating signaling proteins or would it work with inhibiting mechanisms? Is the competition for the same binding site between activators and effectors a common mechanism in other switches? 

      The most common design principle for molecular switches is the bistable switch that relies on a nonlinear activation (for example through cooperativity) with a linear deactivation. Such a design allows the switch between low and high levels. In our case, there is no need for a non-linearity since the core mechanism is a competition for the same binding site on active RhoA between the activator and the effectors. Thus, the design principle would be closer to the notion of a minimal “paradoxical component” (PMID: 23352242) that both activates and limits signal propagation, which in our case can be thought of as a self-limiting mechanism to prevent uncontrolled RhoA activation by the positive feedback. Yet, as we show in our work, this core mechanism is not enough for the phenotypic switch to happen since the dual activation of RhoA and Cdc42 is ultimately required for the protrusion phenotype to take over the retracting one. Given the particularity of the switch we observed here, we do not feel comfortable speculating on any general design principles in the main text, but we thank reviewer #1 for his/her suggestion.

      - Supplementary figures - there is a discrepancy between the figures called in the text and the supplementary files, which only include SF1-4. 

      We apologize for this error and we made the correction. 

      - In the text, the authors use Supp Figure 7 to show that the phenotype could not be switched by varying the fold increase of recruitment through changing the intensity/duration of the light pulse. Aside from providing the figure, could you give an explanation or speculation of why? Does the model give any prediction as to why this could be difficult to achieve experimentally (is the range of experimentally feasible fold change of 1.1-3 too small)? Also, could you clarify why the range is different than the 3 to 10-fold mentioned at the beginning of the results section?

      We thank the reviewer for this question, and this difference between frequency and intensity can be indeed understood in a simple manner through the model. 

      All the reactions in our model were modeled as linear reactions. Thus, at any timepoint, changing the intensity of the pulse will only change proportionally the amount of the different components (amount of active RhoA, amount of sequestered RhoA, and amount of active Cdc42). This explains why we cannot change the balance between RhoA activity and Cdc42 activity only through the pulse strength. We observed the same experimentally: when we changed the intensity of the pulses, the phenotype would be smaller/stronger, but would never switch, supporting our hypothesis on the linearity of all biochemical reactions. 
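
      This proportionality argument can be illustrated with a toy sketch: two first-order species driven by the same light-pulse train. The species names, rate constants and pulse timing below are hypothetical placeholders, not the fitted parameters of our model. They serve only to show that, in a purely linear system, scaling the pulse amplitude scales every component by the same factor and leaves their ratio unchanged.

```python
def simulate(pulse_amplitude, k_on_rho=1.0, k_off_rho=0.05,
             k_on_cdc=0.5, k_off_cdc=0.03,
             period_s=30.0, pulse_s=1.0, t_end_s=300.0, dt=0.01):
    """Forward-Euler integration of two linear species driven by one pulse train.

    All rate constants are illustrative assumptions (units of 1/s).
    Returns the final (rho, cdc) activities.
    """
    rho = 0.0
    cdc = 0.0
    n_steps = int(round(t_end_s / dt))
    for i in range(n_steps):
        t = i * dt
        # Light is on for pulse_s seconds at the start of every period.
        u = pulse_amplitude if (t % period_s) < pulse_s else 0.0
        rho += dt * (k_on_rho * u - k_off_rho * rho)
        cdc += dt * (k_on_cdc * u - k_off_cdc * cdc)
    return rho, cdc


rho_1, cdc_1 = simulate(pulse_amplitude=1.0)
rho_3, cdc_3 = simulate(pulse_amplitude=3.0)

# Both species scale with the amplitude, so the balance between them is unchanged.
print(rho_3 / rho_1, cdc_3 / cdc_1, rho_1 / cdc_1, rho_3 / cdc_3)
```

      Changing `period_s` instead of `pulse_amplitude` alters the two species differently in this sketch whenever their decay rates differ, since each species then retains a different fraction of its activity between pulses.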

      On the contrary, changing the frequency has an effect, for a simple reason: the dynamics of RhoA and Cdc42 activation are not the same as the dynamics of inhibition of RhoA by the PH domain (see Figure 4). The inhibition of RhoA by the PH is almost instantaneous while the activation of RhoGTPases has a delay (set by the deactivation parameter k_2). Intuitively, increasing the frequency will lead to sustained inhibition of RhoA, promoting the protrusion phenotype. Decreasing the frequency – with a stronger pulse to keep the same amount of recruited PRG – restricts this inhibition of RhoA to the first seconds following the activation. The delayed activation of RhoA will then take over.

      We added two sentences in the manuscript to explain in greater detail the difference between intensity and frequency.

      Regarding the difference between the 1.3-3 fold and the 3 to 10 fold, the explanation is the following: the 3 to 10 fold referred to the cumulative amount of proteins being recruited after multiple activations (steady state amount reached after 5 minutes with one activation every 30s); while the 1.3-3 fold is what can be obtained after only one single pulse of activation.  

      - The transient expression achieves a large range of concentration levels which is a strength in this case. To solve the experimental difficulties associated with this, i.e. finding transfected cells at low cell density, the authors developed a software solution (Cell finder). Since this approach will be of interest for a wide range of applications, I think it would deserve a mention in the discussion part. 

      We thank the reviewer for his/her interest in this small software solution.

      We expanded the description of the tool in the Methods section. The Cell finder is also available with comments on github (https://github.com/jdeseze/cellfinder) and is usable by anyone using Metamorph or Micromanager imaging software.

      Minor 

      - Can the authors describe what they mean with "cell state"? It is used multiple times in the manuscript and can be interpreted as various things. 

      We now explain what we mean by ‘cell state’ in the main text:

      “protein basal activities and/or concentrations - which we called the cell state”

      - “(from 0% to 45%, Figure 2D)", maybe add here: "compare also with Fig. 2A". 

      We completed the sentence as suggested, which clarifies the data for the readers.

      - The sentence "Given that the phenotype switch appeared to be controlled by the amount of overexpressed optoPRG, we hypothesized that the corresponding leakiness of activity could influence the cell state prior to any activation." might be hard to understand for readers unfamiliar with optogenetic systems. I suggest adding a short sentence explaining dark-state activity/leakiness before putting the hypothesis forward. 

      We changed this whole beginning of the paragraph to clarify.

      - Figure 2E and SF2A. I would suggest swapping these two panels as the quantification of the membrane displacement before activation seems more relevant in this context. 

      We thank reviewer #1 for this suggestion, with which we agree; we have swapped the two panels.

      - Fig. 2B is missing the white frames in the mixed panels. 

      We are sorry for this mistake; we corrected it in the new version.

      - In the text describing the experiment of Fig. 4G, it would again be helpful to define what the authors mean by cell state, or to state the expected outcome for both hypotheses before revealing the result.

      We clarified above what we mean by cell state, namely the basal protein activities and/or concentrations prior to optogenetic activation. We added the expectation as follows:

      To discriminate between these two hypotheses, we overexpressed the DH-PH domain alone in another fluorescent channel (iRFP) and recruited the mutated PH at the membrane. “If the binding to RhoA-GTP was only required to change the cell state, we would expect the same statistics as in Figure 2D, with a majority of protruding cells due to DH-PH overexpression. On the contrary, we observed a large majority of retracting phenotype even in highly expressing cells (Figure 4G), showing that the PH binding to RhoA-GTP during recruitment is a key component of the protruding phenotype.”

      - Figure 4H,I: "of cells that overexpress PRG, where we only recruit the PH domain" doesn't match with the figure caption. Are these two constructs in the same cell? If not please clarify the main text. 

      We agree that it was not clear. Both constructs are in the same cell, and we changed the figure caption accordingly.  

      - "since RhoA dominates Cdc42" is this concluded from experiments (if yes, please refer to the figure) or is this known from the literature (if yes, please cite). 

      The assumption that RhoA dominates Cdc42 comes from the fact that we see retraction at low PRG concentration. We assumed that RhoA is responsible for the retraction phenotype. Our assumption is based on the literature (Burridge 2004 as an example of a review, confirmed by many experiments, such as the direct recruitment of RhoA to the membrane, see Berlew 2021) and is supported by our observations of immediate increase of RhoA activity at low PRG. We modified the text to clarify it is an assumption.

      - Fig. 6G, left: this is not intuitive; why is the number of molecules different to start with? 

      The number of molecules is different because they represent the active molecules: increasing the amount of PRG increases the amount of active RhoA and active Cdc42. We updated the figure to clarify this point.

      - Fig. 6G, right: the y-axis label says "phenotype", maybe change it to "activity" or add a second y-axis on the right with "phenotype"? 

      We updated the figure following reviewer #1's suggestion.

      - Discussion: "or a retraction in the same region" sounds like in the same cell. Perhaps rephrase to state retraction in a similar region? 

      We apologize for the confusion; we changed the sentence to make it clearer: “a protrusion in the activation region when highly expressed, or a retraction in the activation region when expressed at low concentrations.”

      Typos: 

      - "between 3 and 10 fold" without s. 

      - Fig. 1H, y-axis label. 

      - "whose spectrum overlaps" with s. 

      - "it first decays, and then rises" with s. 

      - Fig 4B and Fig 6B. Is the time in sec or min? (Maybe double-check all figures). 

      - "This result suggests that one could switch the phenotype in a single cell by selecting it for an intermediate expression level of the optoPRG.". 

      - "GEF-H1 PH domain has almost the same inhibition ability as PRG PH domain". 

      We corrected all these mistakes and thank the reviewer for their careful reading of the manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      Likewise, the model assumes that at high PRG GEF expression, the "reaction is happening far from saturation ..." and that "GTPases activated with strong stimuli -giving rise to strong phenotypic changes- lead to only 5% of the proteins in a GTP-state, both for RhoA and Cdc42". Given the high levels of expression (the absolute value of which is not known) this assumption is not necessarily safe to assume. The shift to Cdc42 could indeed result from the quantitative conversion of RhoA into its active state. 

      We agree with the reviewer that the hypothesis that RhoA is fully converted into its active state cannot be completely ruled out. However, we think that the two following points can justify our choice.

      - First, we see that even in the protruding phenotype, RhoA activity increases upon optoPRG recruitment (Figure 3). This means that RhoA is not completely converted into its active GTP-loaded state. The biosensor intensity rises by a factor of 1.5 after 5 minutes (and continues to increase, although this is not shown here). Admittedly, this could be explained by the relocation of RhoA to the site of activation, but it still shows that cells with high PRG expression are not completely saturated in RhoA-GTP. 

      - We agree that linearity (no saturation) is still a hypothesis and very difficult to rule out, because it is not only a question of the absolute concentrations of GEFs and RhoA, but also of their reaction kinetics, which are unknown parameters in vivo. Yet, adding a saturation parameter would mean adding 3 unknown parameters (the absolute concentration of RhoA, as well as two reaction constants). The fact that they are not needed to fit the complex curves of RhoA, which we do with only one parameter, tends to show that the minimal ingredients representing the interaction are captured here.  

      The observed "inhibition of RhoA by the PH domain of the GEF at high concentrations" could result from the ability of the probe to, upon membrane recruitment, bind to active RhoA (via its PH domain) thereby outcompeting the RhoA biosensor (Figure 4A-C). This reaction is explicitly stated in the supplemental materials ("PH domain binding to RhoA-GTP is required for protruding phenotype but not sufficient, and it is acting as an inhibitor of RhoA activity."), but should be more explicit in the main text. Indeed, even when PRG DHPH is expressed at high concentrations, it does activate RhoA upon recruitment (figure 3GH). Not only might overexpression of this active RhoA-binding probe inhibit the cortical recruitment of the RhoA biosensor, but it may also inhibit the ability of active RhoA to activate its downstream effectors, such as ROCK, which could explain the decrease in myosin accumulation (figure 3D-F). It is not clear that there is a way to clearly rule this out, but it may impact the interpretation. 

      This hypothesis is actually what we claim in the manuscript. We think that the inhibition of RhoA by the PH domain is explained by its direct binding. We may have missed what Reviewer #2 wanted to say, but we think that we state it explicitly in the main text:

      “Knowing that the PH domain of PRG triggers a positive feedback loop thanks to its binding to active RhoA 18, we hypothesized that this binding could sequester active RhoA at high optoPRG levels, thus being responsible for its inhibition.”

      And also in the Discussion:

      “However, this feedback loop can turn into a negative one for high levels of GEF: the direct interaction between the PH domain and RhoA-GTP prevents RhoA-GTP binding to effectors through a competition for the same binding site.”

      We may have not been clear, but we think that this is what is happening: the PH domain prevents the binding to effectors and decreases RhoA activity (as was shown in Chen et al. 2010).  

      The X-axis in Figure 4C time is in seconds not minutes. The Y-axis in Figure 4H is unlabeled. 

      We apologize for the mistake in Figure 4C. We changed the Y-axis in Figure 4H.

      Although this publication cites some of the relevant prior literature, it fails to cite some particularly relevant works. For example, the authors state, "The LARG DH domain was already used with the iLid system" and refers to a 2018 paper (ref 19), whereas that domain was first used in 2016 (PMID 27298323). Indeed, the authors used the plasmid from this 2016 paper to build their construct. 

      We thank the reviewer for pointing out this error. We have corrected the citation and now cite the seminal paper in the revised version.

      An analogous situation pertains to previous work that showed that an optogenetic probe containing the DH and PH domains in RhoGEF2 is somewhat toxic in vivo (table 6; PMID 33200987). Furthermore, it has previously been shown that mutation of the equivalent of F1044A and I1046E eliminates this toxicity (table 6; PMID 33200987) in vivo. This is particularly important because the Rho probe expressing RhoGEF2-DHPH is in widespread usage (76 citations in PubMed). The ability of this probe to activate Cdc42 may explain some of the phenotypic differences described resulting from the recruitment of RhoGEF2-DHPH and LARG-DH in a developmental context (PMID 29915285, 33200987). 

      We thank reviewer #2 for these comments, and added a small section in the discussion, for optogenetic users: 

      This underlines the attention that needs to be paid to the choice of specific GEF domains when using optogenetic tools. Tools using DH-PH domains of PRG have been widely used, both in mammalian cells and in Drosophila (with the orthologous gene RhoGEF2), and have been shown to be toxic in some contexts in vivo 28. Our study confirms the complex behavior of this domain which cannot be reduced to a simple RhoA activator.   

      Concerning the experiment shown in 4D, it would be informative to repeat this experiment in which a non-recruitable DH-PH domain of PRG is overexpressed at high levels and the DH domain of LARG is recruited. This would enable the authors to distinguish whether the protrusion response is entirely dependent on the cell state prior to activation or the combination of the cell state prior to activation and the ability of PRG DHPH to also activate Cdc42. 

      We thank the reviewer for their suggestion. Yet, we think that we have enough direct evidence that the protruding phenotype is due to both the cell state prior to activation and the ability of PRG DHPH to also activate Cdc42. First, we see a direct increase in Cdc42 activity following optoPRG recruitment (see Figure 6). This increase is sustained in the protruding phenotype and precedes Rac1 and RhoA activity, which shows that it is the first of these three GTPases to be activated. Moreover, we showed that inhibition of PAK by the very specific drug IPA3 completely abolishes only the protruding phenotype, which shows that PAK, a direct effector of Cdc42 and Rac1, is required for the protruding phenotype to happen. We also know that the cell state prior to activation defines the phenotype, thanks to the data presented in Figure 2. 

      We further showed in Figure 1 that LARG DH-PH domain was not able to promote protrusion. The proposed experiment would be interesting to confirm that LARG does not have the ability to activate another GTPase, even in a different cell state with overexpressed PRG. However, we are not sure it would bring any substantial findings to understand the mechanism we describe here, given the facts provided above.  

      Similarly, as PRG activates both Cdc42 and Rho at high levels, it would be important to determine the extent to which the acute Rho activation contributes to the observed phenotype (e.g. with Rho kinase inhibitor). 

      We agree with the reviewer that it would be interesting to know whether RhoA activation contributes to the observed phenotype, and we have tried such experiments. 

      For the Rho kinase inhibitor, we tried Y-27632 and could never prevent the protruding phenotype from happening. However, we could not completely abolish the retracting phenotype either (even when the effect on the cells was quite strong and visible), which could be due to other effectors compensating for this inhibition. As RhoA has many other effectors, this does not tell us that RhoA is not required for protrusion. 

      We also tried C3, which is a direct inhibitor of RhoA. However, it had too much impact on the basal state of the cells, making it impossible to recruit (cells were becoming round and clearly dying). As both the basal state and optogenetic activation require the activation of RhoA, it is hard to draw conclusions from experiments where no cell responds. 

      The ability of PRG to activate Cdc42 in vivo is striking given the strong preference for RhoA over Cdc42 in vitro (2400X) (PMID 23255595). Is it possible that at these high expression levels, much of the RhoA in the cell is already activated, so that the sole effect that recruited PRG can induce is activation of Cdc42? This is related to the previous point pertaining to absolute expression levels.  

      As discussed before, we think that it is not only a question of absolute expression levels, but also of the affinities between the different partners. But Reviewer #2 is right: there is a competition between the activation of RhoA and Cdc42 by optoPRG, and activation of Cdc42 probably happens at higher concentrations because of a smaller effective affinity.

      Still, we know that activation of Cdc42 by the PRG DH-PH domain is possible in vivo, as was very clearly shown in Castillo-Kauil et al., 2020 (PMID 33023908). They show that this activation requires the linker between the DH and PH domains of PRG, as well as Gαs activation, which requires a change in PRG DH-PH conformation. This conformational switch does not happen in vitro, which might explain why the affinity for Cdc42 was found to be very low. 

      Minor points 

      In both the abstract and the introduction the authors state, "we show that a single protein can trigger either protrusion or retraction when recruited to the plasma membrane, polarizing the cell in two opposite directions." However, the cells do not polarize in opposite directions, ie the cells that retract do not protrude in the direction opposite the retraction (or at least that is not shown). Rather a single protein can trigger either protrusion or retraction when recruited to the plasma membrane, depending upon expression levels. 

      We thank the reviewer for this remark, and we agree that we had not shown any data supporting a change in polarization. We addressed this issue by now showing in Supplementary Figure 1 the change in area in both the activated and the non-activated regions. The data clearly show that when a protrusion happens, the cell retracts in the non-activated region. Conversely, when the cell retracts, a protrusion happens in the other part of the cell, while the total area stays approximately constant. 

      We added the following sentence to describe our new figure:

      Quantification of the changes in membrane area in both the activated and non-activated part of the cell (Supp Figure 1B-C) reveals that the whole cell is moving, polarizing in one direction or the other upon optogenetic activation.

      While the authors provide extensive quantitative data in this manuscript and quantify the relative differences in expression levels that result in the different phenotypes, it would be helpful to quantify the absolute levels of expression of these GEFs relative to e.g. an endogenously expressed GEF. 

      We agree with the reviewer's comment, and we also wanted to have an idea of the absolute level of expression of GEFs present in these cells to be able to relate fluorescence intensities to absolute concentrations. We tried different methods, especially with the purified fluorescent protein, but obtaining exact numbers is a hard task.

      We ended up quantifying the amount of fluorescent protein within a stable cell line by ELISA and comparing it with the mean fluorescence seen under the microscope. 

      We estimated that the switch concentration was around 200 nM, which is 8 times more than the mean endogenous concentration according to https://opencell.czbiohub.org/, but should be reachable locally in wild-type cells, or globally in mutated cancer cells. 
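
      As a rough illustration of how such a calibration could be applied (a minimal sketch, not the code used in the study), the snippet below assumes that fluorescence scales linearly with concentration. The function names are hypothetical; only the 200 nM switch concentration and the 8-fold ratio come from the estimate above.

```python
# Hypothetical sketch: convert single-cell fluorescence into an absolute
# concentration, assuming fluorescence is proportional to concentration.

def calibration_factor(elisa_conc_nM: float, mean_fluorescence_au: float) -> float:
    """Return nM per arbitrary fluorescence unit from a population-level measurement."""
    if mean_fluorescence_au <= 0:
        raise ValueError("mean fluorescence must be positive")
    return elisa_conc_nM / mean_fluorescence_au


def intensity_to_concentration(intensity_au: float, factor: float) -> float:
    """Estimate the concentration (nM) in one cell from its fluorescence."""
    return intensity_au * factor


# Values quoted in the response above.
SWITCH_CONC_NM = 200.0
FOLD_OVER_ENDOGENOUS = 8.0

# Mean endogenous concentration implied by those two numbers.
implied_endogenous_nM = SWITCH_CONC_NM / FOLD_OVER_ENDOGENOUS
```

      With these two quoted values, the implied mean endogenous concentration is 200 / 8 = 25 nM.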

      Given the numerical data (mostly) in hand, it would be interesting to determine whether RhoGEF2 levels, cell area, the pattern of actin assembly, or some other property is most predictive of the response to PRG DHPH recruitment. 

      We think that the manuscript made it clear that the concentration of PRG DHPH is almost 100% predictive of the response to PRG DHPH recruitment. We believe that other phenotypes such as the cell area or the pattern of actin assembly would only be consequences of this. Interestingly, as experimenters we were absolutely not able to predict the behavior just by looking at the shape of the cell, even after hundreds of activation experiments; we tried to find characteristics that would distinguish both populations with the data in our hands and could not find any.

      There is some room for general improvement/editing of the text. 

      We did our best to improve the text, following the reviewers' suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are thankful to the reviewers and the editor for their detailed feedback, insightful suggestions, and thoughtful assessment of our work. The revised manuscript has taken into account all the comments of the three reviewers. We have also undertaken additional analyses and added materials in response to reviewer suggestions. In brief:

      (1) We have conducted a more in-depth analysis of frequency domain HRV metrics to better depict the change of autonomic tone.

      (2) We have revised the manuscript to provide justifications for the chosen taVNS protocol and to clearly articulate the objectives of the current study.

      (3) In response to comments from reviewer #2, we have included two new tables that present the absolute changes in cardiovascular metrics, clinical characteristics for the two trial arms, and effects of taVNS adjusted for age.

      Other significant amendments include:

      (1) An expanded discussion linking our findings to the existing literature on the effects of taVNS on cardiovascular function, biomarkers for taVNS response, the safety of taVNS, and the dose-response relationship of taVNS.

      (2) Revision to the Method section to provide details of QT interval calculation.

      Reviewer #1 (Public Review):

      The authors report the results of a randomized clinical trial of taVNS as a neuromodulation technique in SAH patients. They found that taVNS appears to be safe without inducing bradycardia or QT prolongation. taVNS also increased parasympathetic activity, as assessed by heart rate variability measures. Acute elevation in heart rate might be a biomarker to identify SAH patients who are likely to respond favorably to taVNS treatment. The latter is very important in light of the need for acute biomarkers of response to neuromodulation treatments.

      Comments:

      (1) Frequency domain heart rate variability measures should be analyzed and reported. Given the short duration of the ECG recording, the frequency domain may more accurately reflect autonomic tone.

      We sincerely appreciate this encouraging summary of our paper. We have analyzed and reported frequency-domain heart rate variability measures, including the relative power of the high-frequency band (0.15–0.4 Hz) and the relative power of the low-frequency band (0.04–0.15 Hz). We showed the distribution of the two frequency-domain HRV measures in Supplementary Figure 2C-D. For the 24-hour ECG recordings, we found that the change in the relative high-frequency power from Day 1 was not significantly different between the treatment groups. As both high-frequency band and low-frequency band power are relative to the total power, the comparison of the relative power of the low-frequency band between groups would be the opposite of the relative power of the high-frequency band. As both time-domain and frequency-domain HRV measures can reflect the autonomic tone, we performed factor analysis to identify the parasympathetic activity component (Figure 2D). Comparing the change in parasympathetic activity component and relative high-frequency power, we observed similarities and discrepancies. Specifically, both the change in parasympathetic activity component and the change in relative high-frequency power were higher in the taVNS group in the early phase (Days 2-4).

      We also observed higher high-frequency power in the Sham group at the later phase. If the factor analysis successfully isolates the parasympathetic activity, there should be other factors than the parasympathetic activity affecting the relative power of the high-frequency band. One such factor is the respiration rate. The high-frequency range is between 0.15 to 0.4 Hz, corresponding to respiration's frequency range of approximately 9 to 24 breaths per minute. If the respiration rate increases and exceeds 24 breaths per minute, the respiratory-driven HRV might occur at a frequency higher than the typical high-frequency band. Given that the respiration rate was higher in the taVNS treatment group, a compensatory mechanism to ensure oxygen delivery (Figure 4E), we hypothesized that observed lower high-frequency power in the taVNS treatment group compared to sham at later phases is a result of increased respiration rate in the taVNS treatment group. Indeed, we found the normalized high-frequency power is higher when RR is less than 25 bpm compared to when RR > 25 bpm (Cohen’s d = 0.85, Supplementary Figure 3A). Moreover, an increase in RR in the taVNS treatment group is associated with a decrease in high-frequency power (Supplementary Figure 3B). These control analyses underscored the necessity of performing factor analysis to robustly measure parasympathetic activities and confirm that taVNS treatment mitigated the sympathetic overactivation during the early phase.
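
      The correspondence between the high-frequency band and the respiration rate quoted above is a unit conversion (1 Hz = 60 cycles per minute). A minimal sketch of that check follows; the function names are hypothetical and this is not the analysis code used in the study.

```python
# High-frequency HRV band quoted in the response (Hz).
HF_BAND_HZ = (0.15, 0.40)


def breaths_per_min_to_hz(breaths_per_min: float) -> float:
    """Convert a respiration rate in breaths per minute to a frequency in Hz."""
    return breaths_per_min / 60.0


def hz_to_breaths_per_min(freq_hz: float) -> float:
    """Convert a frequency in Hz to breaths per minute."""
    return freq_hz * 60.0


def respiration_in_hf_band(breaths_per_min: float, band=HF_BAND_HZ) -> bool:
    """True if respiratory-driven HRV at this rate would fall inside the band."""
    low, high = band
    return low <= breaths_per_min_to_hz(breaths_per_min) <= high
```

      Under this conversion, a respiration rate of 12 breaths per minute (0.2 Hz) lies inside the band, whereas 30 breaths per minute (0.5 Hz) lies above it, which is the situation in which respiratory-driven variability would no longer contribute to high-frequency power.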

      We have now discussed the results of frequency-domain HRV measures in the Discussion section: taVNS and autonomic system (p23): “A key metric that reflects this restored sympathovagal balance is the increase in heart rate variability (Figure 3F). Specifically, the factor analysis showed that the parasympathetic activity was significantly higher in the taVNS treatment group. This difference was most pronounced during the early phase, particularly between Days 2 and 4 following SAH. In addition to analyzing the correlation between the parasympathetic activity factor and established HRV measures that reflect parasympathetic activity such as RMSSD and pNNI_50 (Figure 3C), we also examined changes in a frequency-domain HRV measure—the relative power of the high-frequency band (0.15–0.4 Hz)—to validate the accuracy of the factor analysis. the relative power of the high-frequency band is widely used to indicate respiratory sinus arrhythmia, a process primarily driven by the parasympathetic nervous system (Supplementary Figure 2). We found that both the change in parasympathetic activity factor and relative high-frequency power were higher in the taVNS group at the early phase (Day 2 - 4). Conversely, we observed higher high-frequency power in the Sham group during the later phase. If the factor analysis successfully isolates the parasympathetic activity, there should be other factors than the parasympathetic activity affecting the relative power of the high-frequency band. One such factor is the respiration rate. The high-frequency range is between 0.15 to 0.4 Hz, corresponding to respiration's frequency range of approximately 9 to 24 breaths per minute. If the respiration rate increases and exceeds 24 breaths per minute, the respiratory-driven HRV might occur at a frequency higher than the typical high-frequency band. 
      Given that the respiration rate was higher in the taVNS treatment group, a compensatory mechanism to ensure oxygen delivery (Figure 4E), we hypothesized that observed lower high-frequency power in the taVNS treatment group compared to sham at later phases is a result of increased respiration rate in the taVNS treatment group. Indeed, we found the normalized high-frequency power is higher when RR is less than 25 bpm compared to when RR > 25 bpm (Cohen’s d = 0.85, Supplementary Figure 3A). Moreover, an increase in RR in the taVNS treatment group is associated with a decrease in high-frequency power (Supplementary Figure 3B). These control analyses underscored the necessity of performing factor analysis to robustly measure parasympathetic activities and confirm that taVNS treatment mitigated the sympathetic overactivation during the early phase.”

      We have also reported the changes in the relative power of the high-frequency band between the two treatment groups in Supplementary Figure 6. We did not find a significant change in relative high-frequency band power between the treatment groups (Treatment – pre-treatment difference: p = 0.74, Cohen’s d = -0.08, N(Sham) = 199, N(taVNS) = 188, Mann-Whitney U test). We reported these results in the Results section: Acute effects of taVNS on cardiovascular function (p18): “There were no significant differences in changes in corrected QT interval or heart rate variability, as measured by RMSSD, SDNN, and relative power of high-frequency band between treatment groups (Figure 5D and E and Supplementary Figure 6).”
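
      For readers who wish to compute statistics of this kind, the following is a minimal standard-library sketch of Cohen's d and the Mann-Whitney U statistic. It is an illustration under stated assumptions, not the authors' analysis code: the pooled-standard-deviation form of Cohen's d and the sign convention (first sample minus second) are assumptions, and the p-value computation is omitted.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation: (mean(a) - mean(b)) / s_pooled."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def mann_whitney_u(a, b):
    """U statistic for sample a: number of pairs (x, y) with x > y, ties counted as 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u
```

      The pairwise loop is quadratic in the sample sizes, which is acceptable for groups of a few hundred observations; a rank-based implementation from a statistics library would be preferable for larger data sets and would also provide the p-value.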

      How was the "dose" chosen (20 minutes twice daily)?

      The choice of a 20-minute taVNS session twice daily was informed by findings from Addorisio et al. (2019), where the authors administered 5-minute taVNS twice daily to patients with rheumatoid arthritis for two days. They found that the circulating C-reactive protein (CRP) levels significantly reduced after two days of treatment but returned to baseline at the second clinical assessment by day 7. Given the high inflammatory state associated with subarachnoid hemorrhage (SAH) and our intention to maintain a steady reduction in inflammation, we extended the duration of taVNS to 20 minutes per session. We have clarified the rationale for this stimulation schedule in the Results section (p5-6): “This treatment schedule was informed by findings from Addorisio et al., where a 5-minute taVNS protocol was administered twice daily to patients with rheumatoid arthritis for two days.29 Their study found that circulating C-reactive protein (CRP) levels significantly reduced after 2 days of treatment but returned to baseline at the second clinical assessment by day 7. Given the high inflammatory state associated with SAH and our intention to maintain a steady reduction in inflammation, we decided to extend the treatment duration to 20 minutes per session.”

      Addorisio, Meghan E., et al. "Investigational treatment of rheumatoid arthritis with a vibrotactile device applied to the external ear." Bioelectronic Medicine 5 (2019): 1-11.

      The use of an acute biomarker of response is very important. A bimodal response to taVNS has been previously shown in patients with atrial fibrillation (Kulkarni et al. JAHA 2021).

      Thank you for this valuable insight and for bringing the study by Kulkarni et al. to our attention. Their study showed that the response to Low-Level Tragus Stimulation (LLTS) varied among patients with atrial fibrillation, which can be predicted by acute P-wave alternans (PWA) to some degree. We have discussed the implication of the bimodal response to taVNS in the Discussion section (p26-27): “Kulkarni et al. showed that the response to low-level tragus stimulation (LLTS) varied among patients with atrial fibrillation.49 Similarly, in our study, not all patients in the taVNS treatment group showed a reduction in mRS scores (improved degree of disability or dependence). This differential response may be inherent to taVNS and potentially influenced by factors such as anatomical variations in the distribution of the vagus nerve at the outer ear. These findings underscore the importance of using acute biomarkers to guide patient selection and optimize stimulation parameters. Furthermore, we found that increased heart rate was a potential acute biomarker for identifying SAH patients who are most likely to respond favorably to taVNS treatment. Translating this finding into clinical practice will require further research to elucidate the mechanisms by which an acute increase in heart rate may predict the outcomes of patients receiving taVNS, including its relationship with neurological evaluations, vasospasm, echocardiography, and inflammatory markers.”

      Reviewer #2 (Public Review):

      Summary:

      This study investigated the effects of transcutaneous auricular vagus nerve stimulation (taVNS) on cardiovascular dynamics in subarachnoid hemorrhage (SAH) patients. The researchers conducted a randomized clinical trial with 24 SAH patients, comparing taVNS treatment to a Sham treatment group (20 minutes per day twice a day during the ICU stay). They monitored electrocardiogram (ECG) readings and vital signs to assess acute as well as middle-term changes in heart rate, heart rate variability, QT interval, and blood pressure between the two groups. The results showed that repetitive taVNS did not significantly alter heart rate, corrected QT interval, blood pressure, or intracranial pressure. However, it increased overall heart rate variability and parasympathetic activity after 5-10 days of treatment compared to the sham treatment. Acute taVNS led to an increase in heart rate, blood pressure, and peripheral perfusion index without affecting corrected QT interval, intracranial pressure, or heart rate variability. The acute post-treatment elevation in heart rate was more pronounced in patients who showed clinical improvement. In conclusion, the study found that taVNS treatment did not cause adverse cardiovascular effects, suggesting it is a safe immunomodulatory treatment for SAH patients. The mild acute increase in heart rate post-treatment could potentially serve as a biomarker for identifying SAH patients who may benefit more from taVNS therapy.

      Strengths:

      The paper is overall well written, and the topic is of great interest. The methods are solid and the presented data are convincing.

      Weaknesses:

      (1) It should be clearly pointed out that the current paper is part of the NAVSaH trial (NCT04557618) and presents one of the secondary outcomes of that study while the declared first outcomes (change in the inflammatory cytokine TNF-α in plasma and cerebrospinal fluid between day 1 and day 13, rate of radiographic vasospasm, and rate of requirement for long-term CSF diversion via a ventricular shunt) are available as a pre-print and currently under review (doi: 10.1101/2024.04.29.24306598.). The authors should better stress this point as well as the potential association of the primary with the secondary outcomes.

      Thank you for this valuable suggestion. The current study indeed focuses on the trial’s secondary outcomes. The main objective is to evaluate the cardiovascular safety of the taVNS protocol and to provide insights that will inform the application of taVNS in SAH patients. Following your comments, we have clarified this in the Introduction section (p6): “The current study is part of the NAVSaH trial (NCT04557618) and focuses on the trial’s secondary outcomes, including heart rate, QT interval, HRV, and blood pressure.32 This interim analysis aims to evaluate the cardiovascular safety of the taVNS protocol and to provide insights that will inform the application of taVNS in SAH patients. The primary outcomes of this trial, including change in the inflammatory cytokine TNF-α and rate of radiographic vasospasm, are available as a pre-print and currently under review.26”

      The negative association between HRV and inflammatory cytokines has been reported in numerous studies such as (Williams et al., Brain, Behavior, and Immunity, 2019; Haensel et al., Psychoneuroendocrinology. 2008). There are some studies suggesting that increased sympathetic tone following SAH is associated with vasospasm (Bjerkne Wenneberg, S. et al., Acta Anaesthesiologica Scandinavica. 2020; Megjhani et al., Neurocrit Care. 2020). Based on the literature, we compared the effects of taVNS on primary and secondary outcomes. The findings from the two parallel analyses are consistent: taVNS treatment reduced pro-inflammatory cytokines and increased HRV. Furthermore, the analyses of the primary outcomes revealed a reduction in the presence of any radiographic vasospasm in the taVNS treatment group compared to the sham. We have now integrated these findings and discussed them in the Discussion section (p25-26): “Given the negative association between pro-inflammatory markers and HRV, our finding that HRV was higher in the taVNS treatment group aligns with the findings of primary outcomes of this clinical trial, which showed that taVNS treatment reduced pro-inflammatory cytokines, including tumor necrosis factor-alpha (TNF-α) and interleukin-6.26,52 The consistency between these findings strengthens the evidence supporting the anti-inflammatory effects of taVNS. In addition, the sympathetic predominance following SAH is implicated in an increased risk of delayed cerebral vasospasm, which is most commonly detected 5-7 days after SAH.12 Given that taVNS treatment mitigated the sympathetic overactivation before the typical onset of cerebral vasospasm, it could potentially reduce the severity of this complication.”

      (2) The references should be implemented particularly concerning other relevant papers (including reviews and meta-analysis) of taVNS safety, particularly from a cardiovascular standpoint, such as doi: 10.1038/s41598-022-25864-1 and doi: 10.3389/fnins.2023.1227858).

      Thank you for providing the relevant papers. We have added these references to the Introduction section to give a more comprehensive background for our study (p6): “While some animal studies have reported a potential risk of bradycardia and decreased blood pressure associated with vagus nerve stimulation, two reviews of human studies have considered the cardiovascular effects of taVNS generally safe, with adverse effects reported only in patients with pre-existing heart diseases.21,22,23”

      (3) The dose-response issue that affects both VNS and taVNS applications in different settings should be mentioned (doi: 10.1093/eurheartjsupp/suac036.) as well as the need for more dose-finding preclinical as well as clinical studies in different settings (the best stimulation protocol is likely to be disease-specific).

      Overall, the present work has the important potential to further promote the usage of taVNS even in critically ill patients and might set the basis for future randomized studies in this setting.

      Thank you for this valuable insight. Understanding the dose-response relationship and determining optimal parameters tailored to specific disease contexts are recognized as important parts of taVNS research and, more generally, of the electrical neuromodulation field. Studies in this direction are often complex and time-intensive due to the multitude of possible parameter combinations. As such, most taVNS studies opted to use parameters that have been established in previous studies. For example, 20 Hz taVNS is extensively used as a therapeutic intervention in stroke (Matyas Jelinek, 2024, https://www.sciencedirect.com/science/article/pii/S0014488623003138). As we pioneer the application of taVNS as an immunomodulation technique in SAH patients, we also adopt parameters reported in similar studies, aiming to provide a basis for future preclinical and clinical studies of taVNS in this patient population. As you noted, the effects of taVNS are dose-dependent, necessitating systematic exploration of the parameter space, including frequency, intensity, and duration. Our finding of an acute biomarker (heart rate) holds promise for closed-loop taVNS. We have now emphasized the importance of investigating how parameters/dose affect taVNS’s effects on immune function and cardiovascular function in SAH patients (p28): “As we pioneer the application of taVNS as an immunomodulation technique in SAH patients, we adopt parameters (20 Hz, 0.4 mA) reported in similar studies.55 The current study provides a basis for future preclinical and clinical studies of taVNS in this patient population. To build on our findings, a systematic evaluation of the relationship between parameters such as frequency, intensity, and duration and taVNS’s effects on the immune system and cardiovascular function is necessary to establish taVNS as an effective therapeutic option for SAH patients.56”

      Reviewer #2 (Recommendations For The Authors):

      The paper is overall well written, and the topic is of great interest. The reviewer has some major comments:

      (1) It should be clearly pointed out that the current paper is part of the NAVSaH trial and presents one of the secondary outcomes of that study while the declared first outcomes (change in the inflammatory cytokine TNF-α in plasma and cerebrospinal fluid between day 1 and day 13, rate of radiographic vasospasm, and rate of the requirement for long-term CSF diversion via a ventricular shunt) are available as a pre-print and currently under review (doi: 10.1101/2024.04.29.24306598.).

      We have revised the manuscript following your comment. Please see comment Reviewer 2 Public Review and our response.

      The authors should assess the relationship between the impact of taVNS on inflammatory markers in plasma and in cerebrospinal fluid and the autonomic responses. The association between inflammatory markers and noninvasive autonomic markers as well as sympathovagal balance should also be assessed. Specifically, the authors should try to assess whether the acute post-treatment elevation in heart rate was more pronounced in patients who experienced a more pronounced reduction in inflammatory biomarkers. Indeed, since all patients in the current study received the same dose of taVNS (20 Hz frequency, 250 μs pulse width, and 0.4 mA intensity), while in several cardiovascular studies (doi: 10.1016/j.jacep.2019.11.008, doi: 10.1007/s10286-023-00997-z) the intensity (amplitude) of taVNS was differentially set based on the subjective pain/sensory threshold, that might be a marker of acute afferent neuronal engagement.

      We agree that analyzing the change in cardiovascular metrics and changes in inflammatory markers is an important next step. In particular, testing whether the acute elevation in heart rate correlates with changes in inflammatory markers could further establish heart rate as a biomarker to guide patient selection and optimize stimulation parameters. (Please refer to comment 1.3 and our responses). However, in this paper, the primary objective is the cardiovascular safety of the current taVNS protocol in SAH patients. This association between inflammatory markers and autonomic responses extends beyond the scope of the current manuscript and would be more appropriately addressed in a separate publication.

      Previous literature has shown a negative association between HRV and inflammatory markers in SAH patients (for example, Adam, J., 2023). It is reasonable to postulate that taVNS modulates the immune system and the autonomic system synergistically. We found that parasympathetic tone was higher in the taVNS treatment group, with the most notable differences observed between Days 2 and 4 following SAH (Figure 3F). In a separate study of the primary outcomes of this trial (Huguenard et al., 2024), serum levels of IL-6 (a pro-inflammatory cytokine) were also significantly lower in the taVNS treatment group on Day 4 (Figure 3A in our preprint, https://doi.org/10.1101/2024.04.29.24306598).

      We appreciate your input regarding the potential mechanism behind acute heart rate changes. In this trial, all patients who were able to engage in verbal communication were asked if they felt any prickling or pain during all sessions. We confirmed that the current stimulation setting was sub-perception in all trialed patients, making it unlikely that the observed heart rate increase was due to pain or sensory perception. Our current hypothesis is that successful activation of the afferent vagal pathway by taVNS increased arousal, resulting in increased heart rate. We have revised the Discussion section based on your insight (p29): “All patients who were capable of verbal communication were asked if they felt any prickling or pain during all sessions. We confirmed that the current taVNS protocol is below the perception threshold for all trialed patients. Altogether, successful activation of the afferent vagal pathway by taVNS increased arousal, resulting in increased heart rate.50,51”

      Huguenard, A. L. et al. Auricular Vagus Nerve Stimulation Mitigates Inflammation and Vasospasm in Subarachnoid Hemorrhage: A Randomized Trial. (2024) doi:10.1101/2024.04.29.24306598.

      Adam, J., Rupprecht, S., Künstler, E. C. S. & Hoyer, D. Heart rate variability as a marker and predictor of inflammation, nosocomial infection, and sepsis – A systematic review. Autonomic Neuroscience vol. 249 103116 (2023).

      A new table should be provided with the mean (or median) values of the two arms of the population (taVNS and sham) including baseline clinical characteristics, comorbidities (mean age, % of female, % with known hypertension, diabetes, etc), ongoing medications (% on beta-blockers, etc), and pre, during and post-treatment absolute values (expressed as mean or median depending on the distribution) of the studied parameters (QT and QTc absolute values, heart rate, SDNN, etc) in order for the reader to have a better understanding of how SAH affects these parameters. Absolute changes in the abovementioned parameters should also be presented in the table. For instance, the reported absolute increase in heart rate, based on Figure 5, panel C, seems very modest, below 2 bpm. This is very important to underline for several reasons, including the fact that the evaluation of the impact of treatment on heart rate variability as assessed in the time domain might be influenced by concomitant changes in heart rate due to the nonlinearity of neural modulation of sinus node cycle length. Indeed, time-domain indexes of HRV intrinsically increase when heart rate decreases in a nonlinear way, while frequency domain indexes (e.g. the low frequency/high frequency (LF/HF) ratio), appear to be devoid of intrinsic rate-dependency (doi: 10.1016/s0008-6363(01)00240-1).

      Thank you for your suggestion. We have added the new table to the manuscript. In this table, we include clinical characteristics, the median of absolute values of cardiovascular metrics from 24-hour ECG recording, and the median absolute changes in these metrics for both arms. We believe that absolute values of cardiovascular metrics from 24-hour ECG recording are more informative about how SAH affects these parameters than metrics for the pre-, during-, and post-treatment periods.

      In Result (p7), we have added: “Supplementary Table 3 shows the clinical characteristics of the two treatment groups.” In Result, Acute effect of taVNS on cardiovascular function (p20), we have added: “Supplementary Table 3 summarizes the absolute changes in cardiovascular metrics for the treatment groups.”

      Thank you for raising the concern about HRV and providing the reference. We now report a frequency-domain index in our results: the relative high-frequency power, which is negatively correlated with the LF/HF ratio. High-frequency power is used to capture sinus arrhythmia, reflecting the parasympathetic modulation of the heart. Although the frequency-domain metrics might be less susceptible to rate-dependency (doi: 10.1016/s0008-6363(01)00240-1), there are circumstances in which they might not accurately reflect the autonomic tone (please see Reviewer 1 Public Review and our responses).

      An attempt to correct the effect of taVNS on the evaluated autonomic parameters according to age should be provided, considering that there were no age limits and parasympathetic indexes, particularly at the sinus node level, are known to decrease with age, particularly for those older than 65 years.

      Thank you for the suggestion. We were aware of the influence of age on heart rate and heart rate variability. In our initial analysis, we compared the change in autonomic parameters from day 1 within each subject across the two treatment groups. This approach controls for individual differences, including those due to age. In addition, age is a risk factor for subarachnoid hemorrhage, and older individuals often face an increased risk of poor outcomes. To further verify if age influences autonomic changes following SAH, we performed ANCOVA on autonomic function parameters with age included as a covariate. This analysis showed that age was negatively correlated with changes in heart rate, SDNN, and RMSSD from Day 1, but not with changes in QT intervals. After adjusting for age, we found that RMSSD changes and SDNN changes were significantly higher in the taVNS treatment group, while QTc changes were significantly lower in this group. These results align with the main findings (Figures 2 and 3). In addition, autonomic changes following SAH may be influenced by age. Specifically, lower RMSSD and SDNN in older individuals suggest a greater shift toward sympathetic predominance following SAH. We have now reported these results in Supplementary Table 4 and discussed their implication in the Discussion section (p28): “To control for individual differences, including those due to age, our study compared the change in cardiovascular parameters from Day 1 within each subject across treatment groups. To further verify if age influences autonomic changes following SAH, we performed ANCOVA on autonomic function parameters with age included as a covariate. This analysis showed that age was negatively correlated with changes in heart rate, SDNN, and RMSSD from Day 1 but not with changes in QT intervals. After adjusting for age, we found that RMSSD changes and SDNN changes were significantly higher, while QTc changes were significantly lower in the taVNS treatment group (Supplementary Table 4). These results align with the conclusion that repetitive taVNS treatment increased HRV and was unlikely to cause bradycardia or QT prolongation. In addition, autonomic changes following SAH may be influenced by age. Specifically, lower RMSSD and SDNN in older individuals suggest a greater shift toward sympathetic predominance following SAH (Supplementary Table 4).”
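      The age adjustment described above can be illustrated with a minimal sketch. It assumes the ANCOVA is equivalent to an ordinary least-squares fit of the change from Day 1 on a treatment indicator plus age; the function names and the toy data are hypothetical and are not taken from the study, and the sketch returns point estimates only (no p-values).

```python
def solve_normal_equations(X, y):
    """Least-squares coefficients from (X^T X) b = X^T y via Gauss-Jordan
    elimination with partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def age_adjusted_effect(change, is_tavns, age):
    """Fit change = b0 + b1 * is_tavns + b2 * age (hypothetical model form).

    b1 is the treatment effect adjusted for age; b2 is the age slope.
    """
    X = [[1.0, float(g), float(a)] for g, a in zip(is_tavns, age)]
    b0, b1, b2 = solve_normal_equations(X, change)
    return {"intercept": b0, "treatment_effect": b1, "age_slope": b2}


# Toy data (not study data): RMSSD change falls with age, rises with taVNS.
age = [40, 50, 60, 70, 45, 55, 65, 75]
is_tavns = [0, 0, 0, 0, 1, 1, 1, 1]
change = [2.0 + 5.0 * g - 0.1 * a for g, a in zip(is_tavns, age)]
fit = age_adjusted_effect(change, is_tavns, age)
```

      In this toy example a negative `age_slope` corresponds to the reported negative correlation between age and the change in HRV, while `treatment_effect` is the group difference that remains after adjusting for age.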

      The results of the current study should be discussed considering what was previously demonstrated concerning the cardiovascular effects of taVNS (doi: 10.3389/fnins.2023.1227858).

      We appreciate the suggestion to consider previous findings on the cardiovascular effects of taVNS. However, it is important to note that most studies investigating the cardiovascular effects of taVNS involve healthy individuals, whereas our study focuses on SAH patients who are critically ill. Given the influence of SAH on cardiovascular parameters, we should be cautious when generalizing our findings to the broader population. Previous studies involving stroke populations have reported cardiovascular parameters descriptively as part of their safety assessments (doi: 10.1155/2020/8841752). Our study is currently the only one systematically investigating the cardiovascular safety of taVNS in SAH patients. Furthermore, the review paper (doi: 10.3389/fnins.2023.1227858) includes a highly heterogeneous mix of studies, such as auricular acupressure, auricular acupuncture, and electrical stimulation applied to different parts of the ear. For the subset of studies involving electrical stimulation, there is considerable variation in the parameters used, with frequencies ranging from 0.5 Hz to 100 Hz, currents from 0.1 mA to 45 mA, and durations spanning from 20 minutes to 168 days. These variations make direct comparisons with our findings challenging.

      It looks like QT measurements were performed automatically. It should be specified which method was used for the measurements (threshold, tangent, or superimposed method?).

      In our study, QT intervals were measured based on thresholding after wavelet transforming the ECG signals (Martínez, J. P., IEEE Transactions on Biomedical Engineering, 2004, doi: 10.1109/TBME.2003.821031). The local maxima of the wavelet transform correspond to significant changes in the ECG signal, such as the rapid upward or downward deflections associated with the QRS complex. The algorithm searches modulus maxima, that is, peaks of wavelet transform coefficients that exceed specific thresholds, to identify the QRS complex. R peaks are found as the zeros crossing between the positive-negative modulus maxima pair. After localizing the R peak, the Q onset is detected as the beginning of the first modulus maximum before the modulus maximum pair created by the R wave. To identify the T wave, the algorithm searches for local maxima in the absolute wavelet transform in a search window defined relative to the QRS complex. Thresholding is used to identify the offset of the T wave. Please refer to comments 3.4 and 3.5 and our responses for details. We have clarified the method for measuring QT in the Method section (p35): “This algorithm identifies the QRS complex by searching for modulus maxima, which are peaks in the wavelet transform coefficients that exceed specific thresholds. The onset of the QRS complex is determined as the beginning of the first modulus maximum before the modulus maximum pair created by the R wave. To identify the T wave, the algorithm searches for local maxima in the absolute wavelet transform in a search window defined relative to the QRS complex. Thresholding is used to identify the offset of the T wave.”

      QTc dispersion was not evaluated, and this should be listed as a limitation of the current study.

      We have added this limitation in the Discussion section: Limitations and outlook (p31): “The current study did not explore the effects of taVNS on less commonly used cardiovascular metrics, such as QTc dispersion.”

      It has been recently suggested (doi: 10.1016/j.brs.2018.12.510) that QTc, as a potential indirect marker of HRV, might be used as a biomarker for VNS response in the treatment of resistant depression. The author should try to assess whether in the current study baseline QTc before taVNS is associated with outcome and with taVNS response.

      Thank you for the suggestion. The conference abstract at the provided doi stated that QTc, as an indirect marker of HRV before implantation, was correlated with changes in the depression rating scale. The mechanism seems to be that QTc carries information about the pathophysiology of depression (doi: 10.1097/YCT.0000000000000684). The current study focused on the comparison between taVNS treatment and sham treatment. Our future study will further test if SAH patients’ response to taVNS can be predicted by baseline QTc.

      The dose-response issue that affects both VNS and taVNS in different settings should be mentioned (doi: 10.1093/eurheartjsupp/suac036.) as well as the need for more dose-finding preclinical as well as clinical studies in different settings (the best stimulation protocol is likely to be disease-specific).

      Please refer to our responses to comment 3.

      Minor Comments

      Some typos or commas instead of affirmative points and vice versa.

      Thank you for pointing this out. We have carefully proofread the manuscript and made the necessary corrections to ensure proper punctuation and grammar throughout.

      Table 1: why is age expressed as a range for each person?

      MedRxiv asks authors to remove all identifying information. Precise ages are direct identifiers, as opposed to age ranges. We have now revised the age column to ‘decade of life’ in the updated table. We believe this modification reduces confusion while adhering to MedRxiv’s guidelines.

      Although already reported in the study protocol (doi: 10.1101/2024.03.18.24304239), the heart rate limits for inclusion should be reported (sustained bradycardia on arrival with a heart rate < 50 beats per minute for > 5 minutes, implanted pacemaker or another electrical device).

      We have now added the specific inclusion and exclusion criteria in the Method details section (p33): “Inclusion criteria were: (1) Patients with SAH confirmed by CT scan; (2) Age > 18; (3) Patients or their legally authorized representative are able to give consent. Exclusion criteria were: (1) Age < 18; (2) Use of immunosuppressive medications; (3) Receiving ongoing cancer therapy; (4) Implanted electrical device; (5) Sustained bradycardia on admission with a heart rate < 50 beats per minute for > 5 minutes; (6) Considered moribund/at risk of imminent death.”

      Why did the authors choose a taVNS schedule of two times per day of 30 minutes each as compared for instance to one hour per day? Please comment on that also referring to other taVNS studies in the acute setting such as the one by Dasari T et al (doi: 10.1007/s10286-023-00997-z) where taVNS was applied for 4 hours twice daily. For instance, Yum Kim et al (doi: 10.1038/s41598-022-25864-1) recently reported in a systematic review and meta-analysis of taVNS safety that repeated sessions and sessions lasting 60 min or more were shown to be more likely to lead to adverse events.

      The International Consensus-Based Review and Recommendations for Minimum Reporting Standards in Research on Transcutaneous Vagus Nerve Stimulation should be referred to and contextualized (doi: 10.3389/fnhum.2020.568051).

      Thank you for raising this question and providing relevant references. We have reviewed the proposed checklist for minimum reporting items in taVNS research (doi: 10.3389/fnhum.2020.568051) and have ensured that our manuscript complies with the recommended reporting items.

      The current taVNS schedule was based on findings from Addorisio et al. (2019). We have revised the manuscript to clarify the rationale behind the current taVNS protocol. Please refer to our response to comment 1.2. The two studies mentioned in the comments were published after our trial was designed and initiated (https://clinicaltrials.gov/study/NCT04557618). Based on the meta-analysis by Yum Kim et al., the short duration of treatment sessions might explain the cardiovascular safety of the current taVNS protocol. We are also currently assessing the effects of our taVNS protocol on inflammatory markers.

      Reviewer #3 (Public Review):

      Summary:

      The authors aimed to characterize the cardiovascular effects of acute and repetitive taVNS as an index of safety. The authors concluded that taVNS treatment did not induce adverse cardiovascular effects, such as bradycardia or QT prolongation.

      Strengths:

      This study has the potential to contribute important information about the clinical utility of taVNS as a safe immunomodulatory treatment approach for SAH patients.

      Weaknesses:

      A number of limitations were identified:

      (1) A primary hypothesis should be clearly stated. Even though the authors state the design is a randomized clinical trial, several aspects of the study appear to be exploratory. The method of randomization was not stated. I am assuming it is a forced randomization given the small sample size and approximately equal numbers in each arm.

      Thank you for the suggestion. The current study is part of the NAVSaH trial (NCT04557618), aiming to define the effects of taVNS on inflammatory markers, vasospasm, hydrocephalus, and continuous physiology data. This study focuses on the effects of repetitive and acute taVNS on continuous physiology data to evaluate the cardiovascular safety of the current taVNS protocol. The primary hypothesis tested in our study is that repetitive taVNS increased HRV but did not cause bradycardia and QT prolongation. Following your comments, we have clarified this in the Introduction section (p6): “This interim analysis aims to evaluate the cardiovascular safety of the taVNS protocol and to provide insights that will inform the application of taVNS in SAH patients. The primary outcomes of this trial, including change in the inflammatory cytokine TNF-α and rate of radiographic vasospasm, are available as a pre-print and currently under review.26 Based on a meta-analysis, repeated sessions lasting 60 min or more are likely to lead to aversive effects; therefore, we hypothesized that repetitive taVNS increased HRV but did not cause bradycardia and QT prolongation.23”

      (2) The authors "first investigated whether taVNS treatment induced bradycardia or QT prolongation, both potential adverse effects of vagus nerve stimulation. This analysis showed no significant differences in heart rate calculated from 24-hour ECG recording between groups." A justification should be provided for why a difference is expected from 20 minutes of taVNS over a period of 24 hours. Acute ECG changes are a concern for increasing arrhythmic risk, for example, due to cardiac electrical restitution properties.

      A human study (Clancy, L. A. et al., Brain Stimulation, 2017, https://doi.org/10.1016/j.brs.2014.07.031) has found that 15-min taVNS led to reduced sympathetic activity measured by low-frequency/high-frequency (LF/HF) ratio. The sympathetic activity remained lower than baseline levels during the recovery period, suggesting potential long-term effects of taVNS on cardiovascular function. In addition, the repetitive taVNS treatment in this clinical trial was intended to maintain a steady low-inflammatory state. Given the potential life-threatening implications of bradycardia and QT prolongation in these critically ill patients, we deemed it crucial to evaluate heart rate and QT interval both acutely and from 24-hour ECG monitoring. We have now provided the justification in the Result section (p11): “A study has shown that 15 minutes of taVNS reduced sympathetic activity in healthy individuals, with effects that persist during the recovery period.33 This finding suggests that taVNS may exert long-term effects on cardiovascular function. Therefore, we investigated whether repetitive taVNS treatment affects heart rate and QT interval, key indicators of bradycardia or QT prolongation, using 24-hour ECG recording.”

      An additional value of analyzing 24-hour ECG recording is that we can detect bradycardia or QT prolongation that happens outside the period of stimulation, which could be caused by repetitive taVNS. To this end, we reanalyzed the data and calculated the percentage of prolonged QT intervals using a 500 ms criterion (Giudicessi, J. R., Noseworthy, P. A. & Ackerman, M. J. The QT Interval. Circulation, 2019). When comparing the percentage of prolonged QT intervals between the treatment groups, we found that changes in prolonged QT intervals percentage from Day 1 were higher in the Sham group (Figure 3F, Mann–Whitney U test, N(taVNS) = 94, N(Sham)=95, p-value < 0.001, Cohen’s d = -0.72). We have now reported the results in the Result section (p11): “To ensure that repetitive taVNS did not lead to QT prolongation happening outside the period of stimulation, we calculated the percentage of prolonged QT intervals. Prolonged QT intervals were defined as corrected QT interval >= 500 ms. We found that changes in prolonged QT intervals percentage from Day 1 were higher in the Sham group (Figure 3F, Mann–Whitney U test, N(taVNS) = 94, N(Sham)=95, p-value < 0.001, Cohen’s d = -0.72).”
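      As an illustration of the two quantities reported above, the following minimal sketch computes the percentage of beats with QTc >= 500 ms and a pooled-standard-deviation Cohen's d. The function names are hypothetical, and the sign convention (taVNS minus Sham) is an assumption inferred from the negative effect sizes reported; the Mann–Whitney U test itself is not reproduced here.

```python
import math
import statistics


def percent_prolonged(qtc_ms, threshold_ms=500.0):
    """Percentage of beats whose corrected QT is at or above the threshold."""
    if not qtc_ms:
        raise ValueError("no QTc values supplied")
    n_prolonged = sum(1 for q in qtc_ms if q >= threshold_ms)
    return 100.0 * n_prolonged / len(qtc_ms)


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation.

    Assumed sign convention: mean(group_a) - mean(group_b), so a negative
    value means group_a (e.g. taVNS) is lower than group_b (e.g. Sham).
    """
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)


# Toy values (not study data): daily percentages for one patient,
# expressed as change from Day 1.
daily_percent = [percent_prolonged(day) for day in (
    [450.0, 500.0, 510.0, 480.0],   # Day 1
    [450.0, 460.0, 510.0, 480.0],   # Day 2
)]
change_from_day1 = [p - daily_percent[0] for p in daily_percent[1:]]
```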

      The concern regarding acute ECG changes related to increased arrhythmic risk is valid. We have improved the reasoning behind analyzing acute ECG change, which now reads (p20): “Assessing the acute effect of taVNS on cardiovascular function is crucial for its safe translation into clinical practice. We compared the acute change of heart rate, corrected QT interval, and heart rate variability between treatment groups, as abrupt changes in the pacing cycle may increase the risk of arrhythmias.”

      (3) More rigorous evaluation is necessary to support the conclusion that taVNS did not change heart rate, HRV, QTc, etc. For example, shifts in peak frequencies of the high-frequency vs. low-frequency power may be effective at distinguishing the effects of taVNS. Further, compensatory sympathetic responses due to taVNS should be explored by quantifying the changes in the trajectory of these metrics during and following taVNS.

      We appreciate your concerns regarding the potential effects on the autonomic system associated with taVNS treatment. We would like to clarify that the primary objective of our study was to evaluate the cardiovascular safety of the taVNS protocol in SAH, with a specific focus on detecting any acute changes in heart rate and QT interval. As you highlighted, such acute ECG changes are a concern for increasing arrhythmic risk. By directly studying the trend of heart rate, HRV, and QT over the acute treatment periods, we found no significant change in these metrics between treatment groups. In addition, these metrics remained within 0.5 standard deviations of their daily fluctuations during and following taVNS treatment (Figure 5 and Supplementary Figure 6). These findings support the conclusion that the current protocol is unlikely to cause cardiac complications.

      In response to your suggestion to conduct a more rigorous analysis, particularly concerning peak frequencies within the high-frequency (HF) and low-frequency (LF) bands, we pursued this analysis to explore more nuanced effects of taVNS on the autonomic system. We compared the shifts in peak frequencies within these bands between the treatment groups and found no significant changes that would suggest a sympathetic or parasympathetic shift following acute taVNS.

      In detail, we have made the following revisions following your comments:

      (1) We have clarified the motivation behind studying the acute change of cardiac metrics following taVNS treatment – monitoring the cardiovascular safety of current taVNS protocol in SAH patients (p18): please refer to response to comment 3.2.

      (2) We compared the peak frequencies of the high-frequency and low-frequency bands following taVNS and added the results to the supplementary materials.

      We note that neurophysiology underlying peak frequencies has not been thoroughly studied in the literature compared to the LF-band power or HF-band power. Therefore, we report this result as an exploratory analysis.

      (3) We have added the changes of QTc during and following taVNS in Figure 5 and showed that they were within 0.5 standard deviations of their daily fluctuations during and following taVNS treatment. We have now shown the changes of HRV during and following taVNS in Supplementary Figure 6 A-D. We added the change of high-frequency power following Reviewer #1’s comment 1.1. Overall, our results suggest that repetitive taVNS increased parasympathetic activities, while there is no evidence that acute taVNS significantly affected heart rate or QT.
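      The peak-frequency comparison in item (2) can be sketched as follows. This is an illustrative example, not the authors' pipeline: it assumes the RR series has already been resampled to an evenly spaced signal, uses a plain discrete Fourier transform periodogram, and adopts the conventional LF (0.04–0.15 Hz) and HF (0.15–0.40 Hz) band limits, which the response does not state explicitly. The function name is hypothetical.

```python
import math


def band_peak_frequency(x, fs, f_lo, f_hi):
    """Frequency (Hz) of the largest periodogram value within [f_lo, f_hi).

    x  : evenly sampled RR-interval series (assumed already interpolated)
    fs : sampling rate of x in Hz
    Returns None if no DFT bin falls inside the band.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]          # remove the DC component
    best_f, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if not (f_lo <= f < f_hi):
            continue
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(centered))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    return best_f


# Synthetic 120 s series at 4 Hz with oscillations at 0.10 Hz and 0.25 Hz.
fs = 4.0
x = [0.8
     + 0.05 * math.sin(2 * math.pi * 0.25 * i / fs)
     + 0.02 * math.sin(2 * math.pi * 0.10 * i / fs)
     for i in range(480)]
lf_peak = band_peak_frequency(x, fs, 0.04, 0.15)
hf_peak = band_peak_frequency(x, fs, 0.15, 0.40)
```

      Comparing `lf_peak` and `hf_peak` before and after stimulation, per patient and per group, would give the kind of peak-shift measure discussed above.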

      (4) The authors do not state how the QT was corrected and at what range of heart rates. Because all forms of corrections are approximations, the actual QT data should be reported along with the corrected QT.

      The corrected QT interval (QTc) estimates the QT interval at a standard heart rate of 60 bpm. In practice, we removed RR intervals outside of the 300 – 2000 ms range. Further, we removed ectopic beats, defined as RR intervals differing by more than 20% from the preceding one. We used the Bazett formula to correct the QT intervals: $\mathrm{QTc} = \mathrm{QT}/\sqrt{\mathrm{RR}}$, with RR expressed in seconds. We have now clarified how QT was corrected in the Method section – Data processing (p35-36): “R-peaks were detected as local maxima in the QRS complexes. P-waves, T-waves, and QRS waves were delineated based on the wavelet transform (Figure 2A-C).34  RR intervals were preprocessed to exclude outliers, defined as RR intervals greater than 2 s or less than 300 ms. RR intervals with > 20% relative difference to the previous interval were considered ectopic beats and excluded from analyses. After preprocessing, RR intervals were used to calculate heart rate, heart rate variability, and corrected QT (QTc) based on Bazett's formula: $\mathrm{QTc} = \mathrm{QT}/\sqrt{\mathrm{RR}}$.44 The corrected QT interval (QTc) estimates the QT interval at a standard heart rate of 60 bpm.”
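      The preprocessing and correction steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, each interval is compared with the immediately preceding in-range interval, SDNN is taken as the sample standard deviation, and RMSSD is computed across retained intervals even where beats were removed in between.

```python
import math
import statistics


def clean_rr(rr_ms, lo_ms=300.0, hi_ms=2000.0, max_rel_diff=0.20):
    """Drop RR intervals outside [lo_ms, hi_ms], then drop intervals that differ
    by more than max_rel_diff from the preceding in-range interval (ectopic)."""
    in_range = [rr for rr in rr_ms if lo_ms <= rr <= hi_ms]
    kept = in_range[:1]  # the first interval has no predecessor to compare with
    for prev, cur in zip(in_range, in_range[1:]):
        if abs(cur - prev) / prev <= max_rel_diff:
            kept.append(cur)
    return kept


def bazett_qtc(qt_ms, rr_ms):
    """Bazett correction: QTc = QT / sqrt(RR), with RR converted to seconds."""
    return qt_ms / math.sqrt(rr_ms / 1000.0)


def hrv_summary(rr_ms):
    """Mean heart rate plus two time-domain HRV indexes (needs >= 2 intervals)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        "heart_rate_bpm": 60000.0 / statistics.mean(rr_ms),
        "sdnn_ms": statistics.stdev(rr_ms),
        "rmssd_ms": math.sqrt(statistics.mean(d * d for d in diffs)),
    }


# Toy series (not study data): one out-of-range pause and one premature beat.
rr = clean_rr([800.0, 810.0, 2500.0, 790.0, 500.0, 805.0, 800.0])
metrics = hrv_summary(rr)
qtc = bazett_qtc(400.0, 640.0)
```

      At an RR interval of 1000 ms (60 bpm) the correction leaves QT unchanged, which is the sense in which QTc estimates the QT interval at a standard heart rate.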

      We have reported the actual QT data in the Result section (p10 and p19): “Moreover, changes in corrected QT interval from Day 1 were significantly higher in the Sham group compared to the taVNS group (Figure 3B, Mann–Whitney U test, N(taVNS) = 94, N(Sham)=95, p-value < 0.001, Cohen’s d = -0.57). Similarly, uncorrected QT intervals from Day 1 were higher in the Sham group (Supplementary Figure 10A, Cohen’s d = -0.42).”

      “Supplementary Figure 10B-C shows the acute changes in uncorrected QT interval.”

      (5) The QT extraction method needs to be more robust. For example, in Figure 2C, the baseline voltage of the ECG is shifting while the threshold appears to be fixed. If indeed the threshold is not dynamic and does not account for baseline fluctuations (e.g., due to impedance changes from respiration), then the measures of the QT intervals were likely inaccurate.

      A robust method to estimate the QT interval is essential in our study. To this end, we used a state-of-the-art method to calculate QT intervals. We first applied a 0.5 Hz fifth-order high-pass Butterworth filter and a 60 Hz powerline filter on the ECG recording. The high-pass filtering is used to correct potential baseline fluctuations. Subsequently, a wavelet-based algorithm was used to delineate the QRS complex and T wave (Martínez, J. P., IEEE Transactions on Biomedical Engineering, 2004). In short, this algorithm identifies the QRS complex based on modulus maxima of the wavelet transform of the ECG signal. After localizing the R peak, the Q onset is detected as the beginning of the first modulus maximum before the modulus maximum pair created by the R wave. The detection is performed on the wavelet transform at a small scale rather than on the original signal, minimizing the effect of baseline shift (see III Detection methods, (5), Cuiwei Li et al., IEEE TBME, 1995, Detection of ECG Characteristic Points Using Wavelet Transforms). The T wave is detected similarly based on the wavelet transform. Please refer to our response to comment 2.9.

      Martínez, J. P., Almeida, R., Olmos, S., Rocha, A. P., & Laguna, P. (2004). A wavelet-based ECG delineator: evaluation on standard databases. IEEE Transactions on Biomedical Engineering, 51(4), 570-581.

      In Figure 2C, the purple and green lines take the value of 1 at the QRS onset or the T wave offset and 0 otherwise, which might appear to be a threshold. We have now used vertical lines to denote the detected QRS onsets and T wave offsets. Please see below for a comparison of the annotation:

      We have clarified the details of extracting QT intervals from ECG recordings in the Method section (p31): “To calculate cardiac metrics, we first applied a 0.5 Hz fifth-order high-pass Butterworth filter and a 60 Hz powerline filter on ECG data to reduce artifacts.35 We detected QRS complexes based on the steepness of the absolute gradient of the ECG signal using the Neurokit2 software package.35 R-peaks were detected as local maxima in the QRS complexes. P waves, T waves, and QRS complexes were delineated based on the wavelet transform of the ECG signals proposed by Martínez J. P. et al. (Figure 2A-C).36 This algorithm identifies the QRS complex by searching for modulus maxima, which are peaks in the wavelet transform coefficients that exceed specific thresholds. The onset of the QRS complex is determined as the beginning of the first modulus maximum before the modulus maximum pair created by the R wave. To identify the T wave, the algorithm searches for local maxima in the absolute wavelet transform in a search window defined relative to the QRS complex. Thresholding is used to identify the offset of the T wave.”

      We have modified Figure 2C for better clarity:

      More statistical rigor is needed. For example, in Figure 2D, the change in heart rate for days 5-7, 8-10, and 11-13 is clearly a bimodal distribution and as such, should not be analyzed as a single distribution. Similarly, Figure 2E also shows a bimodal distribution. Without the QT data, it is unclear whether this is due to the application of the heart rate correction method.

      Thank you for raising this concern. Several factors could contribute to the observed distribution of changes in heart rate for days 5-7, 8-10, and 11-13, as shown in Figure 2D. One such factor is the smaller sample size in the later days. The mean duration of hospitalization for the 24 subjects included in this study was 11.29 days, with a standard deviation of 6.43 days. Other factors include variations in medical history, SAH pathology, and clinical outcomes during hospitalization. Further analysis revealed that heart rate was lower in patients with improved mRS scores (Supplementary Figure 4B), suggesting that clinical outcomes might impact changes in heart rate. Understanding the association between cardiovascular metrics and clinical assessments, such as vasospasm and inflammation, could help decide whether future taVNS trials should control for these factors when evaluating the effects of taVNS on cardiovascular function. We are currently continuing to recruit SAH patients in this clinical trial, and we plan to perform such analyses in future studies.

      In the manuscript, we reported the effect size between the treatment groups for days 5-7, 8-10, and 11-13. This should be interpreted in conjunction with the characteristics of the distribution. To provide a rigorous interpretation of our results, we have now discussed these considerations in the discussion section (p28): “We noticed a high variance of change in heart rate for days 5 – 7, 8 – 10, and 11 – 13 for both treatment groups (Figure 2D). This may be due to the small sample size in the later days, given that the mean duration of hospitalization for the 24 subjects included in this study was 11.3 days with a standard deviation of 6.4. Differences in medical history and clinical outcomes during hospitalization may also explain the variance of change in heart rate for the later days. For example, heart rate was lower in patients with improved mRS scores (Supplementary Figure 4B). Understanding the association between cardiovascular metrics and clinical assessments, such as vasospasm and inflammation, could help decide whether future taVNS trials should control for these factors when evaluating the effects of taVNS on cardiovascular function.”

      To test our hypothesis that repetitive taVNS does not induce significant heart rate change, we performed a two-tailed equivalence test of heart rate change between the two treatment groups, including data from days 2-13 (Figure 2D, left panel). To verify the validity of this approach, we calculated the Bimodality Coefficient (BC) and performed the Dip Test for unimodality for the distribution of heart rate change for the two treatment groups. The BC is a measure that combines skewness and kurtosis to assess whether a distribution is bimodal or unimodal. A BC value greater than 0.555 typically indicates a bimodal distribution, whereas a BC value less than or equal to 0.555 suggests a unimodal distribution. The Dip Test is a statistical test that assesses the unimodality of a distribution. A non-significant p-value (p-value ≥ 0.05) indicates that the distribution is likely unimodal. This analysis suggests that the distributions of heart rate changes in both treatment groups (days 2 - 13) are unimodal (BC = 0.457 and p = 0.374 for the taVNS treatment group; BC = 0.421 and p = 0.656 for the sham treatment group). This finding provides justification for our statistical approaches.
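
      As an illustration of how the BC combines skewness and kurtosis, a minimal sketch is given below. It assumes the commonly used sample form BC = (g1² + 1) / (g2 + 3(n−1)² / ((n−2)(n−3))), where g1 is the bias-corrected sample skewness and g2 the bias-corrected sample excess kurtosis; the function name is hypothetical, the exact variant used in our analysis may differ, and the Dip Test is not reproduced here.

```python
import math
from statistics import NormalDist, fmean


def bimodality_coefficient(x):
    """Sample bimodality coefficient from skewness and excess kurtosis."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = fmean(x)
    # Central moments (population form).
    m2 = fmean([(v - mean) ** 2 for v in x])
    m3 = fmean([(v - mean) ** 3 for v in x])
    m4 = fmean([(v - mean) ** 4 for v in x])
    # Bias-corrected sample skewness and excess kurtosis.
    skew = math.sqrt(n * (n - 1)) / (n - 2) * (m3 / m2 ** 1.5)
    ex_kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (m4 / m2 ** 2 - 3) + 6)
    return (skew ** 2 + 1) / (ex_kurt + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Synthetic examples (not study data): two separated clusters vs. a bell shape.
bimodal = [0.0] * 50 + [10.0] * 50
unimodal = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
```

      On these synthetic inputs the two-cluster sample falls above the 0.555 reference value and the bell-shaped sample falls below it, which is the decision rule applied in the paragraph above.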

      Figure 3A shows a number of outliers. A SDNN range of 200 msec should raise concern for a non-sinus rhythm such as arrhythmia or artifact, instead of sinus arrhythmia. Moreover, Figure 3B shows that the Sham RMSSD data distribution is substantially skewed by the presence of at least 3 outliers, resulting in lower RMSSD values compared to taVNS. What types of artifact or arrhythmia discrimination did the authors employ to ensure the reported analysis is on sinus rhythm? The overall results seem to be driven by outliers.

      Mild cardiac abnormalities are common in SAH patients. Therefore, changes in cardiovascular metrics were expected to differ from those in healthy individuals, which makes studying the cardiovascular effects of taVNS extremely important in this context. Following your comment, we investigated whether the large SDNN change was due to arrhythmia or artifacts. Except for a single instance where one subject exhibited an SDNN change of 200 ms on a particular day, all other SDNN changes were less than 150 ms. We identified the subject and day associated with the largest SDNN change, which is Day 7. As shown in Author response image 1A and B, the SDNN of this subject increased on day 7 while the heart rate (HR) of this subject decreased. Changes in HRV were inversely related to HR changes, suggesting shifts in sympathetic and parasympathetic tone. We checked the ECG recording and the extracted NN intervals (processed RR intervals) on that day. The NN intervals are more variable on day 7 compared to day 1 (Author response image 1C and D). To determine whether the significant variance observed between 5:01 am and 5:02 am was due to arrhythmia or artifacts, we closely examined the corresponding ECG signals (Author response image 1E and F). Based on our analysis, the elevated SDNN is unlikely to be attributed to artifacts.

      Author response image 1.
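
      For readers less familiar with the two HRV metrics discussed here, the sketch below shows the standard time-domain definitions: SDNN is the standard deviation of the NN intervals, and RMSSD is the root mean square of successive NN differences. The function names and the example values are illustrative only, and the sample (n − 1) standard deviation is an assumption; this is not the analysis code of the study.

```python
import math
from statistics import stdev


def sdnn(nn_ms):
    """Standard deviation of NN intervals (ms), sample (n - 1) form."""
    return stdev(nn_ms)


def rmssd(nn_ms):
    """Root mean square of successive NN-interval differences (ms)."""
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Made-up NN intervals in milliseconds, for illustration only.
example_nn = [800, 810, 790, 800]
```

      Because RMSSD depends only on beat-to-beat differences, a few misdetected beats can shift it noticeably, which is why the outlier and ectopic-beat handling described in our responses matters for these metrics.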

      Similarly, we identified the subjects and days corresponding to the most prominent RMSSD decrease in the sham treatment group. We verified the ECG quality for this subject and the accuracy of RR interval identification, and that there was no significant cardiovascular event during the subject’s stay in the ICU. Based on the inclusion and exclusion criteria defined in our protocol (Huguenard A et al., PLOS ONE, 2024), we did not exclude these data from our analysis.

      Huguenard A, Tan G, Johnson G, Adamek M, Coxon A, et al. (2024) Non-invasive Auricular Vagus nerve stimulation for Subarachnoid Hemorrhage (NAVSaH): Protocol for a prospective, triple-blinded, randomized controlled trial. PLOS ONE 19(8): e0301154. https://doi.org/10.1371/journal.pone.0301154

      To ensure accurate inferences about sympathetic and parasympathetic tone from these cardiovascular metrics, we have rigorously refined our methodologies, including correcting RR interval outliers, correcting ectopic peaks, using state-of-the-art algorithms to identify the QRS complex, P wave, and T wave (please refer to response to comment 3.5), and performing factor analysis. In addition, no significant cardiac complications have been reported by the attending physicians for the subjects included in this study. Nonetheless, it is important to note that ECG patterns in patients with SAH differ from those in healthy individuals, potentially impacting the accuracy of R peak identification. For example, one identified R peak (out of 73) was a Q peak (F in the above figure). The pathology associated with SAH complicates the precise calculation of cardiovascular metrics and the interpretation of the results. We are committed to continually improving our methodologies for assessing autonomic function in SAH patients. We have now discussed these limitations in the Discussion section (p31-32): “Mild cardiac abnormalities are common in SAH patients5, complicating the precise calculation of cardiovascular metrics from ECG signals and the interpretation of the results. Systematic verification of methods for calculating cardiovascular metrics to ensure their applicability in SAH patients is crucial.”

      The above concern will also affect the power analysis, which was reported by authors to have been performed based on the t-test assuming the medium effect size, but the details of sample size calculations were not reported, e.g., X% power, t-test assumed Bonferroni correction in the power analysis, etc.

      Thank you for raising this concern. The current study is part of the NAVSaH trial (NCT04557618), focusing on the trial’s secondary outcomes (please refer to comment 2.1 and our responses). The main objective of this interim analysis is to evaluate the cardiovascular safety of the current taVNS protocol. Goal enrollment for the pilot NAVSaH trial is 50 patients, based on power calculations to detect significant differences in inflammatory cytokines, radiographic vasospasm, and chronic hydrocephalus. The detailed power analysis is described in the protocol (Huguenard A et al., PLOS ONE, 2024):

      “Under a 2-by-2 repeated measures design consisting of two groups of patients, each measured at two time points, our goal is to compare the change across time in the taVNS group to the change across time in the Sham group. Based upon previous work from Koopman et al. [67], we assume our study will observe 1.1 standardized inflammatory cytokines mean change difference between the two groups. Using a two-sided, two-sample t-test, assuming both time points have equal variance and there is a weak correlation (i.e., 0.15) between measurement pairs, a sample size of 25 in each group achieves at least 80% power to detect a standardized difference of 1.1 in mean changes, with a significance level (alpha) of 0.05 [68].

      Based on our preliminary data, we assume this study will observe 25% and 55% severe vasospasm in the taVNS and Sham groups, respectively. Under a design with 2 repeated measurements (i.e., 2 raters), assuming a compound symmetry covariance structure with a Rho of 0.2, at a significance level (alpha) of 0.05, a sample size of 25 in each group achieves at least 80% power when the null proportion is 0.55, and the alternative proportion is 0.25 [69–71].

      As previously described, LV et al. [8] studied the relationship between cytokine levels and clinical endpoints in SAH, including hydrocephalus. From their outcomes, we predict a needed enrollment of approximately 50 to detect these endpoints. From our own preliminary data, with an incidence of chronic hydrocephalus 0% in treated patients and 28.6% in control (despite grade of hemorrhage), alpha = 0.05 and power = 0.80, the projected sample size to capture that change is approximately 44 patients.”

      In this study, we used power analysis to report the achieved power of non-significant findings. For example, a Mann-Whitney U test on heart rate change between the treatment groups revealed no significant differences. We then used power analysis to calculate the achieved power. We have added the details of power analysis in the Method section (p34): “We calculated the achieved power of tests on heart rate change between the treatment groups assuming a medium effect size (Cohen’s d of 0.5) and a Type I error probability (α) of 0.05. Given that the Mann-Whitney U test is a non-parametric counterpart to the t-test and that the asymptotic relative efficiency of the U test relative to the t-test is 0.95 with normal distributions, we estimated the achieved power based on the power of a two-sample t-test, which is 0.93.” We have also clarified this in the introduction section and in the method section (p6 and p38):

      “The current study is part of the NAVSaH trial (NCT04557618) and focuses on the trial’s secondary outcomes, including heart rate, QT interval, HRV, and blood pressure.30 This interim analysis aims to evaluate the cardiovascular safety of the taVNS protocol and to provide insights that will inform the application of taVNS in SAH patients. The primary outcomes of this trial, including change in the inflammatory cytokine TNF-α and rate of radiographic vasospasm, are available as a pre-print and currently under review.24”

      “In this study, we reported the statistical power achieved for tests that yielded non-significant results. The achieved power is calculated based on a two-sample t-test assuming a medium effect size (Cohen’s d of 0.5) and a Type I error probability (α) of 0.05.”

      If the study was designed to show a cardiovascular effect, I am surprised that N=10 per group was considered to be sufficiently powered given the extensive reports in the literature on how HRV measures (except when pathologically low) vary within individuals. Moreover, HRV measures are especially susceptible to noise, artifacts, and outliers.

      If the study was designed to show a lack of cardiovascular effect (as the conclusions and introduction seem to suggest), then a several-fold larger sample size is warranted.

      The primary goal of this study is to assess the cardiovascular safety of the current taVNS protocol in SAH patients (please refer to comments 2.1 and 3.8 and our responses). More specifically, we want to assess whether the current taVNS protocol is associated with bradycardia or QT prolongation. The data in this study included ECG signals and vital signals from 24 subjects recruited between 2021 and 2024. The total number of days in the ICU is 271 days, which corresponds to 542 taVNS/sham treatment sessions. These data allow us to detect significant cardiovascular effects of acute taVNS with high power. For example, the comparison of heart rate from pre- to post-treatment sessions between treatment groups had power > 99% (N1 = 188, N2 = 199, assuming 0.05 type I error probability, medium effect size two sample t-test).

      To safely conclude that there is no significant cardiovascular effect of repetitive taVNS on any given day following SAH, we would need to perform statistical tests between treatment groups on Day 1, Day 2, and Day N. In this context, 64 subjects per treatment group are required to achieve 80% power assuming medium effect size and 0.05 type I error probability (two-sample t-test). We have acknowledged this limitation in the Discussion section. Thank you for raising this concern!
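
      The order of magnitude of these power figures can be checked with a short calculation. The sketch below uses the normal approximation to the two-sided, two-sample t-test rather than the exact noncentral t distribution, so it is slightly optimistic for small samples; the function names are hypothetical and this is not the software used for the reported values.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(d, n1, n2, alpha=0.05):
    """Approximate power of a two-sided two-sample test for effect size d."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Noncentrality: effect size scaled by the harmonic-type sample size term.
    delta = d * math.sqrt(n1 * n2 / (n1 + n2))
    return _STD_NORMAL.cdf(delta - z_crit) + _STD_NORMAL.cdf(-delta - z_crit)


def approx_n_per_group(d, power=0.80, alpha=0.05):
    """Approximate per-group sample size for equal groups (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)
```

      With d = 0.5 and α = 0.05, the approximation gives 63 subjects per group for 80% power, one fewer than the 64 obtained from the exact t-based calculation, and it gives a power above 99% for N1 = 188 and N2 = 199.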

      The results reported in this study treat each day as an independent sample for several reasons. First, heart rate and HRV metrics exhibited great daily variations (Figure in comment 3.7, for example). Their value on one day was not predictive of the metrics on another day, which could be due to medications, interventions, or individualized SAH recovery process during the patient’s stay in the ICU. Second, SAH patients in the ICU often experience rapid/daily changes in clinical status, including fluctuations in intracranial pressure, blood pressure, neurological status, and other vital signs. Also, the recovery process from SAH is highly individualized, with different patients exhibiting distinct trajectories of recovery or complications. Day-to-day cardiovascular function changes varied as the patient recovered or encountered setbacks. Moreover, we verified ECG signal quality, corrected outliers and artifacts in ECG processing, and employed a state-of-the-art QRS delineation method (Please refer to comment 3.5). All these ensure the accuracy of our reported results.

      The revised Discussion section now reads (p31): “Our study considers each day as an independent sample for the following considerations: 1. heart rate and HRV metrics exhibited great daily variations. Their value on one day was not predictive of the metrics on another day, which could be due to medications, interventions, or individualized SAH recovery process during the patient’s stay in the ICU. 2. SAH patients in the ICU often experience daily changes in clinical status, including fluctuations in intracranial pressure, blood pressure, neurological status, and other vital signs. 3. Day-to-day cardiovascular function changes varied as the patient recovered or encountered setbacks. To conclusively establish that there is no significant cardiovascular effect of repetitive taVNS on any given day following SAH, we would need to perform statistical tests between treatment groups for each day. In this context, 64 subjects per treatment group are required to achieve 80% power assuming medium effect size and 0.05 type I error probability (two-sample t-test).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Reviewer #1 was very appreciative of our results and commented “This is a novel result in ferredoxin and a significant contribution to the field”. We are very honored and pleased.

      Reviewer #2:

      (1) Changing the nomenclature of the models investigated to include the oxidation state being discussed. As they are now (CM, CMNA, etc), multiple re-reads were required to ascertain which redox state was being discussed for a particular model in a given section of the text. Appending "Ox" or "Red" for oxidized or reduced would be sufficient. 

      As you indicated there are several nomenclatures to distinguish the model systems in the text. On the other hand, the main issue discussed in the text is the ionization potential (IP), which is calculated by the difference in energies between oxidized and reduced states for each model. In other words, a discussion of the IP value on each model includes both the “Ox” and “Red” energies. In order to clarify the relationship between the nomenclature of models and redox states, we added sentences below.

      “Note that the IP value is obtained for each model by calculating both the Ox and Red state energies of the model.” (lines 195-196).

      On the other hand, we must specify the charge state when the geometry optimization is performed for CM and CMH models. Therefore, we revised the sentence as follows.

      “The decrease in |IP| value indicates that the relative stability of the Red state is suppressed compared with the CMH but is significantly larger than the CM, suggesting the importance of the protonation of Asp64 (Fig. S2B). 

      To consider the effect of the structural change caused by the redox on the IP, geometrical optimization of the 4Fe-4S core was performed for the CM (Red) and CMH (Red) models using the same level of theory as the single-point calculations. The optimized Cartesian coordinates are summarized in Table S3. As illustrated in Fig. S2A, the IP values of CM and CMH change from –3.27 to –2.38 eV (|ΔIP| = 0.89 eV), and from –1.06 to –0.19 eV (|ΔIP| = 0.87 eV), respectively, before and after the geometrical optimization.” (lines 224-232)

      (2) In addition to the very thorough DFT investigation of the different spin and charge combinations, did the authors try a broken-symmetry calculation to obtain the ground state description of the FeS cluster? Given the ubiquity of this approach in other FeS cluster studies, it was surprising that this approach was not taken here. Granted, the DFT investigation of each possible combination is sufficiently thorough and need not be redone. 

      Thank you for your comments. The term “spin-unrestricted method” used in the manuscript is a synonym of “broken-symmetry method”. To emphasize this, we revised the manuscript as follows.

      “All calculations were performed by using the spin-unrestricted (broken-symmetry) hybrid DFT method with the B3LYP functional set. As the basis set, 6-31G* and 6-31+G* were used for [Fe, C, N, O, H] and [S] atoms, respectively, for the IP calculations.” (Line 451)

      (3) Line 161 "an" to "a" 

      We corrected the mistake. Thank you so much. (Line 161)

      (4) Figure 4A seems a bit odd. Why do the traces eclipse the y-axis? And the traces between 330 and 370 nm are much noisier and appear thicker than the rest of the plot. Is this an issue with the monochromator grating used in wavelength selection? Reducing the thickness of the individual traces may help the data presentation in this figure. Also, the arrows on the plot have an opaque white background. Can this be removed so that the arrows do not eclipse the traces in the plot? 

      We agree that the spectrum in Fig. 4A looked odd. The spectral figure has been revised to improve its appearance. (We have also corrected E53A in Figure 5B.) The reviewer also pointed out that “the traces between 330 and 370 nm are much noisier”. As you pointed out, we are struggling with noise caused by the grating (or a motor malfunction) of the monochromator. Once the monochromator is repaired and a smooth spectrum is obtained, we will upload further revisions.

      (5) Figure S9 is a very nice schematic illustrating the general findings of the study. Can this be moved to the main text?

      Thank you for your helpful comment. Accordingly, Fig. S9 and its legend have been moved to the main text. (Lines 675-680)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript by Bai et al concerns the expression of Scleraxis (Scx) by muscle satellite cells (SCs) and the role of that gene in regenerative myogenesis. The authors report the expression of this gene associated with tendon development in satellite cells. Genetic deletion of Scx in SCs impairs muscle regeneration, and the authors provide evidence that SCs deficient in Scx are impaired in terms of population growth and cellular differentiation. Overall, this report provides evidence of the role of this gene, unexpectedly, in SC function and adult regenerative myogenesis.

      We appreciate the comments and thank her/him for the support.

      There are a few minor points of concern.

      (1) From the data in Figure 1, it appears that all of the SCs, assessed both in vitro and in vivo, express Scx. The authors refer to a scRNA-seq dataset from their lab and one report from mdx mouse muscle that also reveals this unexpected gene expression pattern. Has this been observed in many other scRNA-seq datasets? If not, it would be important to discuss potential explanations as to why this has not been reported previously.

      Thanks for this question regarding data in Fig. 1. We did initially use immunofluorescence staining of Pax7 and GFP on muscle sections and primary myoblast cultures prepared from Tg-ScxGFP mice to conclude that Scx was expressed in satellite cells (SCs). In addition to the cited mdx RNA-seq data, we have included a re-analysis of a published scRNA-seq data set in Fig. 2E (Dell'Orso et al., Development, 2019), and our own scRNA-seq data (Fig. S5D, F). We have now re-examined an additional scRNA-seq data set of TA muscles at various regeneration time points (De Micheli et al., Cell Rep. 2020), in which Scx expression was detected in MuSC progenitors and mature muscle cells. We have added the De Micheli et al. reference and the re-analysis of that scRNA-seq data set for Scx expression as an additional panel in Fig. 2E, with accompanying text (p. 7, ln. 4-6). Thus, our immunostaining results are consistent with our own scRNA-seq data and with two independent published scRNA-seq data sets.

      We think that Scx expression in the adult myogenic lineage was not previously reported mainly because its expression level was low, and might be dismissed as spurious detection. Additionally, detecting such low expression levels requires sophisticated detection methods with high capture efficiency. Previous studies have noted limitations in transcript capture or transcription factor dropout in 10x Genomics-based datasets (Lambert et al., Cell, 2018; Pokhilko et al., Genome Res., 2021). The most likely and straightforward reason is that Scx was simply not a focus in prior studies amid so many other genes of interest. We have now added this last explanation in the text (p.7, ln. 8-9), following the re-analyses of Scx expression in published scRNA-seq data sets.

      (2) A major point of the paper, as illustrated in Fig. 3, is that Scx-neg SCs fail to produce normal myofibers and renewed SCs following injury/regeneration. They mention in the text that there was no increased PCD by Caspase staining at 5 DPI. A failure of cell survival during the process of SC activation, proliferation, and cell fate determination (differentiation versus self-renewal) would explain most of the in vivo data. As such, this conclusion would seem to warrant a more detailed analysis in terms of at least one or two other time points and an independent method for detecting dead/dying cells (the in vitro data in Fig. 4F is also based on an assessment of activated Caspase to assess cell death). The in vitro data presented later in Fig. S4G, H do suggest an increase in cell loss during proliferative expansion of Scx-neg SCs. To what extent does cell loss (by whatever mechanism of cell death) explain both the in vivo findings of impaired regeneration and even the in vitro studies showing slower population expansion in the absence of Scx?

      We appreciate these constructive suggestions. Based on the number of available control and cKO animals, we were limited to one additional time point at 3 dpi to assess PCD by TUNEL in vivo. We were disappointed again to find no appreciable levels of PCD at 3 dpi by TUNEL (new Fig. S4I), thus no quantifications were included. We also re-did the in vitro experiment using purified SCs and monitored PCD by staining for cleaved Caspase-3 using a validated tube of antibodies (positive staining after 6 h of treatment by 1 mM staurosporine of control and ScxcKO cells; included as new Fig. S4J and legend). We were pleased to find an increase of cleaved Caspase-3 stained cells, i.e. PCD, of Scx-cKO SCs at day 4 in culture, compared to that of the control. We have now replaced the old Fig. 4F with new Fig. 4F and 4G to document PCD. We also provided new text/legend for these new data (p. 10, ln. 2-10; new legend for Fig. 4F and 4G).

      (3) I'm not sure I understand the description of the data or the conclusions in the section titled "Basement membrane-myofiber interaction in control and Scx cKO mice". Is there something specific to the regeneration from Scx-neg myogenic progenitors, or would these findings be expected in any experimental condition in which myogenesis was significantly delayed, with much smaller fibers in the experimental group at 5 DPI?

      We very much appreciate this comment. We agree that there is unlikely to be anything specific about the regeneration from Scx-negative myogenic progenitors. Unfilled or empty ghost fibers (basement membrane remnants) are expected due to the small fibers and poor regeneration in the ScxcKO mice at 5 dpi. We have removed the subtitle and changed the content to present this as an expected consequence rather than something special (p. 8, ln. 19-22).

      (4) The data presented in Fig. 4B showing differences in the purity of SC populations isolated by FACS depending on the reporter used are interesting and important for the field. The authors offer the explanation of exosomal transfer of Tdt from SCs to non-SCs. The data are consistent with this explanation, but no data are presented to support this. Are there any other explanations that the authors have considered and that could be readily tested?

      Thanks for highlighting this phenomenon. We struggled with the SC purity issue for a long time. The project started with the R26RtdT reporter because tdT’s strong fluorescence is resistant to paraformaldehyde fixation, which aids visualization in vivo. Later, when we used the tdT signal to purify SCs by FACS, we found that only 80% of sorted tdT+ cells were Pax7+. We then switched to the R26RYFP reporter, from which we achieved much higher purity (95%) of SCs (Pax7+) by FACS. As such, we also repeated and confirmed many in vivo experimental results using the R26RYFP reporter (included in the manuscript). Due to the low purity of tdT+ SCs by FACS, we discontinued that mouse colony after we confirmed the superior utility of the R26RYFP reporter for SC isolation.

      We sincerely apologize for not being able to conduct further experiments to test this intriguing phenomenon. However, this issue has since been addressed and published by Murach et al., iScience (2021). Like us, they found non-satellite mononuclear cells with tdT fluorescence after TMX treatment when SCs were isolated via FACS. To determine whether this was due to off-target recombination or a technical artifact from tissue processing, they conducted extensive analyses. They found that the tdT+ mononuclear cells included fibrogenic cells (fibroblasts and FAPs), immune cells/macrophages, and endothelial cells. Additionally, they confirmed the significant potential of extracellular vesicle (EV)-mediated cargo transfer, which facilitates the transfer of full-length tdT transcript from lineage-marked Pax7+ cells to those mononuclear cells. We have modified the text to emphasize and acknowledge their contribution to this important point, and explained the difference between YFP and tdT reporter alleles in more detail (p. 9, ln. 11-17).

      (5) The Cut&Run data of Fig. 6 certainly provide evidence of direct Scx targets, especially since the authors used a novel knock-in strain for analyses. The enrichment of E-box motifs provides support for the 207 intersecting genes (scRNA-seq and Cut&Run) being direct targets. However, the rationale elaborated in the final paragraph of the Results section proposing how 4 of these genes account for the phenotypes on the Scx-neg cells and tissues is just speculation, however reasonable. These are not data, and these considerations would be more appropriate in the Discussion in the absence of any validation studies.

      We agree with this comment and have moved speculations into the Discussion (p. 15, ln. 4-15, and from p. 18, ln. 4 to p. 19, ln. 4).

      Reviewer #2 (Public Review):

      Summary:

      Scx is a well-established marker for tenocytes, but the expression in myogenic-lineage cells was unexplored. In this study, the authors performed lineage-trace and scRNA-seq analyses and demonstrated that Scx is expressed in activated SCs. Further, the authors showed that Scx is essential for muscle regeneration using conditional KO mice and identified the target genes of Scx in myogenic cells, which differ from those of tendons.

      Strengths:

      Sometimes, lineage-trace experiments cause mis-expression and do not reflect the endogenous expression of the target gene. In this study, the authors carefully analyzed the unexpected expression of Scx in myogenic cells using some mouse lines and scRNA-seq data.

      We appreciate the comments and thank her/him for noting the strengths of our manuscript.

      Weaknesses:

      Scx protein expression has not been verified.

      We are aware of this weakness. We had previously performed Western blotting (WB) on cultured SCs from control and ScxcKO mice, but did not detect endogenous Scx protein even in the control. In response to this comment, we have re-done several WB experiments using new lysates from control and ScxcKO SCs and two commercial antibodies: anti-Scx antibody 1 from Abcam (ab58655) and anti-Scx antibody 2 from Invitrogen (PA5-23943). These antibodies have been reported to detect endogenous Scx protein in tendon cells in Spang et al., BMC Musculoskelet Disord (2016) and Bochon et al., Int J Stem Cells (2021). Despite our best efforts, we were not able to detect a reliable Scx band. We have also conducted immunofluorescence using these two antibodies. Still, we failed to detect a difference in staining signals between control and cKO SCs. Lastly, we conducted immunofluorescence using the ScxTy1 myoblasts, and the staining signal did not coincide with the Ty1 signal (by double staining). We have been very frustrated by not knowing what caused this technical difficulty in our hands. Given that these were negative data, we did not include them. However, we do hope that the combined data from scRNA-seq, ScxCreERT2 lineage-tracing, Tg-ScxGFP expression, and ScxTy1 knock-in together are deemed sufficient to make up for the deficiency of data for endogenous Scx protein in regenerative myogenic cells.

      Response to Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      p. 8: The text refers to Fig. 3I, but this should be Fig. 3H.

      We apologize for the confusion. Please note that by keeping all 14 dpi data in the same row, we placed Fig.3I at an unconventional/unexpected position, i.e., next to 3D &3E, and above 3F-H. We were aware that this unconventional placement could cause confusion, and it did. With that said, we have now re-arranged the subfigures (same data content) so that the updated Fig.3 contains subfigures in the expected and proper spatial order. We double-checked the figure referral in the text (p. 8, ln. 16-17) and the text is correct – just that the original Fig.3I should have been at the original Fig.3H position and that is now corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) Given that Scx binds to the E-box and regulates gene expression, it is of interest to know the relevance between MyoD and Scx. If possible, the reviewer recommends to include some discussions.

      Thanks for the comment. MyoD1 is a well-known transcription factor regulating myogenesis, whereas Scx is primarily studied in tenocytes and other connective tissues. We agree that our new findings deserve a discussion regarding the relevance between MyoD1 and Scx. We have added a description of their differences in the discussion and two new references (p.19, ln. 7-17).

      (2) Considering that Scx is a transcriptional factor, it is interesting that Scx-GFP was not detected in the nuclei of regenerated myofibers. Could the subcellular localization of Scx-GFP provide some insights into the function of Scx as a transcription factor during muscle regeneration?

      Tg-ScxGFP is a transgenic line generated by random insertion into the genome (Pryce et al., 2007; cited). The plasmid used for transgenesis was constructed by replacing most of Scx’s first exon with GFP, and including ~9 kb of flanking regulatory sequences. As such, ScxGFP is not a fusion gene; rather, GFP expression is regulated by the Scx promoter and enhancer(s). This GFP reporter lacks a nuclear localization signal (NLS), hence it is mainly detected in the cytoplasm; some nuclear signal is detected, presumably due to GFP’s small size permitting passive diffusion into the nucleus. Thus, the GFP signal is used as a reporter for Scx expression, but GFP subcellular localization does not provide insight into Scx function per se. Conversely, ScxTy1/Ty1 is a knock-in allele created by fusing a triple-Ty1 tag (3XTy1) to the C-terminus of Scx, and we observed by immunofluorescent staining that Ty1 is located in the nucleus. We used the Ty1 epitope to carry out CUT&RUN experiments to gain insight into the function of Scx as a transcription factor.

      (3) Fig1D The number of arrows in the Merge image is not matched with others. In addition, the star mark in the Pax7 image is likely an error.

      Apologies. We have now corrected these errors in the revised Fig.1D.

      (4) FigS1A Is there only one myofiber shown in the dashed line in this image? It is unclear why only this myofiber is surrounded by the dashed line.

      The dashed line encircles a single fiber because it was not visible in the provided image. However, there are 3 fibers in this image. Because we did not immuno-stain for myofibers here, we circled one fiber for illustration. For clarity, we brightened the background (of the entire original images) so the background signals from myofiber boundaries are discernable without outlines.

      (5) FigS1B There was no overlapped DAPI staining in the Myogenin+ cell. DAPI-staining should be present in Myogenin+ cells because myogenin is located in the nucleus.

      Fig.S1B is immuno-staining for MyoD, and we marked one MyoD+DAPI+GFP+ cell/nucleus. Fig.S1C is immuno-staining for Myogenin, and we also marked one (cell/nucleus) that is triple positive.

      (6) The position of the asterisk for the ScxGFP in FigS1D is misaligned. In addition, the position is not matched with Fig1C. Because all myofibers are Scx-positive, it is strange that only one myofiber has an asterisk. The reviewer suggests removing the mark.

      Thank you for pointing out these errors. We have now corrected the misalignment and removed the unnecessary asterisk.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This study presents valuable experimental and numerical results on the motility of a magnetotactic bacterium living in sedimentary environments, particularly in environments of varying magnetic field strengths. The evidence supporting the claims of the authors is solid, although the statistical significance comparing experiments with the numerical work is weak. The study will be of interest to biophysicists interested in bacterial motility. 

      We thank the reviewers and editors for their careful reading and the constructive comments. With respect to the statement about weak statistical significance, we think that this statement mixes two separate issues, the significance of the difference between experiments at 0 and 50µT and the comparison of experiments with simulations. We have amended our manuscript to address both points as described below. The difference between the experiments at 0 and 50µT is indeed significant, and the discrepancy between experiments and simulations can be explained by unavoidable differences in the way we quantify bacterial throughput.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors present experimental and numerical results on the motility of Magnetospirillum gryphiswaldense MSR-1, a magnetotactic bacterium living in sedimentary environments. The authors manufactured microfluidic chips containing three-dimensional obstacles of irregular shape that match the statistical features of the grains observed in the sediment via microcomputer tomography. The bacteria are furthermore subject to an external magnetic field, whose intensity can be varied. The key quantity measured in the experiments is the throughput ratio, defined as the ratio between the number of bacteria that reach the end of the microfluidic channel and the number of bacteria entering it. The main result is that the throughput ratio is non-monotonic and exhibits a maximum at magnetic field strength comparable with Earth's magnetic field. The authors rationalize the throughput suppression at large magnetic fields by quantifying the number of bacteria trapped in corners between grains. 

      Strengths: 

      While magnetotactic bacteria's general motility in bulk has been characterized, we know much less about their dynamics in a realistic setting, such as a disordered porous material. The micro-computer tomography of sediments and their artificial reconstruction in a microfluidic channel is a powerful method that establishes the rigorous methodology of this work. This technique can give access to further characterization of microbial motility. The coupling of experiments and computer simulations lends considerable strength to the claims of the authors, because the model parameters (with one exception) are directly measured in the experiments. 

      Weaknesses: 

      The main weakness of the manuscript pertains to the discussion of the statistical significance of the experimental throughput ratio. Especially when comparing results at zero and 50 micro Tesla. The simulations seem to predict a stronger effect than seen in the experiments. The authors do not address this discrepancy. 

      We thank the reviewer for their positive assessment and the detailed constructive remarks. 

      The increase in bacterial throughput between 0 and 50 µT is indeed more pronounced in the simulations than in the experiments, partly due to the fact that there is considerably more variability in the experimental data. We did two things to address this issue: (1) We performed additional statistical tests addressing the difference between the experimental results at 0 and 50 µT. Indeed, the difference is only weakly significant (in contrast to the difference of either to 500 µT). The increase is, however, consistent with the observation in the absence of obstacles in the channel, where we see a monotonic increase from 0 to 500 µT (Supp. Figure S5). We have added the test results in the caption of Fig. 3. (2) To address the difference between simulations and experiments, we added a section in Methods on how we determine the throughput and a short discussion in the Results section. The key points are that the initial condition is different in simulations and experiments and that the throughput is therefore quantified differently. This difference is due to experimental limitations: we cannot track bacteria through the whole channel, and we wanted to avoid pushing them into the channel with fluid flow so that flow would not affect the results. As a consequence, bacteria continue to enter the IN region of the channel from the inlet during the experiment, while in the simulation, they all start at the beginning of the channel simultaneously. We expect this to mostly affect the case with diffusive transport (B=0).
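      The simulation-side initial condition described above (all bacteria starting simultaneously at the channel entrance) can be illustrated with a minimal active Brownian particle sketch. This is not the authors' simulation code. It assumes an obstacle-free channel, a constant swimming speed, rotational noise only, and a magnetic torque that relaxes the swimming direction towards the field axis. The function name `simulate_throughput`, the parameter `alignment_rate` (standing in for field strength), and all numerical values are hypothetical choices for illustration.

```python
import numpy as np


def simulate_throughput(alignment_rate, n_particles=2000, speed=20.0,
                        rot_diffusion=1.0, channel_length=200.0,
                        t_max=30.0, dt=0.01, seed=0):
    """Fraction of swimmers that reach x >= channel_length within t_max.

    Hypothetical toy model (not the authors' code): every swimmer starts
    at x = 0 at t = 0 with a random heading; the field points along +x;
    there are no obstacles and no walls, so only x needs to be tracked.
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-np.pi, np.pi, n_particles)  # initial headings
    x = np.zeros(n_particles)                        # common starting line
    arrived = np.zeros(n_particles, dtype=bool)      # first-passage flags
    noise_amp = np.sqrt(2.0 * rot_diffusion * dt)    # Euler-Maruyama noise
    for _ in range(int(round(t_max / dt))):
        # Self-propulsion along the current heading.
        x += speed * np.cos(theta) * dt
        # Magnetic torque pulls the heading towards theta = 0, plus noise.
        theta += (-alignment_rate * np.sin(theta) * dt
                  + noise_amp * rng.standard_normal(n_particles))
        # A swimmer counts once it has crossed the end of the channel.
        arrived |= x >= channel_length
    return arrived.mean()


if __name__ == "__main__":
    for rate in (0.0, 1.0, 10.0):
        print(rate, simulate_throughput(rate))
```

      Without obstacles this toy model can only reproduce the increase of throughput with field strength, in line with the obstacle-free observation mentioned above. It cannot show the suppression at strong fields, which requires corner traps.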

      Reviewer #2 (Public Review): 

      Summary: 

      simulation study of magnetotactic bacteria in microfluidic channels containing sediment-mimicking obstacles. The obstacles were produced based on micro-computer tomography reconstructions of bacteria-rich sediment samples. The swimming of bacteria through these channels is found experimentally to display the highest throughput for physiological magnetic fields. Computer simulations of active Brownian particles, parameterized based on experimental trajectories are used to quantify the swimming throughput in detail. Similar behavior as in experiments is obtained, but also considerable variability between different channel geometries. Swimming at strong field is impeded by the trapping of bacteria in corners, while at weak fields the direction of motion is almost random. The trapping effect is confirmed in the experiments, as well as the escape of bacteria with reducing field strength. 

      Strengths: 

      This is a very careful and detailed study, which draws its main strength from the fruitful combination of the construction of novel microfluidic devices, their use in motility experiments, and simulations of active Brownian particles adapted to the experiment. Based on their results, the authors hypothesize that magnetotactic bacteria may have evolved to produce magnetic properties that are adapted to the geomagnetic field in order to balance movement and orientation in such crowded environments. They provide strong arguments in favor of such a hypothesis. 

      Weaknesses: 

      Some of the issues touched upon here have been studied also in other articles. It would be good to extend the list of references accordingly and discuss the relation briefly in the text. 

      We thank the reviewer for the constructive comments. We address the point concerning previous literature in the response to the recommendations below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Here follows a list of points the authors should address. 

      (1) Are additional experiments feasible to decrease the statistical noise present in Fig. 3c? At the very least, the authors should discuss the statistical significance of the results at 50 muT vis-a-vis 0 T. 

      See our response to Strengths/Weaknesses above

      (2) The experimental setup is not immediately clear. I think that adding a panel from Fig. S1 (or a sketch thereof) would help clarify, especially in relation to the entry zone and end zone. 

      We are not sure what you mean. Fig. 3A already contains exactly such a panel. We have, however, added another supplementary figure that shows an additional detailed view of the setup (Fig. S3). In addition, we revised several figures: we have replaced Fig. S1 with a better version and exchanged the schematic view of the obstacle channel in Fig. 1, removing the additional inlets that were not used in this study (also in Fig. 3A). Instead, we added a comment in Methods explaining their presence. Hopefully this makes the setup clear.

      (3) It should be also stated that there is no external flow imposed on the channel. 

      We have added such a statement in the description of the experiment (in section 2.2, Swimming of magnetotactic bacteria through sediment-mimicking obstacle channels).

      (4) Fig. 3c and Fig. 6c are seemingly showing the same quantity (or closely related ones). The authors should use the same symbol and give an explicit mathematical definition. 

      The two quantities are not exactly the same, as we cannot directly quantify the flux of bacteria through the channel in our experiments. On the one hand, we cannot track bacteria through the whole channel; on the other hand, the initial conditions are not exactly the same as in the simulations. In the simulations, all bacteria start at the same time at the entrance to the channel. In the experiments, they enter from the inlet and do so at different times (pushing them in with fluid flow would be possible, but carries the risk of perturbing the results due to induced flow through the channel). We have added a new section in the Methods section that explains this difference and describes in detail the procedure used to obtain the throughput from the experiments. We have also added a corresponding comment in the Results section, where the simulations are compared with the experiments. 

      Minor issues: 

      - Figures have different styles that should be unified. For example, the panel labels sometimes have round brackets and sometimes they don't.

      See above

      - Page 6, (muCT) should have the Greek letter mu 

      Thanks, corrected.

      - Fig. 3a is not very clear; see my point 2 above. 

      See above

      Reviewer #2 (Recommendations For The Authors): 

      I have only a few comments and questions, which the authors should address: 

      (1) The observed exponential dependence of decay time on the "well" depth could be related to the exponential density distribution of active particles in a gravitational field, which has been derived previously. Might be interesting to discuss such a possible connection. 

      Thank you for the suggestion, the two cases are indeed somewhat analogous with behaviors reminiscent of thermal processes with an effective temperature. Such a description is however not generally possible (even for sedimentation, only some features are described). We plan to address in future work whether it can be made more quantitative in our case of escape from the corner traps. We have included a short discussion of the analogy in the section on trapping and escape. 
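      The analogy mentioned above can be written schematically as follows. This is a heuristic sketch only, not a result from the manuscript: the symbols $\lambda$ (sedimentation length), $\Delta U$ (effective depth of the trap) and $T_{\mathrm{eff}}$ (effective temperature) are assumptions introduced here for illustration and, as noted in the response, such an effective-temperature description is not generally valid.

```latex
% Sedimentation of active particles (weak-sedimentation regime):
% the density decays exponentially with height z.
\rho(z) \propto \exp\!\left(-\frac{z}{\lambda}\right)

% Escape from a corner trap, written in the same thermal-like form:
% the decay (escape) time grows exponentially with the well depth.
\tau_{\mathrm{esc}} \propto \exp\!\left(\frac{\Delta U}{k_{B} T_{\mathrm{eff}}}\right)
```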

      (2) The authors should consider the following relevant references, and discuss them briefly in their manuscript:

      - Sedimentation, trapping, and rectification of dilute bacteria J Tailleur, ME Cates EPL 86, 60002 (2009) 

      - Human spermatozoa migration in microchannels reveals boundary-following navigation P Denissenko, V Kantsler, DJ Smith, J Kirkman-Brown Proc. Natl. Acad. Sci. USA 109, 8007-8010 (2012) 

      - Wall accumulation of self-propelled spheres J Elgeti, G Gompper Europhysics Letters 101, 48003 (2013) 

      - Wall entrapment of peritrichous bacteria: a mesoscale hydrodynamics simulation study SM Mousavi, G Gompper, RG Winkler Soft Matter 16 (20), 4866-4875 (2020) 

      - A Geometric Criterion for the Optimal Spreading of Active Polymers in Porous Media C Kurzthaler, S Mandal, T Bhattacharjee, H Löwen, SS Datta, HA Stone Nat. Commun. 12, 7088 (2021) 

      - Run-to-Tumble Variability Controls the Surface Residence Times of E. coli Bacteria G Junot, T Darnige, A Lindner, VA Martinez, J Arlt, A Dawson, WCK Poon, H Auradou, E Clement Phys. Rev. Lett. 128, 248101 (2022) 

      - Dynamics and phase separation of active Brownian particles on curved surfaces and in porous media P Iyer, RG Winkler, DA Fedosov, G Gompper Phys. Rev. Research 5, 033054 (2023) 

      We agree that there is a lot of literature on these aspects, specifically interaction of self-propelled objects with walls and motion of swimmers through porous media. We have slightly extended our overview of previous literature in the introduction and included most of these references.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1: 

      (1) Their results with human macrophages suggest that there are differences between murine and human macrophages in inflammasome-mediated restriction of STm growth. For example, Thurston et al. showed that in murine macrophages that inflammasome activation controls the replication of mutant STm that aberrantly invades the cytosol, but only slightly limits replication of WT STm. In contrast, here the authors found that primed human macrophages rely on caspase-1, gasdermin D and ninjurin-1 to restrict WT STm. I wonder if the priming of the human macrophages in this study could account for the differences in these studies. Along those lines, do the authors see the same results presented in this study in the absence of priming the macrophages with Pam3CSK4. I think that determining whether the control of intracellular STm replication is dependent on priming is very important.

      We thank the Reviewer for their careful attention to our manuscript and for their thoughtful comments. We have addressed this question about the impact of priming by repeating the bacterial intracellular burden assays in unprimed WT and CASP1-/- THP-1 cells. We have added additional figures to the manuscript to address this: Figure 1 – Figure Supplement 3. Under unprimed conditions, CASP1-/- cells still harbored significantly higher bacterial burdens at 6 hpi and a significant fold-increase in bacterial CFUs compared to WT cells. These results suggest that the caspase-1-mediated restriction of intracellular Salmonella replication in human macrophages is independent of priming. 

      (2) Another difference with the Thurston et al. paper is the way that the STm inoculum was prepared - stationary phase bacteria that were opsonized. Could this also account for differences between the two studies rather than differences between murine and human macrophages in inflammasome-dependent control of STm?

      We thank the Reviewer for this excellent suggestion. To address this possibility, we repeated the bacterial intracellular burden assays in WT and CASP1-/- THP-1 cells using stationary phase bacteria. We infected WT and CASP1-/- THP-1 cells with stationary phase Salmonella, and we subsequently assayed for intracellular bacterial burdens. These data have now been added to the manuscript in Figure 1 – Figure Supplement 4. Interestingly, we did not observe any fold-change in the bacterial colony forming units in either the WT or the CASP1-/- THP-1 cells for the stationary phase Salmonella. These data indicate that by 6 hours post-infection, Salmonella do not replicate efficiently in human macrophages unless grown under SPI-1-inducing conditions. Furthermore, these results suggest that differences in how the Salmonella inoculum is prepared may contribute to the discrepancies between our study and previous studies, as noted by the Reviewer. 

      (3) The authors show that the pore-forming proteins GSDMD and Ninj1 contribute to control of STm replication in human macrophages. Is it possible that leakage of gentamicin from the media contributes to this control?

      Response: We thank the Reviewer for their insightful comment. We have addressed this question on the impact of gentamicin by repeating the bacterial intracellular burden assays using a lower concentration of gentamicin in combination with extensively washing the cells with RPMI media to remove the gentamicin. WT and CASP1-/- THP-1 cells were infected with WT Salmonella. Then, at 30 minutes post-infection, cells were treated with 25 μg/ml of gentamicin to kill any extracellular bacteria. At 1 hour post-infection (hpi), the cells were washed a total of five times with fresh RPMI to remove the gentamicin, and then the media was replaced with fresh media containing no gentamicin. In parallel, we also treated cells with 100 μg/ml of gentamicin at 30 minutes post-infection, washed the cells five times with fresh RPMI at 1 hpi to remove the gentamicin, and then replaced the media with fresh media containing 10 μg/ml of gentamicin. These data have now been included in the manuscript as Figure 1 – Figure Supplement 5. We observed similar levels of intracellular bacterial burdens at 1 hpi and 6 hpi and a fold-increase in bacterial colony forming units in CASP1-/- cells compared to WT cells across both gentamicin conditions, suggesting that gentamicin does not contribute to the intracellular control of Salmonella replication in human macrophages. Of note, we also tried repeating the bacterial intracellular burden assays without gentamicin, using only washes to remove extracellular bacteria at 1 hpi; however, under these experimental conditions, we observed high levels of extracellular Salmonella. Therefore, we relied on using a lower concentration of gentamicin to kill extracellular Salmonella in conjunction with extensive washing to remove the gentamicin for the remainder of the infection. 
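      The fold-increase in colony forming units referred to above can be illustrated with a short sketch. It assumes that fold-change is taken as the mean CFU at the later time point divided by the mean CFU at the earlier one (for instance 6 hpi over 1 hpi); the response does not spell out the exact formula, so this definition, the function name `fold_change`, and the replicate counts below are all hypothetical.

```python
from statistics import mean


def fold_change(cfu_early, cfu_late):
    """Mean late CFU divided by mean early CFU (assumed definition)."""
    early = mean(cfu_early)
    if early <= 0:
        raise ValueError("early CFU mean must be positive")
    return mean(cfu_late) / early


if __name__ == "__main__":
    # Made-up replicate counts, for illustration only (not data from the study).
    wt = fold_change([1000, 1200, 1100], [2000, 2400, 2200])
    casp1_ko = fold_change([1000, 1100, 1200], [8000, 8800, 9600])
    print(f"WT fold-change: {wt:.1f}")
    print(f"CASP1-/- fold-change: {casp1_ko:.1f}")
```

      Expressing replication as a ratio to the 1 hpi count normalizes for differences in initial uptake between cell lines, which is why it is reported alongside the raw burdens.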

      (4) One major question that remains to be answered is whether casp-1 plays a direct role in the intracellular localization of STm. If the authors quantify the percentage of vacuolar vs. cytosolic bacteria at early time points in WT and casp-1 KO macrophages, would that be the same in the presence and absence of casp-1? If so, then this would suggest that there is a basal level of bacterial-dependent lysis of the SCV and in WT macrophages the presence of cytosolic PAMPS trigger cell death and bacteria can't replicate in the cytosol. However, in the inflammasome KO macrophages, the host cell remains alive and bacteria can replicate in the cytosol.

      We thank this Reviewer for raising this important point. We have addressed this experimentally by quantifying the percentage of vacuolar vs. cytosolic Salmonella at 2 hpi in WT, NAIP-/-, and CASP1-/- THP-1 cells using a chloroquine (CHQ) resistance assay. These data have now been included in the manuscript in the new Figure 5A. The original subfigures of Figure 5 have consequently been rearranged. We did not observe any significant differences in vacuolar and cytosolic bacterial burdens at this early time point in WT, NAIP-/-, and CASP1-/- THP-1 cells. As noted by the Reviewer, these results suggest that the basal level of bacterial-dependent lysis of the SCV in human macrophages is not dependent on caspase-1 or NAIP. 

      Reviewer #3: 

      (1) The main weaknesses of the study are the inherent limitations of tissue culture models. For example, to study interaction of Salmonella with host cells in vitro, it is necessary to kill extracellular bacteria using gentamicin. However, since Salmonella-induced macrophage cell death damages the cytosolic membrane, gentamicin can reach intracellular bacteria and contribute to changes in CFU observed in tissue culture models (major point 1). This can result in tissue culture "artefacts" (i.e., observations/conclusions that cannot be recapitulated in vivo). For example, intracellular replication of Salmonella in murine macrophages requires T3SS-2 in vitro, but T3SS-2 is dispensable for replication in macrophages of the spleen in vivo (Grant et al., 2012).  

      We thank the Reviewer for their helpful comments and insightful suggestions. We have addressed some of the concerns about gentamicin in our response to Reviewer #1 above. To address the Reviewer’s concerns further, we have included language to acknowledge the limitations of our study based on the artefacts of tissue culture models in our Discussion section: “In this study, we utilized tissue culture models to examine intracellular Salmonella replication in human macrophages. These in vitro systems allow for precise control of experimental conditions and, therefore, serve as powerful tools to interrogate the molecular mechanisms underlying inflammasome responses and Salmonella replication in both immortalized and primary human cells. Still, there are limitations of tissue culture models, as they lack the inherent complexity of tissues and organs in vivo. To assess whether our findings reflect Salmonella dynamics in the mammalian host, it will be important to complement our studies and extend the implications of our work using approaches that model more complex systems, such as organoids or organ explant models co-cultured with immune cells, and in vivo techniques, such as humanized mouse models.”

      (2) In Figure 1: are increased CFU in WT vs CASP1-deficient THP-1 cells due to Caspase 1 restricting intracellular replication or due to Caspase-1 causing pore formation to allow gentamicin to enter the cytosol thereby restricting bacterial replication? The same question arises about Caspase-4 in Figure 2, where differences in CFU are observed only at 24h when differences in cell death also become apparent. The idea that gentamicin entering the cytosol through pores is responsible for controlling intracellular Salmonella replication is also consistent with the finding that GSDMD-mediated pore formation is required for restricting intracellular Salmonella replication (Figure 3). Similarly, the finding that inflammasome responses primarily control Salmonella replication in the cytosol could be explained by an intact SCV membrane protecting Salmonella from gentamicin (Figure 5). 

      We thank the Reviewer for highlighting this important point regarding gentamicin.

      We have addressed this question in our response above to Reviewer #1 and in Figure 1 – Figure Supplement 5. We observed caspase-1-mediated restriction of Salmonella in human macrophages even when cells were treated with a lower concentration of gentamicin (25 μg/ml) for 30 minutes and then extensively washed with RPMI media to remove any gentamicin for the remainder of the infection. These data suggest that gentamicin is likely not responsible for controlling intracellular Salmonella in human macrophages.

    1. Author response:

      We thank all three reviewers and the editors for their detailed comments on our manuscript.  The two main themes of this feedback concern the paper’s generality and its presentation.  Reviewers #2 and #3 raise questions about how the discrepancies in fitness statistics we report will be realized across organisms, environments, and in models with interactions beyond resource competition (e.g., toxicity or cross-feeding).  All reviewers and the editors have also expressed the need for the presentation to be improved, including a broader introduction to the concept of fitness (Reviewer #1), a clearer explanation of our model (Reviewer #1), better explanations of how quantifying fitness answers key biological questions (Reviewer #3), and improvements to the most technical sections to ensure accessibility to experimentalists (Reviewer #3).

      In light of these comments, we wish to clarify that the goal of this paper is to provide a proof-of-principle for how different choices in quantifying fitness can lead to different analysis outcomes.  Since the focus of this paper is on the theoretical concepts, we focus on a few example data sets and a simple model to demonstrate the existence of these discrepancies.  While other organisms and environments, especially with more complex growth dynamics and interactions, could certainly have additional or different discrepancies in fitness statistics, we believe the simplicity of our approach is valuable because it demonstrates that even basic features of microbial growth (common across systems) with realistic parameter values are sufficient to cause significant differences in fitness depending on these quantification choices.  We agree with the reviewers that a systematic documentation of how these fitness discrepancies are empirically realized is important, but we believe that question is best explored in separate future works that can focus fully on this empirical rather than theoretical question.

      We plan to revise the manuscript in several ways, following the suggestions of the three reviewers and the editor.  First, we will better articulate the main goal and conclusions of this manuscript, especially its generality and limitations.  Second, we will work to streamline and clarify several points in the main text identified by the reviewers to make it more accessible and useful to a broader audience, especially experimentalists who routinely measure fitness in their work.  We are grateful to the reviewers and the editor for their time and effort in assessing the manuscript, and we look forward to providing an updated version that addresses these concerns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      Casas-Tinto et al. present convincing data that injury of the adult Drosophila CNS triggers transdifferentiation of glial cells and even the generation of neurons from glial cells. This observation opens up the possibility of getting a handle on the molecular basis of neuronal and glial generation in the vertebrate CNS after traumatic injury caused by stroke or crush injury. The authors use an array of sophisticated tools to follow the development of glial cells at the injury site in very young and mature adults. The results in mature adults reveal a remarkable plasticity in the fly CNS and dispel the notion that repair after injury may be only possible in nerve cords which are still developing. The observation of so-called VC cells which do not express the glial marker repo could point to the generation of neurons by former glial cells.

      Conclusion:

      The authors present an interesting story that is technically sound and could form the basis for an in-depth analysis of the molecular mechanism driving repair after brain injury in Drosophila and vertebrates.

      Strengths:

      The evidence for transdifferentiation of glial cells is convincing. In addition, the injury to the adult CNS shows an inherent plasticity of the mature ventral nerve cord which is unexpected.

      Weaknesses:

      Traumatic brain injury in Drosophila has been previously reported to trigger mitosis of glial cells and generation of neural stem cells in the larval CNS and the adult brain hemispheres. Therefore this report adds to but does not significantly change our current understanding. The origin and identity of VC cells is unclear.

      The Reviewer correctly points out that traumatic brain injury has been reported to trigger the generation of neural stem cells. However, according to previous reports, those cells were quiescent Dpn+ neuroblasts. We now report that already differentiated adult neuropil glia transdifferentiate into neurons, which is a new mechanism not previously reported.

      We agree with the reviewer regarding the identity of VC neurons. Their origin, however, is clear from the results of the G-TRACE experiments: they originate from neuropil glia (i.e., astrocyte-like glia and ensheathing glia). To identify these newly generated neurons, we used a battery of antibodies previously reported to label specific subtypes of neurons (Figure S1). We did not find any neuronal marker other than Elav that co-localizes with VC cells.

      Reviewer #2:

      Summary:

      Casas-Tinto et al. provide new insight into glial plasticity using a crush injury paradigm in the ventral nerve cord (VNC) of adult Drosophila. The authors find that both astrocyte-like glia (ALG) and ensheathing glia (EG) divide under homeostatic conditions in the adult VNC and identify ALG as the glial population that specifically ramps up proliferation in response to injury, whereas the number of EGs decreases following the insult. Using lineage-tracing tools, the authors interestingly observe the interconversion of glial subtypes, especially of EGs into ALGs, which occurs independent of injury and is dependent on the availability of the transcription factor Prospero in EGs, adding to the plasticity observed in the system. Finally, when tracing the progeny of differentiated glia, Casas-Tinto and colleagues detect cells of neuronal identity and provide evidence that such glia-derived neurogenesis is specifically favored following ventral nerve cord injury, which puts forward a remarkable way in which glia can respond to neuronal damage.

      Numerous experiments have been carried out in 7-day-old flies, showing that the observed plasticity is not due to residual developmental remodeling or a still immature VNC.

      By elegantly combining different genetic tools, the authors show glial divisions with mitotic-dependent tracing and find that the number of generated glia is refined by apoptosis later on.

      The work identifies Prospero in glia as an important coordinator of glial cell fate, from development to the adult context, which draws further attention to the upstream regulatory mechanisms.

      We express our gratitude to the reviewer for their keen appreciation of our efforts and their enthusiasm for the outcomes of this research.

      Weaknesses:

      Although the authors do use a variety of methods to show glial proliferation, the EdU data (Figure 1B) could be more informative by displaying images of non-injured animals and providing quantifications or the mention of these numbers based on results previously acquired in the system.

      We appreciate the Reviewer’s comment. We believe that images of non-injured animals would not add new information, as we already quantified the increase in glial proliferation upon injury in Losada-Perez et al. 2021. Besides, the purpose of this experiment was to determine whether the dividing cells were astrocyte-like glia, rather than to count the dividing cells. Comparing independent experiments can be tricky, but the quantifications of G2-M glia (repo>fly-Fucci) in Losada-Perez et al. 2021 (fig 1C) and the quantifications of G2-M neuropil glia in this work (fig 1C) are comparable.

      The experiments relying on the FUCCI cell cycle reporter suggested considerable baseline proliferation for EGs and ALGs, but when using an independent method (Twin Spot MARCM), mitotic marking was only detected for ALGs. This discrepancy could be addressed by assessing the co-localization of the different glia subsets using the identified driver lines with mitotic markers such as PH3.

      In our understanding, this discrepancy could be explained by the magnitude of proliferation. The lower proliferation rate of EG (as indicated by the fly-Fucci experiments), combined with the incomplete efficiency of MARCM clone induction, considerably reduces the chances of finding EG MARCM clones. PH3 is a mitotic marker, but it is also found in apoptotic cells (Kim and Park 2012. DOI: 10.1371/journal.pone.0044307). Nevertheless, we stained injured VNCs with anti-PH3 and found ALG cells positive for PH3 (Author response image 1).

      Author response image 1.

       

      The data in Figure 1C would be more convincing in combination with images of the FUCCI Reporter as it can provide further information on the location and proportion of glia that enter the cell cycle versus the fraction that remains quiescent.

      We added a Figure 1 V2 (version 2) with the suggested images (1-C’).

      The analyses of inter-glia conversion in Figure 3 are complicated by the fact that Prospero RNAi is both used to suppress EG-to-ALG conversion and as a marker to establish ALG nature. Clarification of whether the GFP+ cells still expressed Pros or were classified as NP-like GFP cells is required here.

      As described in the text, Pros is a marker for ALG, and the results suggest that Prospero expression is required for the EG to ALG transition. We clarified these concepts in the text accordingly. In Figure 3 we show images of NP-like cells that originated from EG and are Prospero+, which supports transdifferentiation from EG to ALG.

      The conclusion that ALG and EG glial cells can give rise to cells of neuronal lineage is based on glial lineage information (GFP+ cells from glial G-trace) and staining for the neuronal marker Elav. The use of other neuronal markers apart from Elav or morphological features would provide a more compelling case that GFP+ cells are mature neurons.

      We completely agree with the reviewer's observation regarding the identity of VC neurons. To identify these newly generated neurons, we used a battery of antibodies previously reported to label specific subtypes of neurons (Figure S1). We did not find any neuronal marker other than Elav that co-localizes with VC cells.

      Although the text discusses in which contexts glial plasticity is observed or increased upon injury, the figures are less clear regarding this aspect. A more systematic comparison of injured VNCs versus homeostatic conditions, combined with clear labelling of the injury area, would facilitate the understanding of the panels.

      We appreciate the Reviewer’s observation. We have carefully checked all figures and labelled them as “Injured” or “Not Injured”. We added a Figure 2-V2 and a Figure 4-V2.

      Context/Discussion

      The study finds that glia in the ventral cord of flies have latent neurogenic potential. Such observations have not been made regarding glia in the fly brain, where injury is reported to drive glial divisions or the proliferation of undifferentiated progenitor cells with neurogenic potential.

      Discussing this different strategy for cell replacement adopted by glia in the VNC and pointing out differences to other modes seems fascinating. Highlighting differences in the reactiveness of glia in the VNC compared to the brain also seems highly relevant as they may point to different properties to repair damage.

      Based on the assays employed, the study points to a significant amount of glial "identity" changes or interconversions, which is surprising under homeostatic conditions. The significance of this "baseline" plasticity remains undiscussed, although glia unarguably show extensive adaptations during nervous system development.

      It would be interesting to know if the "interconversion" of glia is determined by the needs in the tissue or would shift in the context of selective ablation/suppression of a glial type.

      We deeply appreciate the Reviewer’s enthusiasm for this subject; it is indeed fascinating. We kept the discussion brief in order to meet the eLife Short Report requirements, but the specific conditions that trigger glial interconversion are of great interest to us. Compromising EG or ALG viability and evaluating the behaviour of glial cells would be of great interest for developmental biology and regeneration, but the precise scenario in which to carry out these experiments is not well defined. In this report, we aim to reproduce an injury in Drosophila brain, and this model should serve to analyze cellular behaviours. The scenario in which we deplete one specific subpopulation of glial cells is conceptually attractive but beyond the scope of this report.

      Reviewer #3:

      In this manuscript, Casas-Tintó et al. explore the role of glial cells in the response to a neurodegenerative injury in the adult brain. They used Drosophila melanogaster as a model organism and found that glial cells are able to generate new neurons through the mechanism of transdifferentiation in response to injury.

      This paper provides a new mechanism in regeneration and gives an understanding of the role of glial cells in the process.

    1. Author response:

      Reviewer #1 (Public review):

      Li et al. investigate Ca2+ signaling in T. gondii and argue that Ca2+ tunnels through the ER to other organelles to fuel multiple aspects of T. gondii biology. They focus in particular on TgSERCA as the presumed primary mechanism for ER Ca2+ filling. However, when TgSERCA was knocked out, a Ca2+ release in response to TG was still present.

      Note that we did not knock out SERCA: it is an essential gene, so it would not be possible to isolate parasites that do not express it. We created conditional mutants that downregulate the expression of SERCA, and some activity is still present in the mutant after 24 h of ATc treatment.

      Overall, the Ca2+ signaling data do not support the conclusion of Ca2+ tunneling through the ER to other organelles; in fact, they argue for direct Ca2+ uptake from the cytosol.

      The authors show EM membrane contact sites between the ER and other organelles, so Ca2+ released by the ER could presumably be taken up by other organelles but that is not ER Ca2+ tunneling.

      They clearly show that SERCA is required for T. gondii function.

      Overall, the data presented do not fully support the conclusions reached.

      We agree that the data does not support Ca2+ tunneling as defined and characterized in mammalian cells. In response to this comment, we modified the title and the text accordingly.

      However, we think that the study shows far more than just the role of SERCA in T. gondii functions. We argue that the study shows that the ER (through the activity of the SERCA pump) sequesters and re-distributes calcium to other organelles following influx through the PM. The experiments show that the ER is able to take up calcium from the cytosol, through SERCA activity, as it enters the parasite, and that this activity is important for the transition of the parasite between various extracellular calcium exposures. We believe that the role of the ER in redistributing calcium following exposure to physiological levels of extracellular calcium is demonstrated in the experiments shown in Figs 1H-I, 4G-H and 5G-K. There are no previous T. gondii studies that address the question of how intracellular stores, which are essential for the continuation of the lytic cycle and therefore for the parasitism of T. gondii, are filled with calcium.

      Data argue for direct Ca2+ uptake from the cytosol

      The ER most likely takes up calcium from the cytosol following its entry through the PM and redistributes it to the other organelles. We will delete the word “tunneling” and replace it with transfer and re-distribution as they represent our results.

      What we interpret as re-distribution is shown in Figure 1H and I, in which the calcium released by GPN and nigericin is enhanced after TG addition. Of note, there is no experimental evidence that supports the regulation of calcium entry by store depletion (PMID: 24867952), and we do not think that the enhanced response is due to calcium entry.

      Figure 4G and H show that knocking down SERCA significantly reduces the response to GPN. Fig 5I shows that mitochondrial calcium uptake is reduced after the addition of GPN in the knockdown mutant. Fig 2B shows that SERCA can take up calcium at 55 nM calcium, while mitochondrial uptake needs higher concentrations (Fig 5B-C). However, higher calcium concentrations could be reached at the microdomains formed around MCS between the ER and mitochondrion. Figure 5E shows that the mitochondrion is not responsive to an increase of cytosolic calcium. This is also shown for the apicoplast in Fig. 7E and F of the Li et al., Nat Commun 2021 paper.

      Reviewer #2 (Public review):

      The role of the endoplasmic reticulum (ER) calcium pump TgSERCA in sequestering and redistributing calcium to other intracellular organelles following influx at the plasma membrane.

      That T. gondii transitions through life cycle stages within and exterior to the host cells, with very different exposures to calcium, adds significance to the current investigation of the role of the ER in redistributing calcium following exposure to physiological levels of extracellular calcium.

      They also use a conditional knockout of TgSERCA to investigate its role in ER calcium store-filling and the ability of other subcellular organelles to sequester and release calcium. These knockout experiments provide important evidence that ER calcium uptake plays a significant role in maintaining the filling state of other intracellular compartments.

      We thank the reviewer.

      While it is clearly demonstrated, and not surprising, that the addition of 1.8 mM extracellular CaCl2 to intact T. gondii parasites preincubated with EGTA leads to an increase in cytosolic calcium and subsequent enhanced loading of the ER and other intracellular compartments, there is a caveat to the quantitation of these increases in calcium loading. The authors rely on the amplitude of cytosolic free calcium increases in response to thapsigargin, GPN, nigericin, and CCCP, all measured with fura2. This likely overestimates the changes in calcium pool sizes because the buffering of free calcium in the cytosol is nonlinear, and fura2 (with a Kd of 100-200 nM) is a substantial, if not predominant, cytosolic calcium buffer. Indeed, the increases in signal noise at higher cytosolic calcium levels (e.g. peak calcium in Figure 1C) are indicative of fura2 ratio calculations approaching saturation of the indicator dye.

      We agree about the limitations of using Fura2, but according to the literature (PMID:3838314, fig. 3) Fura2 is suitable for measurements between 100 nM and 1 mM calcium. The responses in our experiments were within its linear range, and the experiments with the SERCA mutant and mitochondrial GCaMPs support the conclusions of our work.

      We agree that the experiment shown in Fig 1C shows a response close to the limit of the linear range of Fura2 and we can provide a more representative trace in the final article. We can include new quantifications and comparisons.
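
      The concern about saturation can be illustrated with the standard ratiometric calibration for Fura2, [Ca2+] = Kd · β · (R − Rmin) / (Rmax − R). The following is a minimal sketch only: the calibration constants (Rmin, Rmax, Kd, β) are illustrative assumptions rather than values measured in this study, and the function names are hypothetical. It shows that the same small error in the measured ratio R translates into a much larger error in estimated calcium as R approaches Rmax.

```python
def fura2_calcium_nm(ratio, r_min, r_max, kd_nm, beta):
    """Free calcium (nM) from a Fura2 340/380 ratio.

    Standard ratiometric calibration:
        [Ca2+] = Kd * beta * (R - Rmin) / (Rmax - R)
    """
    if not (r_min <= ratio < r_max):
        raise ValueError("ratio must lie in [r_min, r_max)")
    return kd_nm * beta * (ratio - r_min) / (r_max - ratio)


def calcium_per_ratio_unit(ratio, r_min, r_max, kd_nm, beta):
    """Slope d[Ca2+]/dR (nM per ratio unit) of the calibration curve.

    Differentiating the calibration gives
        Kd * beta * (Rmax - Rmin) / (Rmax - R)**2,
    which grows without bound as R approaches Rmax.
    """
    if not (r_min <= ratio < r_max):
        raise ValueError("ratio must lie in [r_min, r_max)")
    return kd_nm * beta * (r_max - r_min) / (r_max - ratio) ** 2


# Illustrative calibration constants (assumptions, not measured values).
PARAMS = dict(r_min=0.5, r_max=10.5, kd_nm=200.0, beta=5.0)

for ratio in (1.5, 5.5, 9.5):
    calcium = fura2_calcium_nm(ratio, **PARAMS)
    slope = calcium_per_ratio_unit(ratio, **PARAMS)
    print(f"R = {ratio:4.1f}  [Ca2+] = {calcium:8.1f} nM  slope = {slope:8.1f} nM per ratio unit")
```

      Under these assumed constants, the slope of the calibration curve is far steeper near Rmax than near Rmin, so a fixed amount of ratio noise produces noisier calcium estimates at the peak of a response than at rest.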

      Another caveat, not addressed, is that loading of fura2/AM can result in compartmentalized fura2, which might modify free calcium levels and calcium storage capacity in intracellular organelles.

      We are aware of this issue, and because of it we have modified our protocol to minimize compartmentalization. We load cells for 26 min at room temperature, keep them on ice, and do not use them for longer than 2-3 hours, because after that we do see evidence of compartmentalization. One sign of compartmentalization is an increase in the resting calcium concentration.

      The finding that the SERCA inhibitor cyclopiazonic acid (CPA) only mobilizes a fraction of the thapsigargin-sensitive calcium stores in T. gondii coincides with previously published work in another apicomplexan parasite, P. falciparum, showing that thapsigargin mobilizes calcium from both CPA-sensitive and CPA-insensitive calcium pools (Borges-Pereira et al., 2020, DOI: 10.1074/jbc.RA120.014906). It would be valuable to determine whether this reflects the off-target effects of thapsigargin or the differential sensitivity of TgSERCA to the two inhibitors.

      This is an interesting observation, and we will discuss the result considering the Plasmodium study and include the citation. We will add inhibition curves using the MagFluo protocol and compare CPA and TG.

      Figure S1 suggests differential sensitivity, and it shows that thapsigargin mobilizes calcium from both CPA-sensitive and CPA-insensitive calcium pools in T. gondii. Also important is that we used 1 µM TG as we are aware that TG has shown off-target effects at higher concentrations. 

      The authors interpret the residual calcium mobilization response to Zaprinast observed after ATc knockdown of TgSERCA (Figures 4E, 4F) as indicative of a target calcium pool in addition to the ER. While this may well be correct, it appears from the description of this experiment that it was carried out using the same conditions as Figure 4A where TgSERCA activity was only reduced by about 50%.

      We partially agree. As pointed out by the reviewer, knockdown of TgSERCA by only 50% means that the ER could still be targeted by zaprinast, so the residual response is not in itself evidence of another target calcium pool. The MagFluo4 experiment (although we are aware that MagFluo4 fluorescence is not linear with calcium) shows that there is SERCA activity after 24 hr of ATc treatment. However, when Zaprinast is added after TG we see a significant release of calcium, in both wild type and conditional knockdowns. Because of this result, we proposed that there could be a large neutral calcium pool other than the one mobilized by TG. We will address these possibilities in the discussion and interpretation of the result.

      The data in Figures 4A vs 4G and Figures 4B vs 4H indicate that the size of the response to GPN is similar to that with thapsigargin in both the presence and absence of extracellular calcium. This raises the question of whether GPN is only releasing calcium from acidic compartments or whether it acts on the ER calcium stores, as previously suggested by Atakpa et al. 2019 DOI: 10.1242/jcs.223883. Nonetheless, Figure 1H shows that there is a robust calcium response to GPN after the addition of thapsigargin.

      The results of the experiments did not exclude the possibility that GPN can also mobilize some calcium from the ER in addition to acidic organelles. However, we do not have any evidence that GPN mobilizes calcium from the ER either. Based on our unpublished work, we think GPN mainly releases calcium from the PLVAC. We will include the mentioned citation and discuss the result considering the possibility that GPN may be acting on the ER.

      An important advance in the current work is the use of state-of-the-art approaches with targeted genetically encoded calcium indicators (GECIs) to monitor calcium in important subcellular compartments. The authors have previously done this with the apicoplast, but now add the mitochondria to their repertoire. Despite the absence of a canonical mitochondrial calcium uniporter (MCU) in the Toxoplasma genome, the authors demonstrate the ability of T. gondii mitochondria to accumulate calcium, albeit at high calcium concentrations. Although the calcium concentrations here are higher than needed for mammalian mitochondrial calcium uptake, there too calcium uptake requires calcium levels higher than those typically attained in the bulk cytosolic compartment. And just like in mammalian mitochondria, the current work shows that ER calcium release can elicit mitochondrial calcium loading even when other sources of elevated cytosolic calcium are ineffective, suggesting a role for ER-mitochondrial membrane contact sites. With these new tools in hand, it will be of great value to elucidate the bioenergetics and transport pathways associated with mitochondrial calcium accumulation in T. gondii.

      We thank this reviewer for his/her positive comment. Studies of the bioenergetics and transport pathways associated with mitochondrial calcium accumulation are part of our future plans.

      The current studies of calcium pools and their interactions with the ER and dependence on SERCA activity in T. gondii are complemented by super-resolution microscopy and electron microscopy that do indeed demonstrate the presence of close appositions between the ER and other organelles (see also videos). Thus, the work presented provides good evidence for the ER acting as the orchestrating organelle delivering calcium to other subcellular compartments through contact sites in T. gondii, as has become increasingly clear from work in other organisms.

      Thank you

      Reviewer #3 (Public review):

      This manuscript describes an investigation of how intracellular calcium stores are regulated and provides evidence that is in line with the role of the SERCA-Ca2+-ATPase in this important homeostasis pathway. Calcium uptake by mitochondria is further investigated and the authors suggest that ER-mitochondria membrane contact sites may be involved in mediating this, as demonstrated in other organisms.

      The significance of the findings is in shedding light on key elements within the mechanism of calcium storage and regulation/homeostasis in the medically important parasite Toxoplasma gondii whose ability to infect and cause disease critically relies on calcium signalling. An important strength is that despite its importance, calcium homeostasis in Toxoplasma is understudied and not well understood.

      We agree with the reviewer. Thank you

      A difficulty in the field, and a weakness of the work, is that following calcium in the cell is technically challenging and thus requires reliance on artificial conditions. In this context, the main weakness of the manuscript is the extrapolation of data. The language used could be more careful, especially considering that the way to measure the ER calcium is highly artificial - for example utilising permeabilization and over-loading the experiment with calcium. Measures are also indirect - for example, when the response to ionomycin treatment was not fully in line with the suggested model the authors hypothesise that the result is likely affected by other storage, but there is no direct support for that.

      The MagFluo protocol has been widely used in mammalian cells, DT40 cells and other cells for the characterization of the IP3 receptor response to IP3. We will include and discuss more citations in the revised article. The scheme at the top of the figure shows the protocol used. There is no overloading with calcium: the cells are permeabilized, the concentrations of calcium used are physiological, and all experiments were performed at 220 nM calcium, which is within the cytosolic levels tolerated by cells. The experiment was done with permeabilized cells because permeabilization allows the indicator to become diluted, allows the substrate MgATP to reach the membrane of the ER, and allows exposure to precise concentrations of calcium. MagFluo4 loading is intended to compartmentalize the indicator into all intracellular compartments, and the uptake stimulated by MgATP occurs exclusively in the compartment occupied by SERCA. IO is an ionophore that causes calcium release from other stores in addition to the ER, and it is therefore expected to result in a larger release. We must clarify that the experiment shown in Fig. 2 was done to characterize the activity of SERCA and was not aimed at characterizing the role of SERCA in the parasite. We will explain this result better in the revised version of the article.

      Below we provide some suggestions to improve controls, however, even with those included, we would still be in favour of revising the language and trying to avoid making strong and definitive conclusions. For example, in the discussion perhaps replace "showed" with "provide evidence that are consistent with..."; replace or remove words like "efficiently" and "impressive"; revise the definitive language used in the last few lines of the abstract (lines 13-17); etc. Importantly we recommend reconsidering whether the data is sufficiently direct and unambiguous to justify the model proposed in Figure 7 (we are in favour of removing this figure at this early point of our understanding of the calcium dynamic between organelles in Toxoplasma).

      We thank the reviewer for the suggestions and will modify the language as suggested.

      Fig 7 is only a model and, like all models, could be incorrect. However, considering this reviewer’s criticism, we will replace the model with a simpler one that is less speculative.

      Another important weakness is poor referencing of previous work in the field. Lines 248-250 read almost as if the authors originally hypothesised the idea that calcium is shuttled between ER and mitochondria via membrane contact sites (MCS) - but there is extensive literature on other eukaryotes which should be first cited and discussed in this context. Likewise, the discussion of MCS in Toxoplasma does not include the body of work already published on this parasite by several groups. It is informative to discuss observations in light of what is already known.

      We added a citation following the sentence mentioned by the reviewer in lines 248-250 (corrected preprint) and will include more in the revised article. We cite several pertinent articles that describe MCS in Toxoplasma (lines 378-380; there are in fact very few). We will make sure not to miss any new articles that could have been recently published. Note that our work is not about describing the presence of MCSs. We show transfer of calcium between the ER and mitochondria, and we present evidence supporting that this transfer happens through MCSs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      - Summary: 

      Recordings were made from the dentate nucleus of two monkeys during a decision-making task. Correlates of stimulus position and stimulus information were found to varying degrees in the neuronal activities. 

      We agree with this summary.

      - Strengths: 

      A difficult decision-making task was examined in two monkeys.

      We agree with this statement.

      - Weaknesses: 

      One of the monkeys did not fully learn the task. The manuscript lacked a coherent hypothesis to be tested, and no attempt was made to consider the possibility that this part of the brain may have little to do with the task that was being studied. 

      We understand the reviewer’s concern. It is correct that one of the monkeys (Mi) did not perform at a high level, but it should be noted that both monkeys performed significantly above chance level. Therefore, we would argue that both monkeys in fact did learn the task but Mi’s performance was suboptimal. This difference in performance levels gave us a rare opportunity to examine why some animals perform better than others, and we show that Mi (the lower performing monkey) paid more attention to the outcome of the previous trial; this is evident from our behavioural and decoding models.

      We tested the overall hypothesis that neurons of the dentate nucleus can dynamically modulate their activity during a visual attention task, comprising not only sensorimotor but also cognitive attentional components. Many neurons in the dentate are multimodal (Figure 3C-D), as had been theorized. One of the specific hypotheses that we tested is that the dentate cells can be direction-selective for both the sensorimotor and cognitive component. Given that many of the recorded cells showed direction-selectivity in their firing rate modulation for gap directions and/or stimulus directions, we provide strong evidence that this hypothesis is correct. We have now spelled out this hypothesis more explicitly in the introduction of the revised version. We now also explain better why we tested this specific hypothesis. Indeed, earlier studies in primates such as those by Herzfeld and colleagues (2018, Nat. Neuro.) and van Es and colleagues (2019, Current Biol) have indicated that direction-selectivity of cerebellar activity may occur in various sensorimotor domains.

      We also appreciate the comment of this Reviewer that in our original submission we did not show our attempt to consider the possibility that this part of the brain may have little to do with the task that was being studied. We in fact did consider this possibility in that we successfully injected 3 ml of muscimol (5 μg/ml, Sigma Aldrich) into the dentate nucleus in vivo in one of the monkeys (Mo). This application resulted in a reduction of more than 10% in correct responses of the covert attention task after 45 minutes, whereas the performance remained the same following saline injections. Unfortunately, due to the timing of the experiments and Covid19-related laboratory restrictions we were unable to perform these experiments in the other monkey or repeat them in Mo. We aim to replicate this in future experiments and publish it when we have full datasets of at least two monkeys available. For this paper we have prioritized our tracing experiments, highlighting the connections of the dentate nucleus with attention related areas in brainstem and cortex in both monkeys, following perfusion.

      - Perhaps the large differences in performance between the two subjects can be used as a way to interpret the neural data's relationship to behavior, as it provided a source of variance. This is what we would hypothesize if we believed that this area of the brain is playing a significant role in the task. If one animal learns much more poorly, and this region of the brain is important for that behavior, then shouldn't there be clear, interpretable differences in the neural data? 

      We thank the Reviewer for this comment. We have added a new Supplementary Figure 2, in which we present the data for both monkeys separately in the revised manuscript. Comparing the two datasets however, we see more commonalities related to the significant learning in both monkeys than differences that might be related to their different levels of learning. We have therefore decided to show the different datasets transparently in the new Supplementary Figure 2, but to stay on the conservative side in our interpretations.

      - How should we look for these differences? A number of recent papers in mice have uncovered a large body of data showing that during the deliberation period, when the animal is interpreting a sensory stimulus (often using the whisker system), there is ramping activity in a principal component space among neurons that contribute to the decision. This ramping activity is present (in the PCA space) in the motor areas of the cortex, as well as in the medial and lateral cerebellar nuclei. Perhaps a similar computational approach would benefit the current manuscript. 

      We also appreciate this point. We have done the principal component analysis accordingly, and we indeed do find the ramping activity in several components of the dentate activity of both monkeys (Mi and Mo). We have now added a new Supplementary Figure 3 with the first three components of both correct and incorrect trials for Mi and Mo, highlighting their potential contribution.
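      To illustrate the kind of analysis referred to here, a minimal sketch follows. It is not the code used to produce Supplementary Figure 3: the input layout (trial-averaged firing rates arranged as neurons × time bins), the synthetic ramp, and the function name `population_pcs` are all assumptions made for illustration. The sketch centers each neuron's rate over time and obtains the leading principal components through a singular value decomposition.

```python
import numpy as np

def population_pcs(rates, n_components=3):
    """Project trial-averaged rates (neurons x time bins) onto the top PCs.

    Returns (scores, explained): scores has shape (n_components, time bins),
    explained holds the fraction of variance captured by each component.
    """
    # Center each neuron's rate across time so the PCs capture temporal modulation.
    centered = rates - rates.mean(axis=1, keepdims=True)
    # SVD of the (neurons x time) matrix: rows of vt are temporal components.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Scale each temporal component by its singular value to get PC time courses.
    scores = s[:n_components, None] * vt[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Hypothetical data: 40 neurons share a linear ramp with random weights, plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
weights = rng.normal(size=(40, 1))
rates = weights * t + 0.05 * rng.normal(size=(40, 100))
scores, explained = population_pcs(rates)
```

      In this synthetic case the first component follows the shared ramp (up to an arbitrary sign); with real recordings, correct and incorrect trials would each be passed through the same projection and compared.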

      - What is the hypothesis that is being tested? That is, what do you think might be the function of this region of the cerebellum in this task? It seems to me that we are not entirely in the dark, as previous literature on mice decision-making tasks has produced a reasonable framework: the deliberation period coincides with ramping activity in many regions of the frontal lobe and the cerebellum. Indeed, the ramp in the cerebellum appears to be a necessary condition for the ramp to be present in the frontal lobe. Thus, we should see such ramping activity in this task in the dentate. When the monkey makes the wrong choice, the ramp should predict it. If you don't see the ramping activity, then it is possible that the hypothesis is wrong, or that you are not recording from the right place. 

      It is indeed one of our specific hypotheses that the dentate cells can be direction-selective for the preparatory cognitive component and/or the sensorimotor response. We provide evidence that this hypothesis may be correct when we analyze the regular time response curves (see Figure 2 and the new Supplementary Figure 2, where the data of both monkeys are now presented separately). Moreover, we have now verified this by analysing the ramping curves in PCA space (new Supplementary Figure 3) and the firing frequency of DN neurons that modulated upon presentation of the C-stimulus (new Supplementary Figure 4). These figures and findings are now referred to in the main text.

      - As this is a difficult task that depends on the ability of the animals to understand the meaning of the cues, it is quite concerning that one of the monkeys performed poorly, particularly in the early sessions. Notably, the disparity between the two subjects is rather large: one monkey at the start of the recordings achieved a performance that was much better than the second monkey did at the end of the recording sessions. You highlighted the differences in performance in Figure 1D and mentioned that you started recording once the animals reached 60% performance. However, this did not make sense to me as the performance of Mi even after the final day of recording did not reach the performance of Mo on the first day of recording. Thus, in contrast to Mo, Mi appeared to be not ready for the task when the recording began.

      We understand this point. However, please note that the learning performance of the monkeys concerned retraining sessions after they had had several weeks of vacation. So, even though it is correct that one of the two monkeys had very good consolidation and already started at a relatively high level in the first retraining session, the other one also started and ended above chance level (the y-axis starts at 0.5). We now highlight this point better in the Results section.

      - One objective of having two monkeys is to illustrate that what is true in one animal is also true in the other. In some figures, you show that the neural data are significantly different, while in others you combine them into one. Thus, are you confident that the neural data across the animals should be combined, as you have done in Figure 2? Perhaps you can use the large differences in performance as a source of variance to find meaning in the neural data. 

      This is a valid question; as highlighted above, we have now addressed this point in the new Supplementary Figure 2, where the data for both monkeys are presented separately. Given the sample sizes and levels of variance, it is in general difficult to draw conclusions about the potential differences and contributions, but the data are sufficiently transparent to observe common trends. With regard to linking differences in the neural data to the differences in performance level, please also consider Figure 4, the new Supplementary Figure 3 (with the ramping PCA component) and the new Supplementary Figure 4 (with the additional analysis of the ramping activity of DN neurons that modulated upon presentation of the C-stimulus), which suggest that the ramping stage of Mo starts before that of Mi. This difference highlights the possibility that injecting accelerations of the simple spike modulations of Purkinje cells in the cerebellar hemispheres into the complex of cerebellar nuclei may be instrumental in improving the performance of responses to covert attention, akin to what has been shown for the impact of Purkinje cells of the vestibulocerebellum on eye movement responses to vestibular stimulation (De Zeeuw et al. 1995, J Neurophysiol). This possibility is now also raised in the Discussion.

      - How do we know that these neurons, or even this region of the brain, contribute to this task? When a new task is introduced, the contributions of the region of the brain that is being studied are usually established via some form of manipulation. This question is particularly relevant here because the two subjects differed markedly in their performance, yet in Figure 3 you find that a similar percentage of neurons are responding to the various elements of the task.

      We appreciate this question. As highlighted above, we are refraining from showing our muscimol manipulation (3 ml of 5 μg/ml muscimol, Sigma Aldrich), as it only concerns 1 successful dataset and 1 control experiment. We hope to replicate this reversible lesion experiment in the future and publish it when we have full new datasets of at least two monkeys available. As explained above, for this paper we have sacrificed both monkeys following a timed perfusion, so as to have similar survival times for the transport of the neuro-anatomical tracer involved.  

      - Behavior in both animals was better when the gap direction was up/down vs. left/right. Is this difference in behavior encoded during the time that the animal is making a decision? Are the dentate neurons better at differentiating the direction of the cue when the gap direction is up/right vs. left/right? 

      These data have now been included in the new Supplementary Figure 2; we did not observe any significant differences in this respect.

      Reviewer #2:

      - The authors trained monkeys to discriminate peripheral visual cues and associate them with planning future saccades of an indicated direction. At the same time, the authors recorded single-unit neural activity in the cerebellar dentate nucleus. They demonstrated that substantial fractions of DN cells exhibited sustained modulation of spike rates spanning task epochs and carrying information about stimulus, response, and trial outcome. Finally, tracer injections demonstrated this region of the DN projects to a large number of targets including several known to interconnect the visual attention network. The data compellingly demonstrate the authors' central claims, and the analyses are well-suited to support the conclusions. Importantly, the study demonstrates that DN cells convey many motor and nonmotor variables related to task execution, event sequencing, visual attention, and arguably decision-making/working memory. 

      We thank the Reviewer for this positive and constructive feedback.

      - The study is solid and I do not have major concerns, but only points for possible improvement. 

      We thank the Reviewer for this positive feedback.

      - A key feature of this data is the extended changes/ramps in DN output across epochs (Figure 2). Crudely, this presents a challenge for the view that DN output mainly drives motor effectors, as the saccade itself lasts only a tiny fraction of the overall task. Some discussion of this dichotomy in thinking about the function(s) of the cerebellum, vis a vis the multifarious DN targets the authors demonstrate here, etc., would be helpful. 

      We agree with the Reviewer and have expanded our Discussion on this point, now also highlighting the outcome of the new PCA analysis recommended by Reviewer 1 (see the new Supplementary Figure 3).

      - A high-level suggestion on the data: the presentation of the data focuses (sensibly) on the representation of the stimulus and response epochs (Figures 2-3). Yet, the authors then show that from decoding, it is, in fact, a trial outcome that is best represented in the population (Figure 4). While there is nothing 'wrong' with this, it reads slightly incongruously, and the reader does a bit of a "double take" back to the previous figures to see if they missed examples of the trial-outcome signals, but the previous presentations only show correct trials. Consider adding somewhere in the first 3 main figures some neural data showing comparisons with incorrect trials. This way, the reader develops prior expectations for the outcome decoding result and frame of reference for interpreting it. On a related note, the text contains an earlier introduction of this issue (p24 last sentence) and p25 paragraph 1 cites Figure 3D and 3E for signals "related to the absence of reward" - but the caption says this includes only correct trials? 

      We thank the Reviewer for bringing up these points. We have addressed the textual suggestions. Moreover, we have done the PCA analysis suggested by Reviewer 1 for both the correct and incorrect trials (see Supplementary material).

      - P29: The discrepancy in retrograde labeling between monkeys (2 orders of magnitude): I realize the authors can't really do anything about this, but the difference is large enough to warrant concerns in the interpretation (how did the tracer spread over the drastically larger area? Isotropically? Could it cross more "hard boundaries" and incorporate qualitatively different inputs/outputs?). A small discussion of possible caveats in interpreting the outcomes would be helpful. 

      We fully agree with this comment. As highlighted in the text, in both monkeys we first identified the optimal points for injection in the dentate nucleus electrophysiologically and we used the same pump with the same settings to carry out the injections, but even so the differences are substantial. We suspect that the larger injection might have been caused by an air bubble trapped in the syringe or a deviation in the stock solution, but we can never be sure of that. We have added a potential explanation for the caveat that might have played a role.

      - And a list of quick points: 

      We have addressed all points listed below; we want to thank the Reviewer for bringing them up.

      P3 paragraph 2 needs comma "in daily life,". 

      P4 paragraph 2 "C-gap" terminology not previously defined. 

      P4 paragraph 2 "animals employed different behavioral strategies". Grammatically, you should probably say "each animal employed a different behavioral strategy," but also scientifically the paragraph doesn't connect this claim to anything about the DN (whereas, e.g., the abstract does make this connection clear). 

      P5 paragraph 1 "theca" should be "the". 

      P6 paragraph 1 problem with ignashenkova citation insert. 

      P10 paragraph 1 I think the spike rate "difference between highest and lowest" is not exactly the same as "variance," you might want to change the terminology. 

      P10 paragraph 1 should probably say "To determine if a cell preferentially modulated". 

      P10 paragraph 1 last sentence the last clause could be clearer. 

      P17 paragraph 2 should be something like "as well as those by Carpenter and..."? 

      P20 caption: consider "...directionality in the task: only one C-stim...". 

      P20 caption: consider "to the left and right in the [L/R] task...to the top/bottom in the [U/D] task". 

      Fig1E and S1 - is there a physical meaning of the "weight" unit, and if none, can this be transformed into a more meaningful unit? 

      P21 paragraph 1 consider "activity was recorded for 304 DN neurons...". 

      P21 paragraph 1 "correlations with the temporal windows" it's not clear how activity can "correlate" with a time window, consider rephrasing (activity levels changed during these time epochs, depending on stimulus identity). 

      P21 paragraph 1 should be "by comparing the number of spikes in a bin...". 

      P22 paragraph 2 "when we aligned the neurons to the time of maximum change" needs clarification. The maximum change of what? And per neuron? Across the population? 

      P22 paragraph 2 "than that of the facilitating" should be "than did the facilitating units". 

      P24 paragraph 1 needs a comma and rewording "Within each direction, trials are sorted by the time of saccade onset". 

      P24 paragraph 1 should probably say "Same as in G, but for suppressed cells". 

      P24 paragraph 2 should say "more than one task event" not "events". 

      P24 paragraph 2 needs a comma "To fully characterize the neural responses, we fitted". 

      P25 paragraph 1 should probably say "we sampled from similar populations of DN". 

      P34 paragraph 3 consider rephrasing the sentence that contains both "dissociation" and "dissociate". 

      P37 last line: consider "coordination of cerebellum and cerebral cortex *in* higher order mental..."? 

      P38 paragraph 1 citation needed for "kinematics of goal-directed hand actions of others"? 

      P38 paragraph 1 commas probably not needed "map visual input, from high-level visual regions, onto..." 

      References

      - Herzfeld DJ, Kojima Y, Soetedjo R, Shadmehr R (2018) Encoding of error and learning to correct that error by the Purkinje cells of the cerebellum. Nat Neurosci 21:736–743.

      - van Es DM, van der Zwaag W, Knapen T (2019) Topographic maps of visual space in the human cerebellum. Curr Biol 29:1689–1694.e3.

      - De Zeeuw CI, Wylie DR, Stahl JS, Simpson JI (1995) Phase relations of Purkinje cells in the rabbit flocculus during compensatory eye movements. J Neurophysiol 74:2051–2064. doi: 10.1152/jn.1995.74.5.2051.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors bring together implanted radiofrequency coils, high-field MRI imaging, awake animal imaging, and sensory stimulation methods in a technological demonstration. The results are very detailed descriptions of the sensory systems under investigation.

      Strengths:

      - The maps are qualitatively excellent for rodent whole-brain imaging.

      - The design of the holder and the coil is pretty clever.

      Weaknesses:

      - Some unexpected regions appear on the whole brain maps, and the discussion of these regions is succinct.

      - The authors do not make the work and effort to train the animals and average the data from several hundred trials apparent enough. This is important for any reader who would like to consider implementing this technology.

      - The data is not available. This does not let the readers make their own assessment of the results.

      Thank you for the comments on this manuscript. We have provided a more detailed discussion of the unexpected regions (page 18, lines 491-494) and the training procedures (pages 7-9, lines 172-236). We have also uploaded the datasets to OpenNeuro and Zenodo:

      - Whisker (OpenNeuro): https://doi.org/10.18112/openneuro.ds005496.v1.0.1

      - Visual (OpenNeuro): https://doi.org/10.18112/openneuro.ds005497.v1.0.0

      - SNR line profile data and data processing scripts (Zenodo): https://zenodo.org/doi/10.5281/zenodo.13821455

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Hike et al. entitled 'High-resolution awake mouse fMRI at 14 Tesla' describes the implementation of awake mouse BOLD-fMRI at high field. This work is timely as the field of mouse fMRI is working toward collecting high-quality data from awake animals. Imaging awake subjects offers opportunities to study brain function that are otherwise not possible under the more common anesthetized conditions. Not to mention the confounding effects that anesthesia has on neurovascular coupling. What has made progress in this area slow (relative to other imaging approaches like optical imaging) is the environment within the MRI scanner (high acoustic noise) - as well as the intolerance of head and body motion. This work adds to a relatively small, but quickly growing literature on awake mouse fMRI. The findings in the study include testing of an implanted head-coil (for MRI data reception). Two designs are described and the SNR of these units at 9.4T and 14T are reported. Further, responses to visual as well as whisker stimulation recorded in acclimated awake mice are shown. The most interesting finding, and most novel, is the observation that mice seem to learn to anticipate the presentation of the stimulus - as demonstrated by activations evident ~6 seconds prior to the presentation of the stimulus when stimuli are delivered at regular intervals (but not when stimuli are presented at random intervals). These kinds of studies are very challenging to do. The surgical preparation and length of time invested into training animals are grueling. I also see this work as a step in the right direction and evidence of the foundations for lots of interesting future work. However, I also found a few shortcomings listed below.

      Weaknesses:

      (1) The surface coil, although offering a great SNR boost at the surface, ultimately comes at a cost of lower SNR in deeper more removed brain regions in comparison to commercially available Bruker coils (at room temperature). This should be quantified. A rough comparison in SNR is drawn between the implanted coils and the Bruker Cryoprobe - this should be a quantitative comparison (if possible) - including any differences in SNR in deeper brain structures. There are drawbacks to the Cryoprobe, which can be discussed, but a more thorough comparison between the implanted coils, and other existing options should be provided (the Cryoprobe has been used previously in awake mouse experiments (Sensory evoked fMRI paradigms in awake mice - Chen; Physiological effects of a habituation procedure for functional MRI in awake mice using a cryogenic radiofrequency probe - Yoshida; PREVIOUS REFERENCE)). Further, the details of how to build the implanted coils should be provided (shared) - this should include a parts list as well as detailed instructions on how to build the units. Also, how expensive are they? And can they be reused?

      Thank you for the comment. We did not use a Bruker Cryoprobe for this work but rather a Bruker 4-array surface coil. We are unable to compare against a Cryoprobe since we do not have access to one for our system. A comparison to previously published data acquired on different scanners might be possible, but it would require sequences with identical parameters to avoid introducing an uncontrolled variable. In a future study, we plan to recruit other laboratories to test the implanted RF coils against their existing Cryoprobes.

      We have included an updated figure comparing SNR at different depths across the Bruker 4-array coil and the implanted RF coils. As shown in Supplementary Figure 7B, there is significant SNR enhancement up to 4 mm cortical depth for both the single loop and the figure 8 implanted RF coils in comparison to the Bruker 4-array coil.

      Author response image 1.

      Comparison between implanted and commercial coils. A shows representative coils in the single loop (left) and figure 8 (right) styles. Supplementary Table 1 provides a parts list and cost for making these coils, and Supplementary Figure 1 provides a circuit diagram for assembly. B presents the SNR line profile values as a function of distance from the pia mater for each coil tested at 9.4T: commercial phased array surface coil (4 Array), implanted single loop, and implanted figure 8. SNR values were calculated by dividing the signal by the standard deviation of the noise. C-E show a representative FLASH image with the line profile of SNR measurements from each of the coils used to create the graph seen in B. Clear visual improvement in SNR can be seen in C-E. C – Commercial phased array. D – Single loop at 9.4T. E – Figure 8 at 9.4T. (N = 6 for the 4-array coil, N = 5 for the single loop, N = 5 for the figure 8)
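      The SNR definition used in this caption (signal divided by the standard deviation of the noise) can be sketched as follows. This is an illustrative example only and not the processing script deposited on Zenodo: the synthetic image, the choice of background region, and the function name `snr_line_profile` are assumptions.

```python
import numpy as np

def snr_line_profile(image, line_rows, line_col, noise_region):
    """SNR along a vertical line: pixel signal / std of a background noise region."""
    # Noise level is estimated once from a signal-free background region.
    noise_sd = np.std(image[noise_region], ddof=1)
    # Signal is read along one image column (depth runs along the rows).
    signal = image[line_rows, line_col]
    return signal / noise_sd

# Hypothetical image: signal decays with depth (rows); columns 0-9 are background.
rng = np.random.default_rng(1)
image = rng.normal(0.0, 2.0, size=(64, 64))
depth = np.arange(64)
image[:, 10:] += 100.0 * np.exp(-depth / 20.0)[:, None]

profile = snr_line_profile(
    image,
    line_rows=slice(0, 64),
    line_col=32,
    noise_region=(slice(0, 64), slice(0, 10)),
)
```

      The resulting `profile` holds one SNR value per depth sample, which is the quantity that would be plotted against distance from the cortical surface for each coil.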

      Additionally, we have added a supplementary figure (Supplementary Figure 1) with a circuit diagram, in an effort to disseminate the prototype design of the coils to other laboratories. We have included a detailed parts list with the cost of constructing the coils configured for our scanner (Supplementary Table 1). These specifics would, however, need to be adjusted to the precise field strength, bore size, and animal for which the coil is being built. As for reusability, the copper wire is cemented to the animal's skull, so the implantable coil should be considered a consumable for awake mouse experiments, though the PCB parts can be retrieved.

      (2) In the introduction, the authors state that "Awake mouse fMRI has been well investigated". I disagree with this statement and others in the manuscript that give the reader the impression that awake experiments are not a challenging and unresolved approach to fMRI experiments in mice (or rodents). Although there are multiple labs (maybe 15 worldwide) that have conducted awake mouse experiments (with varying degrees of success/thoroughness), we are far from a standardized approach. This is a strength of the current work and should be highlighted as such. I encourage the authors to read the recent systematic review that was published on this topic in Cerebral Cortex by Mandino et al. There are several elements in there that should influence the tone of this piece including awake mouse implementations with the Bruker Cryoprobe, prevalence of surgical preparations, and evaluations of stress.

      Thank you for the comment. We agree with the reviewer that awake mouse fMRI still has room for improvement, and we have revised the Introduction to highlight the state of the art of awake mouse fMRI (page 4, lines 81-88).

      (3) The authors also comment on implanted coils reducing animal stress - I don't know where this comment is coming from, as this has not been reported in the literature (to my knowledge) and the authors don't appear to have evaluated stress in their mice. 

      Since questions 3 and 4 are both closely related to the acclimation procedures, we answer them together.

      (4) Following on the above point, measures of motion, stress, and more details on the acclimation procedure that was implemented in this study should be included.

      We thank the reviewer for raising the issue of animal training.

      During the animal training, we measured both pupil dynamics and eye movement features across training sessions; the detailed procedure is described in the Methods (pages 7-9, lines 172-236).

      The training procedure is carried out over a total of 5 weeks in four phases: (i) holding the animal in hands, (ii) head-fixation and pupillometry, (iii) head-fixation and pupillometry with mock-MRI acoustic exposure, and (iv) head-fixation and pupillometry with echo-planar imaging (EPI) in the MR scanner.

      Author response table 1.

      As shown in Supp Fig 2B, the spectral power of pupil dynamics (<0.02Hz) and eye movements gradually increased as a function of the training time for head-fixed mice exposed to the mock MRI acoustic environment during phase 3.  In phase 4, when head-fixed mice were put into the scanner for the first time, both eye movements and pupil dynamics were initially reduced during scanning but recovered to an acclimated state on Day 2, similar to the level on Day 8 of phase 3.  These behavioral outputs would provide an alternative way to monitor the stress levels of the mice. 

      Author response image 2.

      The eye movements (A) and power spectra of pupil dynamics (<0.02Hz) (B) change during different training phases.
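      As a rough illustration of how power in the ultra-slow band (<0.02 Hz) of a pupil trace could be quantified, a minimal sketch is given below. It is not the analysis code used for the figure: the sampling rate, the synthetic traces, and the function name `band_power` are assumptions, and a simple periodogram stands in for whatever spectral estimator was actually applied.

```python
import numpy as np

def band_power(trace, fs, f_max=0.02):
    """Summed periodogram power of a mean-removed trace for 0 < f < f_max (Hz)."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()                      # remove the DC offset
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    band = (freqs > 0) & (freqs < f_max)  # keep only the ultra-slow band
    return power[band].sum()

# Hypothetical traces sampled at 1 Hz for 1000 s.
fs = 1.0
t = np.arange(1000) / fs
slow = np.sin(2 * np.pi * 0.01 * t)   # ultra-slow fluctuation, inside the band
fast = np.sin(2 * np.pi * 0.2 * t)    # faster fluctuation, outside the band

slow_power = band_power(slow, fs)
fast_power = band_power(fast, fs)
```

      Here `slow_power` is large while `fast_power` is close to zero; applied per session, such a value could be tracked across training days in the way the panels above summarize.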

      It should be noted that stress may be related to an increased frequency of eye blinking or twitching movements in human subjects (1–3). In head-fixed mice, by contrast, the eyeblink has been used for behavioral conditioning to investigate motor learning in normally behaving mice (4–6). Importantly, studies have shown that eye movements are significantly reduced in head-fixed mice compared to freely moving mice (7). The increased eye movement during the acclimation process would therefore indicate an alleviated stress level in our head-fixed mice. Meanwhile, stress-related pupillary dilation could dominate the pupil dynamics in the early phase of training (8). We observed a gradual increase in the power of pupil dynamics at ultra-slow frequencies during phase 3, which suggests that stress-related pupil dilation was alleviated and that pupil dynamics again reflected other factors, such as arousal, locomotion, and startles, as in normally behaving mice. Although the training procedure in the present work is extensive in comparison to existing awake mouse fMRI studies (Mandino et al. have reviewed training strategies for awake mouse fMRI, including the overall training duration of existing studies (9)), stress remains a confounding factor for functional brain mapping in head-fixed mice. In particular, a recent study (10) shows that the corticosterone concentration in blood samples of head-fixed mice is significantly reduced on Day 25 following the training but remains higher than in control mice. In the Discussion, we have addressed the potential issues of stress-related confounding factors for awake mouse fMRI studies (page 16, lines 436-458).

      (1) A. Marcos-Ramiro, D. Pizarro-Perez, M. Marron-Romera, D. Gatica-Perez, Automatic blinking detection towards stress discovery. ICMI 2014 - Proceedings of the 2014 International Conference on Multimodal Interaction 307–310 (2014). https://doi.org/10.1145/2663204.2663239.

      (2) M. Haak, S. Bos, S. Panic, L. Rothkrantz, Detecting stress using eye blinks and brain activity from EEG signals. Lance 21, 76 (2009).

      (3) E. Del Carretto Di Ponti E Sessam, Exploring the impact of Stress and Cognitive Workload on Eye Movements: A Preliminary Study. (2023).

      (4) S. A. Heiney, M. P. Wohl, S. N. Chettih, L. I. Ruffolo, J. F. Medina, Cerebellar-dependent expression of motor learning during eyeblink conditioning in head-fixed mice. J Neurosci 34, 14845–14853 (2014).

      (5) S. N. Chettih, S. D. Mcdougle, L. I. Ruffolo, J. F. Medina, Adaptive timing of motor output in the mouse: The role of movement oscillations in eyelid conditioning. Front Integr Neurosci 5, 12996 (2011).

      (6) J. J. Siegel, et al., Trace Eyeblink Conditioning in Mice Is Dependent upon the Dorsal Medial Prefrontal Cortex, Cerebellum, and Amygdala: Behavioral Characterization and Functional Circuitry. eNeuro 2, 51–65 (2015).

      (7) A. F. Meyer, J. O’Keefe, J. Poort, Two Distinct Types of Eye-Head Coupling in Freely Moving Mice. Current Biology 30, 2116-2130.e6 (2020).

      (8) H. Zeng, Y. Jiang, S. Beer-Hammer, X. Yu, Awake Mouse fMRI and Pupillary Recordings in the Ultra-High Magnetic Field. Front Neurosci 16, 886709 (2022).

      (9) F. Mandino, S. Vujic, J. Grandjean, E. M. R. Lake, Where do we stand on fMRI in awake mice? Cereb Cortex 34 (2024).

      (10) K. Juczewski, J. A. Koussa, A. J. Kesner, J. O. Lee, D. M. Lovinger, Stress and behavioral correlates in the head-fixed method: stress measurements, habituation dynamics, locomotion, and motor-skill learning in mice. Sci Rep 10, 1–19 (2020).

      (5) It wasn't clear to me at what times the loop versus "Figure 8" coil was being used, nor how many mice (or how much data) were included in each experiment/plot. There is also no mention of biological sex.

      Thank you for the comment. We have clarified the sex and number of animals. The figure 8 coil was only used during development to show the improvement of the coil design for cortical measurements. The detailed information is described in the Methods (page 6, lines 127-129 and page 10, lines 269-270). Additionally, animal numbers have been included in the figure captions.

      (6) Building on the points above, the manuscript overall lacks experimental detail (especially since the format has the results prior to the methods).

      Thank you for the comment. We have modified the manuscript to increase the experimental detail and moved the methods section before the results.

      (7) An observation is made in the manuscript that there is an appreciable amount of negative BOLD signal. The authors speculate that this may come from astrocyte-mediated BOLD during brain state changes (and cite anesthetized rat and non-human primate experiments). This is very strange to me. First, the negative BOLD signal is not plotted (please do this), further, there are studies in awake mice that measure astrocyte activation eliciting positive BOLD responses (see Takata et al. in Glia, 2017).

      We thank the reviewer for raising the issue of the negative BOLD fMRI observation. We have added a subplot of the negative BOLD signal changes to the revised Figure 4. These negative BOLD signals across cortical areas could be coupled with brain state changes upon air-puff-induced startle responses. Our future studies will focus on elucidating the brain-wide activity changes of awake mice with fMRI. We also provide a detailed discussion of the potential mechanism underlying the negative BOLD fMRI signals. First, as reported in the paper suggested by the reviewer, astrocytic Ca2+ transients coincide with positive BOLD responses in the activated cortical areas, which aligns with the neurovascular coupling (NVC) mechanism. However, there is emerging evidence that astrocytic Ca2+ transients are coupled with both positive and negative BOLD responses in anesthetized rats (11) and awake mice (12). An intriguing observation is that cortex-wide negative BOLD signals coupled with spontaneous astrocytic Ca2+ transients could co-exist with the positive BOLD signal detected in the activated cortex. Studies have shown that astrocytes are involved in regulating brain state changes (13), in particular during locomotion (14) and startle responses (15). These brain state-dependent global negative BOLD responses are also related to arousal changes in both non-human primates (16) and human subjects (17). The established awake mouse fMRI platform with ultra-high spatial resolution will enable brain-wide activity mapping of the functional nuclei contributing to the brain state changes of head-fixed awake mice in future studies (pages 17-18, lines 478-490).

      (11) M. Wang, Y. He, T. J. Sejnowski, X. Yu, Brain-state dependent astrocytic Ca2+ signals are coupled to both positive and negative BOLD-fMRI signals. Proc Natl Acad Sci U S A 115, E1647–E1656 (2018).

      (12) C. Tong, Y. Zou, Y. Xia, W. Li, Z. Liang, Astrocytic calcium signal bidirectionally regulated BOLD-fMRI signals in awake mice in Proc. Intl. Soc. Mag. Reson. Med. 32, (2024).

      (13) K. E. Poskanzer, R. Yuste, Astrocytes regulate cortical state switching in vivo. Proc Natl Acad Sci U S A 113, E2675–E2684 (2016).

      (14) M. Paukert, et al., Norepinephrine controls astroglial responsiveness to local circuit activity. Neuron 82, 1263–1270 (2014).

      (15) R. Srinivasan, et al., Ca2+ signaling in astrocytes from IP3R2−/− mice in brain slices and during startle responses in vivo. Nat Neurosci 18, 708 (2015).

      (16) C. Chang, et al., Tracking brain arousal fluctuations with fMRI. Proc Natl Acad Sci U S A 113, 4518–4523 (2016).

      (17) B. Setzer, et al., A temporal sequence of thalamic activity unfolds at transitions in behavioral arousal state. Nat Commun 13 (2022).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I really enjoyed this work. The maps shown are among the best-quality maps out there. Here are suggestions to the authors.

      (1) Both the ACA and VRA are rather unexpected. The authors explain these briefly as being part of the associative cortical areas. Both the ACA and VRA are not canonical associative areas (or at least not to us). This warrants a stronger discussion.

      To verify both ACA and VRA as associative areas, we provide the connectivity map projections from the Allen Brain Atlas (see below). These projections are derived from Cre-dependent AAV tracing of axonal projections. We have included an explanation of this in the introduction.

      Author response image 3.

      Representative images are shown indicating connections between the barrel cortex and retrosplenial area from an injection in the barrel cortex (Left panel) as well as the visual cortex and cingulate connection from an injection in the visual cortex (Right panel). Images are of connectivity map projections from the Allen Brain Atlas derived from a Cre-dependent AAV tracing of axonal projections

      (2) This is a lot of work. But looking at the figures, this is not obvious. We read in the caption that several hundred trials were used. It would be good to also specify how many mice. It would be clearer to represent this info in the figure as well to support the fact that this is not a trivial acquisition.

      We thank the reviewer for raising the effort issue. We have edited the figure to include this information and have included the numbers in the text as well.

      (3) The training protocol is seemingly extensive, but this is only visible by following another reference. Including a description in this work would help the reader make sense of the effort that went into this work.

      We thank the reviewer for raising the training protocol issue. We now discuss the training method used for this study more thoroughly (page 7-9 – line 172-236).

      (4) I really would love to see that dataset made freely available - this should be the norm.

      The datasets have been uploaded to OpenNeuro: Whisker (https://doi.org/10.18112/openneuro.ds005496.v1.0.1) and Visual (https://doi.org/10.18112/openneuro.ds005497.v1.0.0).

      The SNR line profile data and data processing scripts have been uploaded to Zenodo (https://zenodo.org/doi/10.5281/zenodo.13821455).

      (page 21 – line 573-579)

      Reviewer #2 (Recommendations For The Authors):

      (1) I'm a little confused about the stimulation paradigm and the effect of it causing an effective 2-second TR (which is on the long side) - please elaborate (a figure might be helpful). The paradigm for visual stimulation also seems elaborate, can you please explain the logic and how it was developed?

      Thank you for raising the detailed stimulation paradigm issues. The stimulation paradigm is independent of, and does not interfere with, the setup of the effective 2-second TR. The 2-second TR results from the use of 2-segment EPI, with a TR of 1 second per segment. The 2-segment acquisition allows an echo spacing of 0.52 ms with an effective image bandwidth of 3858 Hz, which reduces image distortion. The stimulation paradigm was defined by an “8 s on, 32 s off” epoch, chosen to elicit a strong BOLD response, and could be used with any reasonable TR duration.

      We have included a figure outlining the stimulation paradigm (Supp Fig. 3).
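
      The timing arithmetic in this response can be illustrated with a minimal Python sketch. It is not part of the authors' acquisition or analysis code. It assumes that the effective TR is the number of EPI segments multiplied by the per-segment TR, and that the “8 s on, 32 s off” epoch is sampled at that effective TR; the function names are hypothetical.

```python
def effective_tr(n_segments: int, segment_tr_s: float) -> float:
    """Effective volume TR (s), assuming one full image needs all segments."""
    return n_segments * segment_tr_s


def volumes_per_epoch(on_s: float, off_s: float, tr_s: float) -> tuple[int, int]:
    """Return (volumes acquired during stimulation, volumes per full epoch)."""
    on_volumes = round(on_s / tr_s)
    epoch_volumes = round((on_s + off_s) / tr_s)
    return on_volumes, epoch_volumes


# Values stated in the response: 2 segments, 1 s each; 8 s on, 32 s off.
tr = effective_tr(n_segments=2, segment_tr_s=1.0)
on_vols, epoch_vols = volumes_per_epoch(on_s=8, off_s=32, tr_s=tr)
print(tr, on_vols, epoch_vols)
```

      Under these assumptions each 40-second epoch spans 20 volumes, 4 of which fall within the stimulation window; changing the TR changes only the sampling of the epoch, not the paradigm itself.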

      (2) I had difficulties viewing the movies (on my MAC).

      Thank you for this note. We have re-uploaded the videos in .mov format.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The pituitary gonadotropins, FSH and LH, are critical regulators of reproduction. In mammals, synthesis and secretion of FSH and LH by gonadotrope cells are controlled by the hypothalamic peptide, GnRH. As FSH and LH are made in the same cells in mammals, variation in the nature of GnRH secretion is thought to contribute to the differential regulation of the two hormones. In contrast, in fish, FSH and LH are produced in distinct gonadotrope populations and may be less (or differently) dependent on GnRH than in mammals. In the present manuscript, the authors endeavored to determine whether FSH may be independently controlled by a distinct peptide, cholecystokinin (CCK), in zebrafish.

      Strengths:

      The authors demonstrated that the CCK receptor is enriched in FSH-producing relative to LH-producing gonadotropes, and that genetic deletion of the receptor leads to dramatic decreases in gonadotropin production and gonadal development in zebrafish. Also, using innovative in vivo and ex vivo calcium imaging approaches, they show that LH- and FSH-producing gonadotropes preferentially respond to GnRH and CCK, respectively. Exogenous CCK also preferentially stimulated FSH secretion ex vivo and in vivo.

      Weaknesses:

      The concept that there may be a distinct FSH-releasing hormone (FSHRH) has been debated for decades. As the authors suggest that CCK is the long-sought FSHRH (at least in fish), they must provide data that convincingly leads to such a conclusion. In my estimation, they have not yet met this burden. In particular, they show that CCK is sufficient to activate FSH-producing cells, but have not yet demonstrated its necessity. Their one attempt to do so was using fish in which they inactivated the CCK receptor using CRISPR-Cas9. While this manipulation led to a reduction in FSH, LH was affected to a similar extent. As a result, they have not shown that CCK is a selective regulator of FSH.

      Our conclusion regarding the necessity of CCK signaling for FSH secretion is based on the following evidence:

      (1) CCK-like receptors are expressed in the pituitary gland predominantly on FSH cells.

      (2) Application of CCK to pituitaries elicits FSH cell activation and, to a much lesser degree, activation of LH cells (calcium imaging assays).

      (3) Application of CCK to pituitaries and by injection in vivo significantly increased only FSH release.

      (4) Mutating the FSH-specific CCK receptor in a different species of fish (medaka) also causes a complete shutdown of FSH production and phenocopies a fsh-mutant phenotype (Uehara, Nishiike et al. 2023).

      Taken together, we believe that these data strongly support the conclusion that CCK is necessary for FSH production and release from the fish pituitary. Admittedly, the overlapping effects of CCK on both FSH and LH cells in zebrafish (evident in both our calcium imaging experiments and especially in the KO phenotype) complicate the interpretation of the phenotype. We speculate that the effect of CCK on LH cells in zebrafish can be caused either by paracrine signaling within the gland or by the effects of CCK on GnRH neurons, which were shown to express CCK receptors.

      In the current version, we emphasize that CCK also induces LH secretion. Although it does not affect LH to the same extent as FSH, an overlap does exist. This is mentioned in the abstract and discussion.

      Moreover, they do not yet demonstrate that the effects observed reflect the loss of the receptor's function in gonadotropes, as opposed to other cell types.

      Although there is evidence for the expression of the CCK receptor in other tissues, we show a direct decrease of FSH and LH expression in the gonadotrophs of the pituitary of the mutant fish. Taken together with the receptor's significantly higher expression in FSH cells than in the rest of the pituitary cells in the cell-specific transcriptome, loss of receptor function in gonadotropes is the most reasonable explanation for the mutant phenotype.

      Unfortunately, unlike in mice, technologies for conditional knockout of genes in specific cell types are not yet available for our model and cell types. The tissue distribution of the three CCK receptor types was added in supplementary figure 1. This distribution shows that in the pituitary only CCKBRA (our identified CCK receptor) is expressed, while in other tissues it is either not expressed or expressed together with the other CCK receptors, which can compensate for its activity.

      It also is not clear whether the phenotypes of the fish reflect perturbations in pituitary development vs. a loss of CCK receptor function in the pituitary later in life. Ideally, the authors would attempt to block CCK signaling in adult fish that develop normally. For example, if CCK receptor antagonists are available, they could be used to treat fish and see whether and how this affects FSH vs. LH secretion.

      While the observed gonadal phenotype of the KO (sex-reversed fish) most likely has a developmental origin, since it requires a long time to manifest, the effect of the KO on FSH and LH cells is probably more acute. Unfortunately, a specific antagonist that affects only CCKBRA and not the other CCK receptors has not yet been identified.

      In the Discussion, the authors suggest that CCK, as a satiety factor, may provide a link between metabolism and reproduction. This is an interesting idea, but it is not supported by the data presented. That is, none of the results shown link metabolic state to CCK regulation of FSH and fertility. Absent such data, the lengthy Discussion of the link is speculative and not fully merited.

      In the revised manuscript, we provide data linking CCK with metabolic status in supplementary figure 1, and we modified the discussion to tone down the link between metabolic status and reproductive state.

      Also in the Discussion, the authors argue that "CCK directly controls FSH cells by innervating the pituitary gland and binding to specific receptors that are particularly abundant in FSH gonadotrophs." However, their imaging does not demonstrate innervation of FSH cells by CCK terminals (e.g., at the EM level).

      Innervation of the fish pituitary does not imply a synaptic-like connection between axon terminals and endocrine cells. In fact, such connections are extremely rare, and their functionality is unclear. Instead, the mode of regulation between hypothalamic terminals and endocrine cells in the fish pituitary is more similar to "volume transmission" in the CNS, i.e., peptides are released into the tissue and carried to their endocrine cell targets by the circulation or via diffusion. A short explanation was added in lines 395-398 in the discussion.

      Moreover, they have not demonstrated the binding of CCK to these cells. Indeed, no CCK receptor protein data are shown.

      Our revised manuscript includes detailed experiments showing the activation of the receptor by its homologous ligand. Supplementary Figure 1 includes a transactivation assay of the receptor by CCK and shows the effect of the different mutations on receptor activation. Unfortunately, no antibody is available against this fish-specific receptor (one of the caveats of working with fish models); therefore, we cannot present receptor protein data.

      The calcium responses of FSH cells to exogenous CCK certainly suggest the presence of functional CCK receptors therein; but, the nature of the preparations (with all pituitary cell types present) does not demonstrate that CCK is acting directly in these cells.

      We agree with the reviewer that there are some disadvantages in choosing to work with a whole-tissue preparation. However, we believe that the advantages of working in a more physiological context far outweigh the drawbacks as it reflects the natural dynamics more precisely. Since our transcriptome data, as well as our ISH staining, show that the CCK receptor is exclusively expressed in FSH cells, it is improbable that the observed calcium response is mediated via a different pituitary cell type.

      Indeed, the asynchrony in responses of individual FSH cells to CCK (Figure 4) suggests that not all cells may be activated in the same way. Contrast the response of LH cells to GnRH, where the onset of calcium signaling is similar across cells (Figure 3).

      The difference between the synchronization levels of LH and FSH cell activity stems from the gap-junction-mediated coupling between LH cells, which does not exist between FSH cells (Golan, Martin et al. 2016). Therefore, the onset of the calcium response in FSH cells depends on the irregular diffusion rate of the peptide within the preparation, whereas the tight homotypic coupling between LH cells generates a strong and synchronized calcium rise that propagates quickly throughout the entire population.

      The differences in connectivity between LH and FSH cells are mentioned in lines 194-195.

      Finally, as the authors note in the Discussion, the data presented do not enable them to conclude that the endogenous CCK regulating FSH (assuming it does) is from the brain as opposed to other sources (e.g., the gut).

      We agree with the reviewer that, for now, we are unable to determine whether hypothalamic or peripheral CCK is the main driver of FSH cells. While the strong innervation of the gland by CCK-secreting hypothalamic neurons strengthens the notion of a hypothalamic-releasing hormone and also fits with the dogma of the neural control of the pituitary gland in fish (Ball 1981), more experiments are required to resolve this question.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript builds on previous work suggesting that the CCK peptide is the releasing hormone for FSH in fishes, which is different than that observed in mammals where both LH and FSH release are under the control of GnRH. Based on data using calcium imaging as a readout for stimulation of the gonadotrophs, the researchers present data supporting the hypothesis that CCK stimulates FSH-containing cells in the pituitary. In contrast, LH-containing cells show a weak and variable response to CCK but are highly responsive to GnRH. Data are presented that support the role of CCK in the release of FSH. Researchers also state that functional overlap exists in the potency of GnRH to activate FSH cells, thus the two signalling pathways are not separate. The results are of interest to the field because for many years the assumption has been that fishes use the same signalling mechanism. These data present an intriguing variation where a hormone involved in satiation acts in the control of reproduction.

      Strengths:

      The strengths of the manuscript are that researchers have shed light on different pathways controlling reproduction in fishes.

      Weaknesses:

      Weaknesses are that it is not clear if multiple ligand/receptors are involved (more than one CCK and more than one receptor?). The imaging of the CCK terminals and CCK receptors needs to be reinforced.

      Reviewer consultation summary: 

      The data presented establish sufficiency, but not necessity of CCK in FSH regulation. The paper did not show that CCK endogenously regulates FSH in fish. This has not been established yet.

      This is a very important comment, also raised by reviewer 1. To avoid repetition, please see our detailed response to the comment above.

      The paper presents the pharmacological effects of CCK on ex vivo preparations but does not establish the in vivo physiological function of the peptide. The current evidence for a novel physiological regulatory mechanism is incomplete and would require further physiological experiments. These could include the use of a CCK receptor antagonist in adult fish to see the effects on FSH and LH release, the generation of a CCK knockout, or cell-specific genetic manipulations.

      As detailed in the responses to the first reviewer, we cannot conduct conditional, cell-specific gene knockout in our model. However, we did generate a KO and show its direct effect on FSH and LH secretion, together with a physiological characterisation of the mutant.

      Zebrafish have two CCK ligands: ccka, cckb and also multiple receptors: cckar, cckbra and cckbrb. There is ambiguity about which CCK receptor and ligand are expressed and which gene was knocked out.

      In the revised manuscript, we clarified which of the receptors is expressed (CCKBRA) and which receptor is targeted. We also provided data showing the specificity of the receptors (both WT and mutant) to the ligands. Supplementary figure 1 shows receptor cross-activation. The Methods section also specifies the exact NCBI ID numbers of the targeted receptor and the antibody used for the immunostaining.

      Blocking CCK action in fish (with receptor KO) affects FSH and LH. Therefore, the work did not demonstrate a selective role for CCK in FSH regulation in vivo and any claims to have discovered FSHRH need to be more conservative.

      We agree with the reviewer that the overlap in the effect of CCK measured in the calcium activation of cells and in the KO model does not allow us to conclude selectivity. In this context, it is crucial to highlight that CCKBRA exhibits high expression on FSH cells but not on LH cells. Therefore, the effect of CCK on LH cells is likely paracrine or through GnRH neurons, which were shown to express CCK receptors. In the current version, we emphasize that CCK also induces LH secretion. Although it does not affect LH to the same extent as FSH, an overlap does exist. This is mentioned in the abstract and discussion.

      The labelling of the terminals with anti-CCK looks a lot like the background and the authors did not show a specificity control (e.g. anti-CCK antibody pre-absorbed with the peptide or anti-CCK in morphant/KO animals).

      Figure colours have been updated to better visualise the specific staining of the antibody. Also, the same antibody was previously used to mark CCK-positive cells in the gut of the red drum fish (Webb, Khan et al. 2010), where a control experiment (antibody pre-absorbed with the peptide) was conducted.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract:

      The authors have not yet established that CCK is the primary regulator of FSH in vivo.

      In the new version, we highlight the leading effect of CCK on the reproductive axis, which includes FSH and LH.

      Introduction:

      The authors need to make clear earlier in the Introduction that fish have two types of gonadotropes. This information comes too late (last paragraph) currently.

      Added in line 42

      They should discuss relevant data on the differential regulation of FSH and LH in fish, as a rationale for looking for different releasing factors.

      This has been discussed in the first paragraph of the introduction

      In the last sentence of the penultimate paragraph, the authors assume that it must be a hypothalamic factor that regulates FSH. Why is this necessarily the case? Are there data indicating that a hypothalamic factor is required for FSH production in fish?

      This is addressed in the discussion; we do not rule out that circulating CCK or CCK from other brain areas might affect FSH secretion in the pituitary (lines 402-404). However, as the hypothalamus serves as the main gateway from the brain to the pituitary and contains hypophysiotropic CCK neurons, it is the most reasonable assumption.

      Results:

      In the first paragraph, the authors reference three types of CCK receptors, only one of which is expressed in the pituitary. The specific receptor should be named here.

      The receptor name and NCBI ID have been added in this paragraph.

      Figure 1: What specificity controls were used for the ISH in Figure 1?

      HCR, the method used to detect RNA expression, was developed by Molecular Instruments (https://www.molecularinstruments.com/hcr-rnafish-protocols) and does not require the specificity controls that were used with older ISH methods. The use of multiple short probes ensures specificity for the target RNA. Moreover, the expression is specific to the targeted cells.

      In Figure 1D, the red square is missing in the KO fish (at low magnification).

      This was fixed in the updated version.

      In Figure 1G, the number of dots does not correspond to the number of animals described in the figure legend. Does each point represent an animal?

      Each dot represents a fish. The order of the numbers in the legend did not match the order in the graph; this has been fixed in the latest version.

      Figure 2A: It is not clear that all FSH (GFP) cells are double-labeled. Should all double-labeled cells appear white? Many appear as green. Some quantification of the proportion of co-labeling is needed. Also, the scale bars are too small to read. Perhaps add the size of the scale bars to the legend.

      They are all double-labeled, as can be seen in the single-color images. Since GFP fluorescence is stronger than RCaMP fluorescence, double-labeled cells may appear green. A scale bar was added.

      Figure 2C: Is the synchronous activity of LH cells here dependent on endogenous GnRH? Can these events be blocked with a GnRH receptor antagonist?

      We currently do not have enough data to support this hypothesis, and the in vivo two-photon system is not optimal for answering these questions, since these are spontaneous events that are difficult to predict. This is the main reason we moved to an ex vivo system. The similar response we observe when applying GnRH in the ex vivo system supports the idea that these events reflect GnRH activation.

      Figure 4C: As some LH cells respond to CCK, can the authors really claim that CCK is a selective regulator of FSH? What explains the heterogeneity in the response of LH cells to CCK?

      In this version, we highlight that CCK directly activates FSH cells but also affects LH cells to some extent. However, it is clear that the effect on FSH cells is more significant.

      Figures 5A and B: With larger Ns, some of the trends might be significant (e.g., GnRH stimulated FSH release and CCK stimulated LH release).

      Though there is a trend, the values on the Y axis reveal that the response of FSH to GnRH and of LH to CCK is smaller than the spread of the basal response (the "before" values) in all of the graphs. Hence, we do not believe that a larger N will affect those results. We added the range of the secreted hormone concentrations to the description of the results to emphasize the difference in values.

      Figures 5C and D: What explains the lack of an increase in LH secretion following GnRH treatment?

      We did not measure LH secretion in the plasma, as we did not have enough blood. We do see an increase in LH transcription (see supplementary figure 5 – figure supplement 1).

      Also, as mRNA levels were measured (in C), reference should be made to expression rather than transcription. Not all changes in mRNA levels reflect changes in transcription. Also, remove transcription from the legend. Reference to supplementary Figure 4 in the legend should be supplementary Figure 6. Finally, in C and D, distinguish males from females (as in 5A and B).

      Modifications have been made according to the reviewer's suggestions.

      Figure legends:

      The figure legends are very long. One way to shorten them is to remove descriptions of the results. The legends should indicate what is in each figure, not the results of the experiments.

      Modifications have been made according to the reviewer's suggestions.

      Sample sizes should be spelled out in the legends, as they are not in the M&M.

      We made sure all sample sizes are mentioned in the legends.

      Materials and Methods:

      Section 1.1 can be removed as it repeats content presented elsewhere.

      This section was removed

      Section 1.5: It is unclear what this means: "blinding was not applied to ensure tractability" Please clarify.

      This section was removed

      Reviewer #2 (Recommendations For The Authors):

      It appears that zebrafish have two ligands: ccka, cckb. Also multiple receptors: cckar, cckbra and cckbrb. Authors need to discuss this and clearly state which ligand and which receptor they are referring to in the manuscript.

      We discussed the receptor type in the first paragraph of the results; the exact synthetic peptide used is described in the methods. The 8 amino acids of the mature CCK peptide are the same in CCKa and CCKb. A sentence regarding the specificity of the antibody to the mature CCK peptide was added in line 101.

      "to GnRH puff application (300 μl of 30 μg/μl)"; (250 μl of 30 μg/ml CCK)

      Please give the final concentration to make it easy on the readers of the data.

      The molarity of the final concentration was added.
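
      For readers who wish to reproduce a conversion of this kind, a minimal Python sketch of the mass-to-molar arithmetic follows. The molecular weight in the example call is a placeholder chosen purely for illustration, not the molecular weight of the peptide used in the study, and the function name is hypothetical. The result describes the applied solution, not the concentration reached in the bath after dilution.

```python
def ug_per_ml_to_micromolar(conc_ug_per_ml: float, mol_weight_g_per_mol: float) -> float:
    """Convert a mass concentration (ug/ml) to a molar concentration (uM)."""
    # ug/ml is numerically equal to mg/l; (mg/l) / (g/mol) gives mmol/l,
    # and multiplying by 1000 converts mmol/l to umol/l (uM).
    return conc_ug_per_ml * 1000.0 / mol_weight_g_per_mol


# Placeholder molecular weight (assumption for illustration only).
PLACEHOLDER_MW_G_PER_MOL = 1000.0

applied_um = ug_per_ml_to_micromolar(30.0, PLACEHOLDER_MW_G_PER_MOL)
print(f"{applied_um:.1f} uM")
```

      Substituting the actual molecular weight of the synthetic peptide gives the molarity of the applied solution.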

      (2.4) Differential calcium response underlies differential hormone. This section is a bit confusing to read, for example:

      "For that, we collected the medium perfused through our ex vivo system (Fig. 2a) and measured LH and FSH levels using a specific ELISA validated for zebrafish [31] while monitoring the calcium activity of the cells."

      So the authors did the ELISA while monitoring the activity (?). This sentence does not make sense: please rewrite it.

      We modified this sentence in lines 308-311.

      To functionally validate the importance of CCK signalling we used CRISPR-cas9 to generate loss-of-function (LOF) mutations in the pituitary- CCK receptor gene.

      The authors need to clearly state WHICH gene they inactivated: Zebrafish have three CCK-receptors, so "the pituitary receptor gene" needs to be defined.

      This was added again in line 107 and is mentioned in the methods.

      Figure 3 is a crucial figure!

      Figure 3B: The data are not very convincing. Please state how thick the sections are in the figure legend (assuming these are adult pituitaries),

      Added in the legend (figure 1C in the new version), slice thickness and adult fish.

      Please show at least the merged image a high magnification view of the co-localization of the receptor with the cells.

      This is figure 1 in the new revision; a magnified view was added.

      Please give the scale bar size for 3B.

      Scales for all images were added

      Figure 3C: the co-localization of the terminals of the CCK and FSH cells shows very few cells expressing close to terminals.

      Important: Because the labelling of the terminals with anti-CCK looks a lot like the background, it is very important to show the control (anti-CCK antibody pre-absorbed with the peptide). The authors should have these data. The photo needs to have been taken at the same gain (contrast) and the photo showing the terminals.

      This is a commercial antibody that was previously validated for CCK in fish. The co-localization pattern resembles GnRH innervation in the pituitary. In fish, when hypothalamic neurons innervate the pituitary, they do not innervate all the cells; as this is an endocrine system, the peptide can travel to neighbouring cells via diffusion or aided by blood flow (Golan, Zelinger et al. 2015). The images reveal the direct innervation of the pituitary by CCK terminals and their proximity to FSH cells.

      Figure 4c, on right. The text seems to be stretched as if the photo was adjusted without locking the aspect ratio. Please check the original images.

      This has been fixed

      Can the authors use different pseudo colours? Differentiating a double label of white versus yellow is very difficult, and thus the photo is not very convincing.

      This has been changed to green and magenta.

      What is meant by "CCK-AB" antibody? Perhaps anti-CCK would be a better label

      This has been fixed

      Figure 5A: increase the magnification of the insets; the structure of the gonads is very difficult to see with clarity in these low mag images. The most obvious way to improve this figure is to reduce or eliminate the pie graph (not really necessary) and show a high magnification (and larger) image of the gonadal structure.

      This is figure 1 in the new version, with magnification of the gonad next to each body section.

      Discussion:

      " Moreover, in the zebrafish, as well as in other species, the functional overlap in gonadotropin signalling pathways is not limited to the pituitary but is also present in the gonad, through the promiscuity of the two gonadotropin receptors"<br /> The reasoning of this sentence is not clear: zebrafish do not use GnRH to control reproduction: they lack GnRH1 through genomic rearrangement (see Whitlock, Postlethwait and Ewer 2019) and KO of GnRH2/GnRH3 does not affect reproduction.

      While GnRH KO models indicate a redundancy of GnRH in this axis in zebrafish, there is also ample evidence for its importance in regulating reproduction, such as its effect on gonadotropins (Golan, Martin et al. 2016) and its use in spawning induction in fish (Mizrahi and Levavi-Sivan 2023). We believe it is currently too soon to conclude that GnRH signalling is completely irrelevant to reproduction in cyprinids.

      Reviewing Editor (Recommendations For The Authors):

      It would be interesting to see calcium imaging experiments in the CCKR receptor mutants to establish a more direct connection between peptide action and activity.

      We added a receptor assay that demonstrates the lack of activation of the mutated receptors by CCK (supplementary figure 1) and compared it to the wild-type receptor, which is activated. This shows that: 1) CCK directly activates our identified receptor in FSH cells; 2) the mutated receptors are non-active.

      "all homozygous fish (CCKR+12/+7/-1/ CCKR+12/+7/-1, n=12)"

      It may be better to write the genotype of fish separately as CCKR+12/+12, CCKR+7/+7 and CCKR-1/-1, n=12) otherwise it seems as if all alleles occurred together in the same fish.

      Modified according to the reviewer request

      In Figure 1 scale bar legends are very small. 

      Descriptions of the scale bars were added to all the legends.

      Figure 1 legend "On the top right of each panel is the gender distribution" - fish have no gender but sex.

      Modified according to the reviewer request

      The authors should endeavour to improve the presentation of the figures. They should use a sans-serif font and check that text is not cut at the edge of figure panels, that scale bars are uniform and clearly labelled and fonts are of similar size and clearly legible. E.g. labels of the fish brain of Fig3A are very small.

      We modified all the figures to adjust the fonts and the scale bars, and we increased the size of the image in Figure 3a to make the labels clearer.

      Please use the elife format to name supplementary figures, as Figure X - Figure Supplement Y (each supplement associated with one of the main figures).

      Fixed

      Peptide concentrations in the ex vivo experiments should also be given as molar concentrations not only as '250 μl of 30 μg/ml CCK'.

      Fixed

      "In contrast, FSH cells responded with a very low calcium rise in hormonal secretion in response to GnRH" - a very low rise in hormonal secretion

      Fixed

      Please clarify why you used a GnRH synthetic agonist and not the native peptide.

      It is commonly used for spawning induction in fish (line 245); it has also been shown to directly affect the secretion of LH and FSH (Biran, Golan et al. 2014, Biran, Golan et al. 2014, Mizrahi, Gilon et al. 2019). This was added to line 245.

      References

      Ball, J. (1981). "Hypothalamic control of the pars distalis in fishes, amphibians, and reptiles." General and comparative endocrinology 44(2): 135-170.

      Biran, J., M. Golan, N. Mizrahi, S. Ogawa, I. S. Parhar and B. Levavi-Sivan (2014). "Direct regulation of gonadotropin release by neurokinin B in tilapia (Oreochromis niloticus)." Endocrinology 155(12): 4831-4842.

      Biran, J., M. Golan, N. Mizrahi, S. Ogawa, I. S. Parhar and B. Levavi-Sivan (2014). "LPXRFa, the Piscine Ortholog of GnIH, and LPXRF Receptor Positively Regulate Gonadotropin Secretion in Tilapia (Oreochromis niloticus)." Endocrinology 155(11): 4391-4401.

      Golan, M., A. O. Martin, P. Mollard and B. Levavi-Sivan (2016). "Anatomical and functional gonadotrope networks in the teleost pituitary." Scientific Reports 6: 23777.

      Golan, M., E. Zelinger, Y. Zohar and B. Levavi-Sivan (2015). "Architecture of GnRH-Gonadotrope-Vasculature Reveals a Dual Mode of Gonadotropin Regulation in Fish." Endocrinology 156(11): 4163-4173.

      Mizrahi, N., C. Gilon, I. Atre, S. Ogawa, I. S. Parhar and B. Levavi-Sivan (2019). "Deciphering Direct and Indirect Effects of Neurokinin B and GnRH in the Brain-Pituitary Axis of Tilapia." Front Endocrinol (Lausanne) 10: 469.

      Mizrahi, N. and B. Levavi-Sivan (2023). "A novel agent for induced spawning using a combination of GnRH analog and an FDA-approved dopamine receptor antagonist." Aquaculture 565: 739095.

      Uehara, S. K., Y. Nishiike, K. Maeda, T. Karigo, S. Kuraku, K. Okubo and S. Kanda (2023). "Cholecystokinin is the follicle-stimulating hormone (FSH)-releasing hormone." bioRxiv: 2023.05.26.542428.

      Webb, K. A., Jr., I. A. Khan, B. S. Nunez, I. Rønnestad and G. J. Holt (2010). "Cholecystokinin: molecular cloning and immunohistochemical localization in the gastrointestinal tract of larval red drum, Sciaenops ocellatus (L.)." Gen Comp Endocrinol 166(1): 152-159.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      The authors introduce a computational model that simulates the dendrites of developing neurons in a 2D plane, subject to constraints inspired by known biological mechanisms such as diffusing trophic factors, trafficked resources, and an activity-dependent pruning rule. The resulting arbors are analyzed in terms of their structure, dynamics, and responses to certain manipulations. The authors conclude that 1) their model recapitulates a stereotyped timecourse of neuronal development: outgrowth, overshoot, and pruning 2) Neurons achieve near-optimal wiring lengths, and Such models can be useful to test proposed biological mechanisms- for example, to ask whether a given set of growth rules can explain a given observed phenomenon - as developmental neuroscientists are working to understand the factors that give rise to the intricate structures and functions of the many cell types of our nervous system. 

      Overall, my reaction to this work is that this is just one instantiation of many models that the author could have built, given their stated goals. Would other models behave similarly? This question is not well explored, and as a result, claims about interpreting these models and using them to make experimental predictions should be taken warily. I give more detailed and specific comments below.  

      We thank the reviewer for the summary of the work. But the criticism “that this is one instantiation of many models [we] could have built” is unfair as it can apply to any model. We chose one of the most minimalistic models which implements known biological mechanisms including activity-independent and -dependent phases of dendritic growth, and constrained parameters based on experimental data. We compare the proposed model to other alternatives in the Discussion section. In the revised manuscript, we additionally investigate the sensitivity of model output to variations of specific parameters, as explained below.

      Point 1.1. Line 109. After reading the rest of the manuscript, I worry about the conclusion voiced here, which implies that the model will extrapolate well to manipulations of all the model components. How were the values of model parameters selected? The text implies that these were selected to be biologically plausible, but many seem far off. The density of potential synapses, for example, seems very low in the simulations compared to the density of axons/boutons in the cortex; what constitutes a potential synapse? The perfect correlations between synapses in the activity groups are flawed, even for synapses belonging to the same presynaptic cell. The density of postsynaptic cells is also orders of magnitude off, etc. Ideally, every claim made about the model's output should be supported by a parameter sensitivity study. The authors performed few explorations of parameter sensitivity and many of the choices made seem ad hoc.

      We have performed a detailed sensitivity analysis on the model parameters mentioned by the reviewer, including (I) the density of postsynaptic cells (somata), (II) the density of potential synapses, and (III) the level of correlations between synapses.

      (I) While the density of postsynaptic cells in our baseline model seems a bit low, at least when compared to densities observed in adulthood (Keller et al., 2018), we explored how altering this value affects the model dynamics. We found that the postsynaptic cell density does not affect the timing of dendritic outgrowth, overshoot and synaptic pruning. It only changes the final size of the dendritic arbor and the resulting number of connected synapses. This analysis is now included in Supplementary Figure 3-2.

      (II) The density of potential synapses and the density of connected synapses that we used in the manuscript are already in the range of densities that can be found in the literature (Leighton et al., 2024; Ultanir et al., 2007; Glynn et al., 2011; Yang et al., 2014), some of which we already cited in the original submission.

      A potential concern might be that the rapid slowing down of growth in the model could be due to a depletion of potential synapses. To illustrate that this is not the case, we showed that the number of available potential synapses over the time course of the simulations remains high (Figure 3, new panel e). Therefore, the initial density of potential synapses is sufficient and does not affect the final density of connected synapses.

      To further illustrate the robustness of our model dynamics to longer simulation times, we added a new supplementary figure (Supplementary Figure 3-1).

      These new figure additions (Figure 3e, Supplementary Figure 3-1, and Supplementary Figure 3-2) and their implications for the model dynamics are discussed in the Results section of the revised paper:

      p.9 line 198, “After the initial overshoot and pruning, dendritic branches in the model stay stable, with mainly small subbranches continuing to be refined (Figure 3-Figure Supplement 1). This stability in the model is achieved despite the number of potential synaptic partners remaining high (Figure 3e), indicating a balance between activity-independent and activity-dependent mechanisms. The dendritic growth and synaptic refinement dynamics are independent of the postsynaptic somata densities used in our simulations (Figure 3-Figure Supplement 2). Only the final arbor size and the number of connected synapses decrease with an increase in the density of the somata, while the timing of synaptic growth, overshoot and pruning remains the same (Figure 3-Figure Supplement 2).”

      We also added more details to the description of our model in the Methods section:

      p.24 line 615, “For all simulations in this study, we distributed nine postsynaptic somata at regular distances in a grid formation on a 2-dimensional 185 × 185 pixel area, representing a cortical sheet (where 1 pixel = 1 micron, Figure 4). This yields a density of around 300 neurons per mm² (translating to around 5,000 per mm³, where for 25 neurons in Figure 3-Figure Supplement 2 this would be around 750 neurons per mm² or 20,000 per mm³). The explored densities are a bit lower than compared to neuron densities observed in adulthood (Keller et al., 2018). In the same grid, we randomly distributed 1,500 potential synapses, yielding an initial density of 0.044 potential synapses per μm² (Figure 3e). At the end of the simulation time, around 1,000 potential synapses remain, showing that the density of potential synapses is sufficient and does not significantly affect the final density of connected synapses. Thus, the rapid slowing down of growth in our model is not due to a depletion of potential synaptic partners. The resulting density of stably connected synapses is approximately 0.015 synapses per μm² (around 60 synapses stabilized per dendritic tree, Figure 3b). This density compares well to experimental findings, where, especially during early development, synaptic densities are described to be within a range similar to the one observed in our model (Leighton et al., 2024; Ultanir et al., 2007; Glynn et al., 2011; Yang et al., 2014; Koshimizu et al., 2009; Tyler and Pozzo-Miller, 2001).”
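
      As a rough check on the figures quoted above, the following sketch recomputes the areal densities from the stated grid size and counts. It assumes only what the quoted Methods text states (a 185 × 185 pixel sheet, 1 pixel = 1 micron, 9 or 25 somata, 1,500 initial and about 1,000 remaining potential synapses, about 60 stabilized synapses per tree); the function names are illustrative and not taken from the authors' code. The volumetric (per mm³) values are not reproduced because the conversion used for them is not given here.

```python
# Back-of-the-envelope check of the areal densities quoted in the Methods text.
SIDE_UM = 185                 # side of the simulated square sheet (1 pixel = 1 micron)
AREA_UM2 = SIDE_UM ** 2       # 34,225 square microns
AREA_MM2 = AREA_UM2 / 1e6     # the same area in square millimetres


def per_mm2(count):
    """Areal density in items per square millimetre."""
    return count / AREA_MM2


def per_um2(count):
    """Areal density in items per square micron."""
    return count / AREA_UM2


soma_density_9 = per_mm2(9)            # about 263 per mm^2 ("around 300")
soma_density_25 = per_mm2(25)          # about 730 per mm^2 ("around 750")
initial_potential = per_um2(1500)      # about 0.044 per um^2
remaining_potential = per_um2(1000)    # about 0.029 per um^2
connected = per_um2(9 * 60)            # about 0.016 per um^2 (quoted as ~0.015)

if __name__ == "__main__":
    print(round(soma_density_9), round(soma_density_25))
    print(round(initial_potential, 3), round(remaining_potential, 3), round(connected, 3))
```

      The rounded values agree with the quoted "around 300", "around 750", 0.044 and approximately 0.015 to within the stated precision.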

      (III) Lastly, we investigated how the correlation between synapses of the same activity group might affect our conclusions. As correlations in our model mainly arise from patterns of spontaneous activity which are abundant in early postnatal development (retinal waves (Ackman et al., 2012) or endogenous activity in the form of highly synchronized events involving a large fraction of the cells (Siegel et al., 2012)), we explored varying the correlations within each activity group, across activity groups and combinations of both. This analysis supported our previously described intuition on how competition between synaptic activities should drive activity-dependent refinement. In addition, a recent study found direct evidence for such subcellular refinement of synaptic inputs specifically dependent on spontaneous activity between retinal ganglion cell axons and retinal waves in the superior colliculus (Matsumoto et al., 2024). The new analysis confirmed our earlier results that the competition between activity groups leads to activity-dependent refinement and yielded further insight into how the studied activity correlations can affect the competition. Those results are presented in a completely new figure (new Figure 5, supported by the Supplementary Figure 5-1 and 5-2) and discussed in the Results section:

      p.11 line 249, “Group activity correlations shape synaptic overshoot and selectivity competition across synaptic groups.

      Since correlations between synapses emerge from correlated patterns of spontaneous activity abundant during postnatal development (Ackman et al., 2012; Siegel et al., 2012), we explored a wide range of within-group correlations in our model (Figure 5a). Although a change in correlations within the group has only a minor effect on the resulting dendritic lengths (Figure 5b) and overall dynamics, it can change the density of connected synapses and thus also affect the number of connected synapses to which each dendrite converges throughout the simulations (Figure 5c,e). This is due to the change in specific selectivity of each dendrite which is a result of the change in within-group correlations (Figure 5d). While it is easier for perfectly correlated activity groups to coexist within one dendrite (Figure 5-Figure Supplement 1a, 100%), decreasing within-group correlations increases the competition between groups, producing dendrites that are selective for one specific activity group (60%, Figure 5d, Figure 5-Figure Supplement 1a). This selectivity for a particular activity group is maximized at intermediate (approximately 60%) within-group correlations, while the contribution of the second most abundant group generally remains just above random chance levels (Figure 5-Figure Supplement 1a). Further reducing within-group correlations (20%, Figure 5a) causes dendrites to lose their selectivity for specific activity groups due to the increased noise in the activity patterns (20%, Figure 5a). Overall, reducing within-group correlations increases synapse pruning (Figure 5f, bottom), also found experimentally (Matsumoto et al., 2024) as dendrites require an extended period to fine-tune connections aligned with their selectivity biases. This phenomenon accounts for the observed reduction in both the density and number of synapses connected to each dendrite.

      In addition to the within-group correlations, developmental spontaneous activity patterns can also change correlations between groups as for example retinal waves propagated in different domains (Feller et al., 1997) (Figure 5-Figure Supplement 2). An increase in between-group correlations in our model intuitively decreases competition between the groups since fully correlated global events synchronize the activity of all groups (Figure 5-Figure Supplement 2). The reduction in competition reduces pruning in the model, which can be recovered by combining cross-group correlations with decreased within-group correlations (Figure 5-Figure Supplement 2). Our simulations show that altering the correlations within activity groups increases competition (by lowering the within-group correlations) or decreases competition (by raising the across-group correlations). Hence, in our model, competition between activity groups due to non-trivially structured correlations is necessary to generate realistic dynamics between activity-independent growth and activity-dependent refinement or pruning.

      In sum, our simulations demonstrate that our model can operate under various correlations in the spike trains. We find that the level of competition between synaptic groups is crucial for the activity-dependent mechanisms to either potentiate or depress synapses and is fully consistent with recent experimental evidence showing that the correlation between spontaneous activity in retinal ganglion cell axons and retinal waves in the superior colliculus governs branch addition vs. elimination (Matsumoto et al., 2024).”

      Precise details on the implementation of the changed activity correlations were added to the Methods section:

      p. 25 line 638, “Within-group and across-group activity correlations. For the decreased within-group correlations, we generated parent spike trains for each individual group with the firing rate $r_{in} = r_{total} \cdot P_{in}$ (e.g., $P_{in}$ = 100%; 60%; 20%, Figure 5). All the synapses of the same group share the same parent spike train and the remaining spikes for each synapse are uniquely generated with the firing rate $r_{rest} = r_{total} \cdot (1 - P_{in})$ (e.g., $(1 - P_{in})$ = 0%; 40%; 80%), resulting in the desired firing rate $r_{total}$ (see Table 1). For the increase in across-group correlations, we generated one master spike train with the firing rate $r_{cross} = r_{total} \cdot P_{cross}$ for all the synapses of all groups (e.g., $P_{cross}$ = 5%; 10%; 20%, Figure 5-Figure Supplement 2). This master spike train is shared across all groups and then filled up according to the within-group correlation (if not specified differently $P_{in} = 1 - P_{cross}$ to maintain the rate $r_{total}$). In all the cases, also in those where the change in across-group correlations is combined with the change in within-group correlations, the remaining spikes for each synapse are generated with a firing rate $r_{rest} = r_{total} \cdot (1 - P_{in} - P_{cross})$ to obtain an overall desired firing rate of $r_{total}$.”
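
      The construction described in this Methods excerpt can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes homogeneous Poisson spike trains (the excerpt does not state the spike statistics), and the function names, the seed handling and the example rate are hypothetical. Each synapse receives the union of a master train shared by all groups, a parent train shared within its group, and a private train, so that the three rates sum to the total rate.

```python
import random


def poisson_train(rate, duration, rng):
    """Sorted spike times of a homogeneous Poisson process (assumed statistics)."""
    times = []
    if rate <= 0:
        return times
    t = rng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def group_spike_trains(n_groups, n_syn_per_group, r_total, p_in, p_cross,
                       duration, seed=0):
    """Per-synapse spike trains built from shared and private components.

    master  : shared by every synapse of every group, rate r_total * p_cross
    parent  : shared by all synapses of one group,    rate r_total * p_in
    private : unique to each synapse, rate r_total * (1 - p_in - p_cross)
    """
    if p_in < 0 or p_cross < 0 or p_in + p_cross > 1 + 1e-9:
        raise ValueError("p_in and p_cross must be non-negative and sum to at most 1")
    rng = random.Random(seed)
    master = poisson_train(r_total * p_cross, duration, rng)
    r_rest = r_total * (1.0 - p_in - p_cross)
    trains = []
    for _ in range(n_groups):
        parent = poisson_train(r_total * p_in, duration, rng)
        group = []
        for _ in range(n_syn_per_group):
            private = poisson_train(r_rest, duration, rng)
            # Superposing independent Poisson trains adds their rates.
            group.append(sorted(master + parent + private))
        trains.append(group)
    return trains


if __name__ == "__main__":
    # Hypothetical rate and duration; the paper's values are listed in its Table 1.
    trains = group_spike_trains(2, 3, r_total=5.0, p_in=0.6, p_cross=0.0,
                                duration=100.0, seed=1)
    print(len(trains), len(trains[0]), len(trains[0][0]))
```

      With p_in = 1 and p_cross = 0, every synapse in a group carries exactly the parent train, which corresponds to the perfectly correlated baseline; lowering p_in replaces a fraction of those shared spikes with private ones while keeping the overall rate unchanged.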

      Point 1.2. Many potentially important phenomena seem to be excluded. I realize that no model can be complete, but the choice of which phenomena to include or exclude from this model could bias studies that make use of it and is worth serious discussion. The development of axons is concurrent with dendrite outgrowth, is highly dynamic, and perhaps better understood mechanistically. In this model, the inputs are essentially static. Growing dendrites acquire and lose growth cones that are associated with rapid extension, but these do not seem to be modeled. Postsynaptic firing does not appear to be modeled, which may be critical to activity-dependent plasticity. For example, changes in firing are a potential explanation for the global changes in dendritic pruning that occur following the outgrowth phase.  

      Thanks to the reviewer for bringing up these important considerations. We do indeed write in the Introduction (e.g. lines 36-76) which phenomena we include in the model and why. The Discussion also compares our model to others (lines 433-490), pointing out that most models either focus on activity-independent or activity-dependent phases. We include both, combining the influence of both molecular gradients and growth factors as well as activity-dependent connectivity refinements instructed by spontaneous activity. We consider our model a tractable, minimalist mechanistic model which includes both activity-independent and activity-dependent aspects. 

      Regarding postsynaptic firing, this is indeed highly relevant and an important point to consider. In one of our recent publications (Kirchner and Gjorgjieva, 2021), we studied only an activity-dependent model for the organization of synaptic inputs on non-growing dendrites which have a fixed length. There, we considered the effect of postsynaptic firing (via a back-propagating action potential) and demonstrated that it plays an important role in establishing a global organization of synapses on the entire dendritic tree of the neuron. For example, we showed that it could lead to the emergence of retinotopic maps on the dendritic tree which have been found experimentally (Iacaruso et al., 2017). Since we use the same activity-dependent plasticity model in this paper, we expect that the somatic firing will have the same effect on establishing synaptic distributions on the entire dendritic tree. This is now also discussed in the Discussion section of the revised manuscript:

      p. 21 line 491, “Although we did not explicitly model postsynaptic firing, our previous work with static dendrites has shown that it can play an important role in establishing a global organization of synapses on the entire dendritic tree of the neuron (Kirchner and Gjorgjieva, 2021). For example, we showed that it could lead to the emergence of retinotopic maps on the dendritic tree which have been found experimentally (Iacaruso et al., 2017). Since we use the same activity-dependent plasticity model in this paper, we expect that the somatic firing will have the same effect on establishing synaptic distributions on the entire dendritic tree.”

      Including the concurrent development of axons in the model is indeed very interesting. In fact, a recent tour-de-force techniques paper found, similar to what we assume, Hebbian activity-dependent dynamics of axonal branches of retinal ganglion cells experiencing spontaneous activity in relation to retinal waves in the superior colliculus (Matsumoto et al., 2024). New branches tend to be added at the locations where spontaneous activity of individual branches is more correlated with retinal waves, whereas asynchronous activity is associated with branch elimination. We suspect that the same Hebbian activity-dependent dynamics also apply to dendritic growth.

      To account for axons that develop simultaneously with our growing dendrites, in the revised version of the manuscript we included a simplified form of axonal dynamics by allowing changes in the lifetime and location of potential synapses, which come from axons of presynaptic partners. We explored different median lifetimes of synapses in combination with several distances with which a synapse can move in the simulated space (new Supplementary Figure 3-3). Our results show that dynamically moving synapses only affect the dynamics and stability of our model when the rate of moving synapses combined with the distance of moving synapses is faster than the dendritic growth. In scenarios in which synapses can move across large distances, dendrites get further destabilized due to synapses transferring from one dendrite to another, perturbing the attractor fields of the potential synapses even in late phases of the simulations. Besides such non-biological scenarios, dynamically moving synapses have little effect on the model dynamics: they mostly add noise and variability to the growth and pruning without changing the timing and amplitude of the dynamics. These results are discussed in the Results section of the revised manuscript:

      p.9 line 207, “The development of axons is concurrent with dendritic growth and highly dynamic (Matsumoto et al., 2024). To address the impact of simultaneously growing axons, we implemented a simple form of axonal dynamics by allowing changes in the lifetime and location of potential synapses, originating from the axons of presynaptic partners (Figure 3-Figure Supplement 3). When potential synapses can move rapidly (median lifetime of 1.8 hours), the model dynamics are perturbed quite substantially, making it difficult for the dendrites to stabilize completely (Figure 3-Figure Supplement 3c). However, slowly moving potential synapses (median lifetime of 18 hours) still yield comparable results (Figure 3-Figure Supplement 3). The distance of movement significantly influenced results only when potential synaptic lifetimes were short. For extended lifetimes, the moving distance had a minor impact on the dynamics, predominantly affecting the time required for dendrites to stabilize. This was the result of synapses being able to transfer from one dendrite to another, potentially forming new long-lasting connections even at advanced stages of synaptic refinement. In sum, our results show that potential axonal dynamics only affect the stability of our model when these dynamics are much faster than dendritic growth.”

      Precise details on the implementation of the dynamically moving synapses and their synaptic lifetimes are now in the Methods section:

      p. 25 line 650, “Dynamically moving synapses. For the moving synapses we introduced lifetimes for each synapse, randomly sampled from a log-normal distribution with median 1.8h (for when they move frequently), 4.5h or 18h (for when they move rarely) and variance equal to 1 (Figure 3-Figure Supplement 3b). The lifetime of a synapse decreases only when the synapse is not connected to any of the dendrites (i.e., is a potential synapse). When the lifetime of a synapse expires, the synapse moves to a new location with a new lifetime sampled from the same log-normal distribution. This enables synapses to move multiple times throughout a simulation. The exact locations and distances to which each synapse can move are determined by a binary matrix (dimensions: $pixel_{distance} \times pixel_{distance}$) representing a ring (annulus) with the inner radius $d/4$ and outer radius $d/2$, where the synapse location is at the center of the matrix. All the locations of the matrix within the ring boundaries (between the inner radius and outer radius) are potential locations to which the synapse can move. The synapse then moves randomly to one of the possible locations where no other synapse or dendrite is located. For the movement distances, we chose the ring dimensions 3 × 3, 25 × 25 and 101 × 101, yielding the moving distances (radii) of 1 pixel per movement, 12 pixels per movement and 50 pixels per movement ($r = (d-1)/2$). These pixel distances represent small movements, as much as a dendrite can grow in one step (1 micron), and larger movements which are far enough so that the synapse will not attract the same branches again (12 microns) or far enough so that it might attract a completely different dendrite (50 microns, Figure 3-Figure Supplement 3a).”
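
      The relocation rule described in this excerpt can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: "variance equal to 1" is read as a log-scale standard deviation of 1, ring boundaries are treated as inclusive, targets outside the grid are discarded, and all function names are hypothetical. The ring is built for an odd matrix size d, with the synapse at the centre and admissible offsets between d/4 and d/2 from it.

```python
import math
import random


def ring_offsets(d):
    """Offsets (dx, dy) in a d x d window whose distance from the centre
    lies between d/4 and d/2 (boundaries inclusive, an assumption)."""
    c = (d - 1) // 2
    inner, outer = d / 4, d / 2
    return [(x - c, y - c)
            for x in range(d) for y in range(d)
            if inner <= math.hypot(x - c, y - c) <= outer]


def sample_lifetime(median_hours, rng, sigma=1.0):
    """Log-normal lifetime; the median of a log-normal is exp(mu)."""
    return rng.lognormvariate(math.log(median_hours), sigma)


def move_synapse(pos, d, occupied, grid_size, rng):
    """Pick a free target inside the ring around pos.

    Returns pos unchanged if no free in-grid target exists (an assumption;
    the excerpt does not say what happens in that case)."""
    free = []
    for dx, dy in ring_offsets(d):
        target = (pos[0] + dx, pos[1] + dy)
        in_grid = 0 <= target[0] < grid_size and 0 <= target[1] < grid_size
        if in_grid and target not in occupied:
            free.append(target)
    return rng.choice(free) if free else pos


if __name__ == "__main__":
    rng = random.Random(0)
    for d in (3, 25, 101):
        # Largest horizontal offset equals (d - 1) / 2, i.e. 1, 12 and 50 pixels.
        print(d, max(dx for dx, _ in ring_offsets(d)))
    print(move_synapse((90, 90), 25, occupied=set(), grid_size=185, rng=rng))
```

      For the three ring sizes in the excerpt, the largest offset along one axis is 1, 12 and 50 pixels, matching the quoted moving distances.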

      Point 1.3. Line 167. There are many ways to include activity -independent and -dependent components into a model and not every such model shows stability. A key feature seems to be that larger arbors result in reduced growth and/or increased retraction, but this could be achieved in many ways (whether activity dependent or not). It's not clear that this result is due to the combination of activity-dependent and independent components in the model, or conceptually why that should be the case.

      We never argued for model uniqueness. There are always going to be many different models (at different spatial and temporal scales, at different levels of abstraction). We can never study all of them and like any modeling study in systems neuroscience we have chosen one model approach and investigated this approach. We do compare the current model to others in the Discussion. If the reviewers have a specific implementation that we should compare our model to as an alternative, we could try, but not if this means doing a completely separate project.

      Point 1.4. Line 183. The explanation of overshoot in terms of the different timescales of synaptic additions versus activity-dependent retractions was not something I had previously encountered and is an interesting proposal. Have these timescales been measured experimentally? To what extent is this a result of fine-tuning of simulation parameters?  

      We found that varying the amount of BDNF controls the timescale of the activity-dependent plasticity (see our Figure 6c). Hence, changing the balance between synaptic additions vs. retractions is already explored in Figure 6e and f. Here we show that the overshoot and retraction do not have to be fine-tuned but may be abolished if there is too much activity-dependent plasticity.

      Regarding the relative timescales of synaptic additions vs. retractions: since the first is mainly due to activity-independent factors, and the second due to activity-dependent plasticity, the question is really about the timescales of the latter two. As we write in the Introduction (lines 61-63), manipulating activity-dependent synaptic transmission has been found to not affect morphology but rather the density and specificity of synaptic connections (Ultanir et al. 2007), supporting the sequential model we have (although we do not impose the sequence, as both activity-independent and activity-dependent mechanisms are always “on”; but note that activity-dependent plasticity can only operate on synapses that have already formed).

      The described results are robust to parameter variations (performed on the postsynaptic density, potential synapse density, and within- and across-group correlations) as described in the reply to reviewer #1 point 1.1.

      Point 1.5. Line 203. This result seems at odds with results that show only a very weak bias in the tuning distribution of inputs to strongly tuned cortical neurons (e.g. work by Arthur Konnerth's group). This discrepancy should be discussed.  

      First, we note that the correlated activity experienced by our modeled synapses (and resulting synaptic organization) does not necessarily correspond to visual orientation, or any stimulus feature, for that matter, but is rather a property of correlated spontaneous activity. 

      Nonetheless, there is some variability in what the experimental data show. Many studies have shown that synapses on dendrites are organized into functional synaptic clusters: across brain regions, developmental ages and diverse species from rodent to primate (Kleindienst et al., 2011; Takahashi et al., 2012; Winnubst et al., 2015; Gökçe et al., 2016; Wilson et al., 2016; Iacaruso et al., 2017; Scholl et al., 2017; Niculescu et al., 2018; Kerlin et al., 2019; Ju et al., 2020, Hedrick et al., 2022, Hedrick et al., 2024). Interestingly, some in vivo studies have reported lack of fine-scale synaptic organization (Varga et al., 2011; X. Chen et al., 2011; T.-W. Chen et al., 2013; Jia et al., 2010; Jia et al., 2014), while others reported clustering for different stimulus features in different species. For example, dendritic branches in the ferret visual cortex exhibit local clustering of orientation selectivity but do not exhibit global organization of inputs according to spatial location and receptive field properties (Wilson et al. 2016; Scholl et al., 2017). In contrast, synaptic inputs in mouse visual cortex do not cluster locally by orientation, but only by receptive field overlap, and exhibit a global retinotopic organization along the proximal-distal axis (Iacaruso et al., 2017). We proposed a theoretical framework to reconcile these data: combining activity-dependent plasticity similar to the BDNF-proBDNF model that we used in the current work, and a receptive field model for the different species (Kirchner and Gjorgjieva, 2021). This is now also discussed in the Discussion section of the revised manuscript:

      p. 20 line 471, “The correlated activity experienced by our modeled synapses (and resulting synaptic organization) does not necessarily correspond to visual orientation, or any stimulus feature, for that matter, but is rather a property of spontaneous activity. Nonetheless, there is some variability in what the experimental data show. Many have shown that synapses on dendrites are organized into functional synaptic clusters: across brain regions, developmental ages and diverse species from rodent to primate (Kleindienst et al., 2011; Winnubst et al., 2015; Iacaruso et al., 2017; Scholl et al., 2017; Niculescu et al., 2018; Takahashi et al., 2012; Gökçe et al., 2016; Wilson et al., 2016; Kerlin et al., 2019; Ju et al., 2020; Hedrick et al., 2022, 2024). Other studies have reported lack of fine-scale synaptic organization (Chen et al., 2013; Varga et al., 2011; Chen et al., 2011; Jia et al., 2010, 2014). Interestingly, some of these discrepancies might be explained by different species showing clustering with respect to different stimulus features (orientation or receptive field overlap) (Scholl et al., 2017; Wilson et al., 2016; Iacaruso et al., 2017). Our prior work proposed a theoretical framework to reconcile these data: combining activity-dependent plasticity as we used in the current work, and a receptive field model for the different species (Kirchner and Gjorgjieva, 2021).”

      Point 1.6. Line 268. How does the large variability in the size of the simulated arbors relate to the relatively consistent size of arbors of cortical cells of a given cell type? This variability suggests to me that these simulations could be sensitive to small changes in parameters (e.g. to the density or layout of presynapses).  

      We again thank the reviewer for the detailed explanation and feedback on parameters that should be tested in more detail. We have explored several of the suggested model parameters and believe that we have managed to explain and illustrate their effects on the model's dynamics clearly. The precise changes are explained in the reply to point 1.1 and are now available in the revised version of the manuscript.

      Point 1.7. The modeling of dendrites as two-dimensional will likely limit the usefulness of this model. Many phenomena- such as diffusion, random walks, topological properties, etc - fundamentally differ between two and three dimensions.  

      Indeed, there are many differences between two and three dimensions. We have ongoing work that extends the current model to 3D, but this is beyond the scope of the current paper. In systems neuroscience, simplified geometric assumptions about networks have yielded very interesting results; for instance, the one-dimensional ring model has been used to uncover fundamental insights about computations even though it is highly simplified and abstracted. We are convinced that our model, especially with the new sensitivity analysis, makes interesting and novel contributions and predictions.

      Point 1.8. The description of wiring lengths as 'approximately optimal' in this text is problematic. The plotted data show that the wiring lengths are several deviations away from optimal, and the random model is not a valid instantiation of the 2D non-overlapping constraints the authors imposed. A more appropriate null should be considered.  

      We appreciate the reviewer’s feedback regarding the use of the term “approximately optimal” in describing wiring lengths. We acknowledge that our initial terminology was imprecise and could be misleading. We had previously referred to the minimal wiring length as the optimal wiring length, which does not fully capture the nuances of neuronal wiring optimization. As noted in prior literature, such as the work by Hermann Cuntz (Cuntz et al., 2010 & 2012), neurons can optimize their wiring beyond simply minimizing dendritic length.

      To address this issue, to better capture the balance between wiring minimization and functional constraints, such as conduction delays, we have developed a new modeling approach based on minimum spanning trees with a balancing factor (Cuntz et al., 2010 & 2012). This factor modulates the trade-off between minimizing wiring length and accounting for conduction delays from synapses to the soma. Specifically, the model assumes a balance between minimizing the total dendritic length and minimizing the tree distance between synapses and the site of input integration, typically the soma. This balance is illustrated in Figure 8 (Figure 7 in the original manuscript), where we demonstrate that the deviation from the theoretical minimum length arises because direct paths to synapses often require longer dendrites in our models.

      Together with the new result, which we added as the new panels f, g and h to Figure 8 (originally Figure 7), we also adjusted panel a of Figure 8, to now illustrate the difference between random wiring, minimal wiring and minimal conductance delay. The updated Figure 8 and its new findings are discussed in the results section of the revised manuscript:

      p.17 line 387, “This deviation is expected given that real dendrites need to balance their growth processes between minimizing wire and reducing conduction delays. The interplay between these two factors emerges from the need to reduce conduction delays, which requires a direct path length from a given synapse to the soma, consequently increasing the total length of the dendritic cable (Cuntz et al., 2010, 2012; Ferreira Castro et al., 2020).

      To investigate this further, we compared the scaling relations of the final morphologies of our models with other synthetic dendritic morphologies generated using a previously described minimum spanning tree (MST) based model. The MST model balances the minimization of total dendritic length and the minimization of conduction delays between synapses and the soma. This balance results in deviations from the theoretical minimum length because direct paths to synapses often require longer dendrites (Cuntz et al., 2008, 2010). The balance in the model is modulated by a balancing factor (𝑏𝑓 ). If 𝑏𝑓 is zero, dendritic trees minimize the cable only, and if 𝑏𝑓 is one, they will try to minimize the conduction delays as much as possible. It is important to note that the MST model does not simulate the developmental process of dendritic growth; it is a phenomenological model designed to generate static morphologies that resemble real cells.

      To facilitate the comparison of total lengths between our simulated and MST morphologies, we generated MST models under the same initial conditions (synaptic spatial distribution) as our models and simulated them to match several morphometrics (total length, number of terminals, and surface area) of our grown morphologies. This allowed us to create a corresponding MST tree for each of our synthetic trees. Consequently, we could evaluate whether the branching structures of our models were accurately predicted by minimum spanning trees based on optimal wiring constraints. We found that the best match occurred with a trade-off parameter 𝑏𝑓 = 0.9250 (Figure 8f). Using the morphologies generated by the MST model with the specified trade-off parameter (𝑏𝑓), we showed that the square root of the synapse count and the total length (𝐿) in both our model-generated trees and the MST trees exhibit a linear scaling relationship (Figure 8g; 𝑅² = 0.65). The same linear relationship can be observed for the square root of the surface area and the total length 𝐿 of our model trees and the MST trees (Figure 8h; 𝑅² = 0.73). Overall, these results indicate that our model-generated trees are well fitted by the MST model and follow wire optimization constraints.

      We acknowledge that the value of the balancing factor 𝑏𝑓 in our model is higher than the range of balancing factors that is typically observed in the biological dendritic counterparts, which generally ranges between 0.2 and 0.4 (Cuntz et al., 2012; Ferreira Castro et al., 2020; Baltruschat et al., 2020). However, it is still remarkable that our model, which does not explicitly address these two conservation laws, achieves approximately optimal wiring. Why do we observe such a high 𝑏𝑓 value? We reason that two factors may contribute to this. First, in our models, local branches grow directly to the nearest potential synapse, potentially taking longer routes instead of optimally branching to minimize wiring length (Wen and Chklovskii, 2008). Second, the growth process in our models does not explicitly address the tortuosity of the branches, which can increase the total length of the branches used to connect synapses. In the future, it will be interesting to add constraints that take these factors into account. Taken together, combining activity-independent and -dependent dendrite growth produces morphologies that approximate optimal wiring.”
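
      As a minimal sketch of how such a scaling check could be computed, the snippet below evaluates the scaling law L ≈ π^(-1/2) · S^(1/2) · N^(1/2) and the R² of a straight-line fit. It is an illustration only, not the analysis code behind Figure 8; the function names and the example numbers are hypothetical.

```python
import math

def predicted_length(area, n_synapses):
    """Length predicted by the scaling law L ~ pi^(-1/2) * sqrt(S) * sqrt(N)."""
    return math.sqrt(area * n_synapses / math.pi)

def r_squared(x, y):
    """R^2 of an ordinary least-squares line fitted to the pairs (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical trees: (spanning-field area, synapse count, measured length).
trees = [(100.0, 20, 27.0), (150.0, 35, 40.0), (200.0, 50, 58.0)]
sqrt_n = [math.sqrt(n) for _, n, _ in trees]
lengths = [length for _, _, length in trees]
fit_quality = r_squared(sqrt_n, lengths)
```

      Plotting the measured lengths against the square root of N (or of S) and reporting R² is the kind of comparison summarized in panels g and h; the law itself gives the slope expected for purely wire-minimizing trees.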

      Further details on the fitted MST model and the corresponding analysis were added to the methods section:

      p.26 line 669, “Comparison with wiring optimization MST models. To evaluate the wire minimization properties of our model morphologies (n=288), we examined whether the number of connected synapses (N), total length (L), and surface area of the spanning field (S) conformed to the scaling law $L \approx \pi^{-1/2} \cdot S^{1/2} \cdot N^{1/2}$ (Cuntz et al., 2012). Furthermore, to validate that our model dendritic morphologies scale according to optimal wiring principles, we created simplified models of dendritic trees using the MST algorithm with a balancing factor (bf). This balancing factor adjusts between minimizing the total dendritic length and minimizing the tree distance between synapses and the soma ($\mathrm{Cost} = L + bf \cdot PL$; MST_tree; best bf = 0.925; Cuntz et al., 2010; TREES Toolbox, http://www.treestoolbox.org).

      Initially, we generated MSTs to connect the same distributed synapses as our models. We performed MST simulations that vary the balancing factor between 𝑏𝑓 = 0 and 𝑏𝑓 = 1 in steps of 0.025 while calculating the morphometric agreement by computing the error (Euclidean distance) between the morphologies of our models and those generated by the MST models. The morphometrics used were total length, number of terminals, and surface area occupied by the synthetic morphologies.”
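
      The sweep described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the TREES Toolbox implementation: the greedy cost (edge length plus bf times the path length to the root) is our reading of the cost term, only two of the three morphometrics are used (surface area is omitted), and all function names and coordinates are hypothetical.

```python
import math

def balanced_mst(points, root, bf):
    """Greedy spanning tree over `points`, grown from `root`.

    Each step attaches the free point with the lowest
    cost = edge length + bf * (path length from the root to that point).
    This cost form is an assumption made for illustration.
    """
    nodes, parent, path_len = [root], [-1], [0.0]
    free = list(points)
    total = 0.0
    while free:
        best = None
        for j, p in enumerate(free):
            for i, q in enumerate(nodes):
                d = math.dist(p, q)
                cost = d + bf * (path_len[i] + d)
                if best is None or cost < best[0]:
                    best = (cost, i, j, d)
        _, i, j, d = best
        nodes.append(free.pop(j))
        parent.append(i)
        path_len.append(path_len[i] + d)
        total += d
    return total, path_len, parent

def morphometrics(total, parent):
    """(total length, number of terminals); surface area is omitted here."""
    inner = {p for p in parent if p >= 0}
    return (total, len(parent) - len(inner))

def fit_bf(points, root, target, step=0.025):
    """Sweep bf from 0 to 1 and return (best bf, Euclidean error to target)."""
    best_bf, best_err = None, math.inf
    for k in range(int(round(1 / step)) + 1):
        bf = k * step
        total, _, parent = balanced_mst(points, root, bf)
        err = math.dist(morphometrics(total, parent), target)
        if err < best_err:
            best_bf, best_err = bf, err
    return best_bf, best_err

# Hypothetical synapse positions around a soma at the origin.
synapses = [(1.0, 1.0), (1.0, -1.0), (2.0, 0.0)]
soma = (0.0, 0.0)
len_wire, paths_wire, _ = balanced_mst(synapses, soma, 0.0)
len_delay, paths_delay, _ = balanced_mst(synapses, soma, 1.0)
```

      With bf = 0 the procedure reduces to an ordinary minimum spanning tree, while bf = 1 trades extra cable for shorter paths to the soma, which is the trade-off the balancing factor is meant to capture.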

      Point 1.9. It's not clear to me what the authors are trying to convey by repeatedly labeling this model as 'mechanistic'. The mechanisms implemented in the model are inspired by biological phenomena, but the implementations have little resemblance to the underlying biophysical mechanisms. Overall my impression is that this is a phenomenological model intended to show under what conditions particular patterns are possible. Line 363, describing another model as computational but not mechanistic, was especially unclear to me in this context.  

      What we mean by mechanistic is that we implement equations that model specific mechanisms, i.e., we have a set of equations that implement the activity-independent attraction to potential synapses (with parameters such as the density of synapses, their spatial influence, etc.) and the activity-dependent refinement of synapses (with parameters such as the ratio of BDNF and proBDNF to induce potentiation vs depression, the activity-dependent conversion of one factor to the other, etc.). This is a bottom-up approach where we combine multiple elements together to get to neuronal growth and synaptic organization. This approach is in stark contrast to the so-called top-down or normative approaches where the method would involve defining an objective function (e.g. minimal dendritic length) which depends on a set of parameters and then applying a gradient descent or other mathematical optimization technique to get at the parameters that optimize the objective function. This latter approach we would not call mechanistic because it involves an abstract objective function (who could say what a neuron or a circuit should be trying to optimize?) and a mathematical technique for how to optimize the function (we don’t know if neurons can compute gradients of abstract objective functions).

      Hence our model is mechanistic, but it does operate at a particular level of abstraction/simplification. We don’t model individual ion channels, or biophysics of synaptic plasticity (opening and closing of NMDA channels, accumulation of proteins at synapses, protein synthesis). We do, however, provide a biophysical implementation of the plasticity mechanism through the BDNF/proBDNF model which is more than most models of plasticity achieve, because they typically model a phenomenological STDP or Hebbian rule that just uses activity patterns to potentiate or depress synaptic weights, disregarding how it could be implemented. To the best of our understanding, this is what is normally considered mechanistic in the field (in contrast to, for example, biophysical).

      Reviewer #2 (Public Review): 

      This work combines a model of two-dimensional dendritic growth with attraction and stabilisation by synaptic activity. The authors find that constraining growth models with competition for synaptic inputs produces artificial dendrites that match some key features of real neurons both over development and in terms of final structure. In particular, incorporating distance-dependent competition between synapses of the same dendrite naturally produces distinct phases of dendritic growth (overshoot, pruning, and stabilisation) that are observed biologically and leads to local synaptic organisation with functional relevance. The approach is elegant and well-explained, but makes some significant modelling assumptions that might impact the biological relevance of the results. 

      Strengths: 

      The main strength of the work is the general concept of combining morphological models of growth with synaptic plasticity and stabilisation. This is an interesting way to bridge two distinct areas of neuroscience in a manner that leads to findings that could be significant for both. The modelling of both dendritic growth and distance-dependent synaptic competition is carefully done, constrained by reasonable biological mechanisms, and well-described in the text. The paper also links its findings, for example in terms of phases of dendritic growth or final morphological structure, to known data well. 

      Weaknesses: 

      The major weaknesses of the paper are the simplifying modelling assumptions that are likely to have an impact on the results. These assumptions are not discussed in enough detail in the current version of the paper. 

      (1) Axonal dynamics. 

      A major, and lightly acknowledged, assumption of this paper is that potential synapses, which must come from axons, are fixed in space. This is not realistic for many neural systems, as multiple undifferentiated neurites typically grow from the soma before an axon is specified (Polleux & Snider, 2010). Further, axons are also dynamic structures in early development and, at least in some systems, undergo activity-dependent morphological changes too (O'Leary, 1987; Hall 2000). This paper does not consider the implications of joint pre- and post-synaptic growth and stabilisation.  

      We thank the reviewer for the summary of the strengths and weaknesses of the work. While we feel that including a full model of axonal dynamics is beyond the scope of the current manuscript, some aspects of axonal dynamics can be included and are now implemented and tested in the revised manuscript. Since this feedback covers similar aspects of the model that were also pointed out by reviewer #1, we refer here to our detailed reply to their comments 1.1 and 1.2, where we list and discuss all the analyses performed to address the raised issues.

      (2) Activity correlations 

      On a related note, the synapses in the manuscript display correlated activity, but there is no relationship between the distance between synapses and their correlation. In reality, nearby synapses are far more likely to share the same axon and so display correlated activity. If the input activity is spatially correlated and synaptic plasticity displays distance-dependent competition in the dendrites, there is likely to be a non-trivial interaction between these two features with a major impact on the organisation of synaptic contacts onto each neuron.  

      We have explored the amount of correlation (between and within correlated groups) in the revised manuscript (see also our reply to reviewer comment 1.1).

      However, previous experimental work (e.g. Kleindienst et al., 2011) has provided anatomical and functional analyses indicating that functional synaptic clustering on dendritic branches is unlikely to result from individual axons making more than one synapse (see pg. 1019).

      (3) BDNF dynamics 

      The models are quite sensitive to the ratio of BDNF to proBDNF (eg Figure 5c). This ratio is also activity-dependent as synaptic activation converts proBDNF into BDNF. The models assume a fixed ratio that is not affected by synaptic activity. There should at least be more justification for this assumption, as there is likely to be a positive feedback relationship between levels of BDNF and synaptic activation.  

      The reviewer is correct. We used the BDNF-proBDNF model for synaptic plasticity based on our previous work (Kirchner and Gjorgjieva, 2021).  

      There, we explored only the emergence of functionally clustered synapses on static dendrites which do not grow. In the Methods section (Parameters and data fitting) we justify the choice of the ratio of BDNF to proBDNF from published experimental work. We also performed sensitivity analysis (Supplementary Fig. 1) and perturbation simulations (Supplementary Fig. 3), which showed that the ratio is crucial in regulating the overall amount of potentiation and depression of synaptic efficacy, and therefore has a strong impact on the emergence and maintenance of synaptic organization. Since we already performed all this analysis, we expect that the same results will also apply to the current model which includes dendritic growth, as it involves the same activity-dependent mechanism.

      A further weakness is in the discussion of how the final morphologies conform to principles of optimal wiring, which is quite imprecise. 'Optimal wiring' in the sense of dendrites and axons (Cajal, 1895; Chklovskii, 2004; Cuntz et al, 2007, Budd et al, 2010) is not usually synonymous with 'shortest wiring' as implied here. Instead, there is assumed to be a balance between minimising total dendritic length and minimising the tree distance (ie Figure 4c here) between synapses and the site of input integration, typically the soma. The level of this balance gives the deviation from the theoretical minimum length as direct paths to synapses typically require longer dendrites. In the model this is generated by the guidance of dendritic growth directly towards the synaptic targets. The interpretation of the deviation in this results section discussing optimal wiring, with hampered diffusion of signalling molecules, does not seem to be correct. 

      We agree with this comment. We had wrongly used the term “optimal wiring”, as neurons can optimize their wiring not only by minimizing their dendritic length but also through other factors, as noted by the reviewer. In the revised manuscript we replaced the term “optimal wiring” with “minimal wiring” wherever it was incorrectly used. On top of that, we performed further analysis and discussed these differences, as pointed out in the reply to reviewer #1 point 1.8.

      To summarize, we want to again thank the reviewer for their in-depth review and all the suggestions that helped us improve the analysis and implementation of our model.

      Reviewer #3 (Public Review): 

      The authors propose a mechanistic model of how the interplay between activity-independent growth and an activity-dependent synaptic strengthening/weakening model influences the dendrite shape, complexity and distribution of synapses. The authors focus on a model for stellate cells, which have multiple dendrites emerging from a soma. The activity-independent component is provided by a random pool of presynaptic sites that represent potential synapses and that release a diffusible signal that promotes dendritic growth. Then a spontaneous activity pattern with some correlation structure is imposed at those presynaptic sites. The strength of these synapses follows a learning rule previously proposed by the lab: synapses strengthen when there is correlated firing across multiple sites, and synapses weaken if there is uncorrelated firing, with the relative strength of these processes controlled by available levels of BDNF/proBDNF. Once a synapse is weakened below a threshold, the dendrite branch at that site retracts and loses its sensitivity to the growth signal.

      The authors run the simulation and map out how dendrites and synapses evolve and stabilize. They show that dendritic trees grow rapidly and then stabilize by balancing growth and retraction (Figure 2). They also show that there is an initial bout of synaptogenesis followed by loss of synapses, reflecting the longer amount of time it takes to weaken a synapse (Figure 3). They analyze how this evolution of dendrites and synapses depends on the correlated firing of synapses (i.e. defined as being in the same "activity group"). They show that in the stabilized phase, synapses that remain connected to a given dendritic branch are likely to be from the same activity group (Figure 4). The authors systematically alter the learning rule by changing the available concentration of BDNF, which alters the relative amount of synaptic strengthening, which in turn affects stabilization, density of synapses and, interestingly, how selective for an activity group one dendrite is (Figure 5). In addition, the authors look at how altering the activity-independent factors influences outgrowth (Figure 6). Finally, one of the interesting outcomes is that the resulting dendritic trees represent "optimal wiring" solutions in the sense that dendrites use the shortest distance given the distribution of synapses. They compare this distribution to published data to see how the model compares to what has been observed experimentally.

      There are many strengths to this study. The consequence of adding the activity-dependent contribution to models of synapto- and dendritogenesis is novel. There is some exploration of parameter space with the motivation of keeping the parameters as well as the generated outcomes close to anatomical data of real dendrites. The paper is also scholarly in its comparison of this approach to previous generative models. This work represents an important advance to our understanding of how learning rules can contribute to dendrite morphogenesis.

      We thank the reviewer for the positive evaluation of the work and the suggestions below.

      To improve the clarity of the manuscript, we adjusted and fixed some figures and corresponding paragraphs as follows:

      (1) We increased the number of ticks and their corresponding numbers in all the figures to make them easier to read and interpret.

      (2) In Figure 3 panel d, showing the evolution of synaptic weight, we corrected the upper limit of the y-axis to 1 (previously 2).

      (3) Due to a typo in the implementation of the BDNF concentration, we had to correct the BDNF concentrations used from 49%, 45% and 40% to 49%, 46.5% and 43%, respectively.

      (4) The y-axis labels of Figure 6 (old Figure 5) panel e and f were changed to make the plots clearer (e: “morphology change explained (%)” to "effect on morphology (%)", and f: “synapse connection explained (%)” to "effect on connected synapses (%)").

      (5) The values for eta and tau-w in the supplementary Table were corrected. Previously, tau-w was incorrectly given as 6000 time steps and has been corrected to 3000 time steps; eta was 45% and is now 46.5%.

      We believe that all the changes to the manuscript will address the reviewer’s concerns and enhance the clarity and accuracy of the findings described in the manuscript.

    1. Author response:

      We thank the reviewers for their thoughtful comments. We are working to revise our manuscript and address each of the reviewers’ comments. A summary of our planned revisions and responses to some of the reviewers’ major concerns is included below.

      Cultivation Density: Reviewers #1 and #2 suggested that additional studies testing the effects of varying bacterial density during animal development (cultivation) would strengthen our findings. While we agree with the reviewers that this is a very interesting experiment, it is not feasible. Indeed, we attempted this experiment but found it nontrivial to maintain stable bacterial density conditions over long timescales as this requires matching the rate of bacterial growth with the rate of bacterial consumption. Despite our best efforts, we have not been able to identify conditions that satisfy these requirements. We will focus our revised manuscript to include only assertions about the effects of recent experiences.

      Transfer Method: Reviewers #1 and #2 expressed concern that the stress of transferring animals to a new plate may have resulted in an increased arousal state and thus a greater probability of rejecting patches. We thank the reviewers for this thoughtful remark and plan to conduct additional analyses to address this hypothesis. We did, however, anticipate this possibility and, to mitigate the stress of moving, we used an agar plug method where animals were transferred using the flat surface of small cylinders of agar. Importantly, the use of agar as a medium to transfer animals provides minimal disruption to their environment as all physical properties (e.g. temperature, humidity, surface tension) are maintained. Qualitatively, we observe no marked change in behavior from before to after transfer with the agar plug method, especially as compared to the often drastic changes observed when using a metal or eyelash pick.

      Time Parameter: Related to the transfer method, Reviewer #1 expressed concern that the simplest time parameter (time since start of the assay) might better predict animal behavior. We thank the reviewer for pointing out the need to specifically test whether the time-dependent change in explore-exploit decision-making corresponds better with satiety (time off patch) or arousal (time since transfer/start of assay) state. We will conduct additional analyses to address these alternative hypotheses.

      Parameter Initialization: Reviewer #1 pointed out an oversight in our methods section regarding the model parameter values used for the first encounter. We plan to clarify the initialization of parameters in the manuscript. In short, for the first patch encounter where k = 1:

      ρk is the relative density of the first patch.

      τs is the duration of time spent off food since the beginning of the recorded experiment. For the first patch, this is equivalent to the total time elapsed.

      ρh is the approximated relative density of the bacterial patch on the acclimation plates (see Assay preparation and recording in Methods). Acclimation plates contained one large 200 µL patch seeded with OD600 = 1 and grown for a total of ~48 hours. As with all patches, the relative density was estimated from experiments using fluorescent bacteria OP50-GFP as described in Bacterial patch density estimation in Methods.

      ρe is equivalent to ρh.
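
      The initialization listed above can be written out as a small sketch. The container and function names below are hypothetical and only restate the four definitions for k = 1; they are not taken from our analysis code.

```python
from dataclasses import dataclass

@dataclass
class EncounterInputs:
    rho_k: float  # relative density of the current (k-th) patch
    tau_s: float  # time spent off food before this encounter
    rho_h: float  # approximated relative density on the acclimation plate
    rho_e: float  # set equal to rho_h for the first encounter

def first_encounter_inputs(first_patch_density, time_elapsed, acclimation_density):
    """Model inputs for the first patch encounter (k = 1)."""
    return EncounterInputs(
        rho_k=first_patch_density,
        tau_s=time_elapsed,          # equals the total time elapsed in the recording
        rho_h=acclimation_density,
        rho_e=acclimation_density,   # rho_e is equivalent to rho_h when k = 1
    )

# Hypothetical values, for illustration only.
inputs = first_encounter_inputs(first_patch_density=0.5,
                                time_elapsed=120.0,
                                acclimation_density=1.0)
```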

      Sensing vs. non-sensing: Reviewer #3 suggested that the term “non-sensing” may not be ethologically accurate. We thank the reviewer for their comment and agree that we do not know for certain whether the animals sensed these patches or were merely non-responsive to them. We are, however, confident that these encounters lack evidence of sensing. Specifically, we note that our analyses used to classify events as sensing or non-sensing examined whether an animal’s slow-down upon patch entry could be distinguished from either that of events where animals exploited or that of encounters with patches lacking bacteria. We found that  “non-sensing” encounters are indeed indistinguishable from encounters with bacteria-free patches where there are no bacteria to be sensed (see Figure 2 - Supplement 7C-D and Patch encounter classification as sensing or non-sensing in Methods). Regardless, we agree with the reviewer that all that can be asserted for certain about these events is that animals do not respond to the bacterial patch in any way that we measured. Therefore, we will replace the term “non-sensing” with “non-responding” to better indicate the ethological interpretation of these events.

      Time-dependent changes in sensing vs. non-sensing: Reviewer #1 remarked that the sensation of dilute patches increases with time. We agree with the reviewer that we observe increased responsiveness to dilute patches with time. Although this is interesting, our primary focus was on what decision an animal made given that they clearly sensed the presence of the bacterial patch. Nonetheless, we will add this observation to the discussion as an area of future work to investigate the sensory mechanisms behind this effect.

      Classification of sensing vs. non-sensing: Reviewers #2 and #3 expressed concerns about the validity of the two clusters identified using the semi-supervised QDA approach described. We are grateful to the reviewers for pointing out the difficulty in visualizing the clusters and the need for additional clarity in explaining the supervised labeling. We will use additional visualizations and methods to validate the clusters we have discovered. Specifically, we aim to provide additional evidence that the sensing vs. non-sensing data is bi-modal (i.e. a two-cluster classification method fits best). Further, it seems that there may be some confusion as to how we arrived at 3 encounter types (i.e. search, sample, exploit) that we plan to clarify in the manuscript. Specifically, it’s important to note that two methods were used on two different (albeit related) sets of parameters. We first used a two-cluster GMM to classify encounters as explore or exploit. We then used a two-cluster semi-supervised QDA to classify encounters as sensing or non-sensing (to be changed to “non-responding”, see above response) using a different set of parameters. We thus separated the explore cluster into two (sensing and non-sensing exploratory events) resulting in three total encounter types: exploit, sample (explore/sensing), and search (explore/non-sensing). We will clarify this in the text. Additionally, we will clarify the labelling used for “supervising” QDA. Specifically, we made two simple assumptions: 1) animals must have sensed the patch if they exploited it and 2) animals must not have sensed the patch if there were no bacteria to sense. Thus, we labeled encounters as sensing if they were found to be exploitatory as we assume that sensation is prerequisite to exploitation; and we labeled encounters as non-sensing for events where animals encountered patches lacking bacteria (OD600 = 0). All other points were non-labeled prior to learning the model.
      In this way, our labels were based on the experimental design and the results of the GMM, an unsupervised method, rather than on any expectations we had about what sensing should look like. The semi-supervised QDA method then used these initial labels to iteratively fit a paraboloid that best separated these clusters by minimizing the posterior variance of classification.
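
      To make the labeling logic concrete, the following is a deliberately simplified sketch. It is not the procedure used in the manuscript: it replaces the posterior-variance criterion with plain self-training, uses a single hypothetical feature (a slow-down score) instead of the full parameter set, and all names and values are assumptions for illustration. In one dimension, QDA amounts to one Gaussian per class, each with its own variance.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a 1D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def fit_class(values):
    """Mean and (floored) variance of one class."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-6)

def self_train_qda(x, seed_labels, n_iter=20):
    """seed_labels: 1 = sensing, 0 = non-sensing, None = unlabeled.

    Seed labels stay fixed; unlabeled points are reassigned on each pass
    to the class with the higher posterior until nothing changes.
    """
    current = list(seed_labels)
    for _ in range(n_iter):
        known = [(xi, li) for xi, li in zip(x, current) if li is not None]
        params = {}
        for c in (0, 1):
            vals = [xi for xi, li in known if li == c]
            params[c] = (*fit_class(vals), len(vals) / len(known))
        updated = []
        for xi, seed in zip(x, seed_labels):
            if seed is not None:
                updated.append(seed)
            else:
                score = {c: math.log(p) + gauss_logpdf(xi, m, v)
                         for c, (m, v, p) in params.items()}
                updated.append(max(score, key=score.get))
        if updated == current:
            break
        current = updated
    return current

# Hypothetical slow-down scores; seeds come from exploited encounters (1)
# and from encounters with bacteria-free patches (0).
slow_down = [0.10, 0.20, 0.15, 0.90, 1.00, 0.95, 0.12, 0.97]
seeds = [0, 0, None, 1, 1, None, None, None]
labels = self_train_qda(slow_down, seeds)
```

      The point of the sketch is only that the initial labels come from the experimental design and the GMM result, and that the remaining encounters are assigned by the fitted class-conditional model rather than by hand.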

      Accept-reject vs. stay-switch: Reviewers #1 and #2 ask for additional discussion on how the accept-reject decision-making framework differs from the stay-switch framework. We thank the reviewers for alerting us to this gap in our discussion. We intend to clarify that these frameworks ask two different types of questions (i.e. “Do you want to eat it?” versus “If so, how long do you want to eat it for?”). These concepts are well described in canonical foraging theory literature (see Pyke, Pulliam & Charnov 1977 for a review on the subject) and are easily distinguishable for animals that forage using the following framework: 1) search for prey, 2) encounter prey from a distance, 3) identify prey type, 4) decide to pursue (accept-reject decision), 5) pursue and capture the prey, 6) exploit prey, and 7) decide to stop exploiting and start searching again (stay-switch decision). In this case, it is easy to see the distinction between accept-reject and stay-switch decisions. However, in some scenarios, animals must physically encounter prey prior to identification and then must make an accept-reject decision. In these cases where pursuit and capture are not visualized, it is harder to distinguish between accept-reject and stay-switch decisions. In our experiments, we find significant bimodality in encounter duration (see Figure 2H) where short duration (exploratory) encounters appear to represent a lower bound where animals spend the minimum amount of time possible on a patch (less than 2 minutes), which we interpret as a rejection of the patch. On the other hand, exploitatory encounters span a large range of durations from 2 to 60+ minutes which we interpret as an initial acceptance of the patch followed by a series of stay-switch decisions which determine the overall duration of the encounter. 
      While one could certainly model our data using only stay-switch decision-making, we maintain that an encounter of minimal duration is better interpreted ethologically as a rejection than as an immediate switch decision. We will revise the text to elaborate on our point of view regarding this somewhat philosophical distinction and what it predicts about C. elegans behavior.

      Sensory mutant behavior: Reviewers #1 and #3 ask for further speculation on the observed behavior of osm-6 and mec-4 animals. We will further elaborate on our findings, how they relate to previous studies, and what they suggest about the mechanisms behind these foraging decisions.

      Model design: Reviewer #3 suggested several alterations to the behavioral model. While the proposed model seems entirely reasonable and could aid in elucidating the time component of how prior experience affects decision-making, we chose the present model based on our experience with model selection using these data. Indeed, as the reviewer suggested, we did a great number of analyses involving model selection including model selection criteria (AIC, BIC) and optimization with regularization techniques (LASSO and elastic nets). We found that the problem of model selection was compounded by the enormous array of highly correlated variables we had to choose from. Additionally, we found that both interaction terms and non-linear terms of our task variables could be predictive of accept-reject decisions but that the precise set of terms selected depended sensitively on which model selection technique was used and generally made rather small contributions to prediction. The diverse array of results and combinatorial number of predictors to possibly include failed to add anything of interpretable value. We therefore chose to take a different approach to this problem. Rather than trying to determine what the “best” model was we instead asked whether a minimal model could be used to answer a set of core questions. Indeed, our goal was not maximal predictive performance but rather to distinguish between the effects of different influences enough to determine if encounter history had a significant, independent effect on decision making. We thus chose to only include task variables that spanned the most basic components of behavioral mechanisms to ask very specific questions. For example, we selected a time variable that we thought best encapsulated satiety. While we could have included many additional terms, or made different choices about which terms to include, based on our analyses these choices would not have qualitatively changed our results. 
      Further, we sought to validate the parameters we chose with additional studies (i.e. food-deprived and sensory mutant animals). We regard our study as an initial foray into demonstrating accept-reject decision-making in nematodes. The exact mechanisms and, consequently, the best model design are therefore beyond the scope of this study. Lastly, Reviewer #3 criticized the use of only sensed patches in the model. While we acknowledge that we are not certain as to whether the “non-sensing” encounters are truly not sensed, we find qualitatively similar results when including all exploratory patches in our analyses. In fact, when all encounters are used, we find stronger correlations between our task variables and the accept-reject decision. However, we take the position that sensation is necessary for decision-making and thus believe that while our model’s predictive performance may be better using all encounters, the interpretation of our findings is stronger when we only include sensing events.
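
      As an illustration of the model selection criteria mentioned above (AIC and BIC), the following minimal sketch compares a small model against a larger one. It is not the authors' analysis: the log-likelihoods, parameter counts, and number of observations are hypothetical values chosen only to show how the penalty terms work.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits (assumed numbers, not taken from the study):
# a minimal model with 4 parameters and an expanded model with 12.
n_obs = 1000
minimal = {"loglik": -520.0, "k": 4}
expanded = {"loglik": -515.0, "k": 12}

for name, fit in (("minimal", minimal), ("expanded", expanded)):
    print(
        name,
        "AIC =", round(aic(fit["loglik"], fit["k"]), 2),
        "BIC =", round(bic(fit["loglik"], fit["k"], n_obs), 2),
    )
```

      With these assumed numbers the expanded model fits slightly better but is penalized for its extra terms under both criteria, which is the kind of trade-off that can favor a minimal model.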

    1. Author response:

      First of all, I'd like to express my heartfelt thanks to you for your meticulous and professional review comments. Your feedback is very important to our work. It not only helps us identify the shortcomings in the paper, but also provides valuable guidance for improving the quality of the paper.

      We carefully read every suggestion you made and were deeply inspired. Please rest assured that we will carefully consider and revise each opinion to ensure that our research work is more rigorous and clear. We promise to revise the manuscript accordingly to meet the standards of the journal and enhance the credibility and influence of the research.

      The main modifications include: a Mid1 supplementation experiment in Mid1 knockout mice; detection of kinases such as CaMKII, PKA and ERK1/2; supplementary references; a supplementary novel object recognition behavioral experiment; supplementary electrophysiological measurement of LTP; supplementary neuron-specific immunohistochemical staining; additional information on the knockout mice used in the study; and revision of the language of the article and of the problem of too few figures.

      Thank you again for your valuable time and professional advice. We look forward to submitting the revised manuscript to you for further review.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Cesar, Santos & Cogni use a meta-analysis to report on the direction and magnitude of three fundamental fitness components in defensive symbioses. Specifically, the work focuses on interactions between three arthropod host families (Aphididae, Culicidae, Drosophilidae, and others) and common bacterial endosymbionts (Wolbachia, Serratia, Hamiltonella, Spiroplasma, Rickettsia, Regiella X-type and Arsenophonus). The results of the overall analysis confirm common assumptions and previous work on such fitness components, showing that defensive symbionts provide strong protection to hosts and cause detectable costs to both hosts and the enemy. The analysis provides insight into the extent of the cost/benefit tradeoff for hosts, reporting that the cost is six times lower than the protective effect. The confirmation that natural enemies attacking hosts infected with symbionts have a reduction in their fitness is also an interesting one, as this shows that the majority of defensive symbionts provide protection by resisting enemy infection, as opposed to tolerating it. This finding has important consequences for evolutionary counter-responses in the enemy species. Of course, this result has less relevance for certain types of enemies (such as parasitoids) where successful infection is dependent upon host killing.

      Interesting results also emerge from the subgroup analysis. For the full dataset, both natural and introduced symbionts were similarly effective in positively influencing the fitness of hosts. However, in the Wolbachia-specific analysis, the artificially introduced symbionts caused costs to the hosts where the natural strain did not. These findings have potentially important ramifications for schemes that use endosymbionts for biocontrol or vector competence, suggesting that (in some cases) natural strains may be the more stable choice for deploying (as they are associated with lower costs).

      The analysis draws from an impressively large dataset, but the interpretation of the full impact of the results would be helped by greater detail on the species/strain level systems included, the data extraction approach, and inclusion criteria. Accounting for phylogenetic nonindependence and alternative coding of one of the moderator variables could also strengthen the biological relevance of the models. Suggestions and thoughts are outlined below.

      We sincerely thank Reviewer #1 for the time and effort dedicated to reviewing our manuscript. The suggestions provided are highly constructive and will greatly assist us in improving both our analyses and the manuscript overall.
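
      For readers less familiar with how effect sizes are combined in a meta-analysis, the following is a minimal sketch of random-effects pooling using the DerSimonian-Laird estimator. It is an assumption-laden illustration, not the authors' method: the manuscript is described as using multi-level models with moderators, and the effect sizes and variances below are hypothetical.

```python
import math


def pool_random_effects(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model.

    Returns (pooled_effect, standard_error, tau_squared).
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    df = len(effects) - 1
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    return pooled, math.sqrt(1.0 / sum_re), tau2


# Hypothetical effect sizes and sampling variances (not from the dataset).
effects = [0.8, 0.3, 1.1, 0.5]
variances = [0.04, 0.09, 0.16, 0.05]
pooled, se, tau2 = pool_random_effects(effects, variances)
print(f"pooled = {pooled:.3f}, SE = {se:.3f}, tau^2 = {tau2:.3f}")
```

      This simple estimator treats every effect size as independent. Multi-level and phylogenetically controlled models, as discussed by the reviewers, exist precisely to relax that assumption.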

      Strengths & Potential Improvements:

      An impressively large number of effect sizes (3000) from only 226 studies is collected, robustly confirming common assumptions on the magnitude of fundamental fitness components. However, the paper would benefit from a clear breakdown in the main text of the specificities of each system included (e.g. a table at the host species/symbiont strain level, where it is possible). Currently, there is not enough detail for those who want a deep dive to understand what data was extracted for the analysis from these 226 studies, or those who want to understand the underlying diversity in the dataset.

      We thank the reviewer for the suggestion, and we will add this information to our revised manuscript.

      Currently, when the 'natural enemy group' is tested as a moderator it is coded broadly by type of organism (e.g. virus, bacterium, fungi, parasitoid). But this doesn't adequately capture the mode of killing/fitness reduction by the enemy, which would be the much more biologically relevant categorisation for your questions. For example, parasitoid infection is dependent upon host death (thus host fecundity is not relevant, because the host either survived or did not). Among bacterial and viral pathogens there is scope for both fecundity and survival to be affected. This in turn may be a very influential factor for the outcome. You could consider recoding this enemy moderator.

      We agree, and we will implement this in the analysis in our revised manuscript.

      The analysis is restricted to arthropod hosts and defensive symbionts that are also classed as endosymbionts. This focus should be made clear early on in the paper, as there are many systems (that are classed by many as defensive symbioses) that are not part of the analysis.

      We agree, and we will implement this in our revised manuscript.

      There is fairly minimalistic testing of moderators/sub-groups (which probably has its statistical strengths) but perhaps there are also some missed opportunities for testing other ecological contributors to variance, including coinfection (although perhaps limited by power) and other approaches to coding enemy group (as detailed above).

      We agree, and we will implement this in the analysis in our revised manuscript.

      Looking at the overview of systems included, there's likely a high degree of phylogenetic non-independence in the dataset. Where it is possible, using phylogenetically controlled models could strengthen this analysis.

      We thank the reviewer for the suggestion. We will explore the possibility of using phylogenetically controlled models in our analyses, although we recognize the challenges associated with their implementation, particularly in the case of the natural enemies, given the great diversity of distantly related groups included in our study: viruses, bacteria, fungi, protozoans, nematodes and parasitoid wasps.

      Looking at your included systems (Table S5), you might be able to test the effect of coinfection on the 3 variables of interest. For example, it would be particularly important to see if the effects of two symbionts are additive or not.

      We agree, and we will implement this in the analysis in our revised manuscript.

      No code for the analysis is provided for review at this stage and full details of the dataset are also not available. This slightly limits the ability to assess the full scope and robustness of the study. It would be helpful to have an extensive table in the supplementary detailing (minimum) the reference, study, experiment, host species, symbiont strain, and a description of the exact data extraction source (e.g. table/figure/in text), and method of extraction.

      The code for the analysis and the full raw data with the suggested information are available at https://github.com/cassiasqr/MetaSymbiont (The link is available at the end of the manuscript).

      Reviewer #2 (Public review):

      Summary:

      In this exciting study, Cesar and co-authors perform a meta-analysis on the influence of arthropod symbionts on the fitness of their hosts when they are exposed or not to natural enemies. These so-called defensive symbionts are increasingly recognized as key elements in arthropod survival against natural enemies, with effects that ripple through entire terrestrial ecosystems. The topic is timely, the approach is sound, and the manuscript is well-written. I believe this manuscript will attract the attention of entomologists and of microbiologists interested in symbiosis. This study builds on a previous meta-analysis that I was involved in, which was based on phloem-feeding insects. This novel data set is much larger and includes flies (including the model system Drosophila) and mosquitoes (a group of high medical interest). While the previous meta-analysis considered only parasitoids as natural enemies, this study also includes fungi, bacteria, and viruses.

      Strengths:

      The authors compile a very large dataset and provide a broad quantitative overview of the effects of defensive symbionts in insects. By measuring symbiont effects in the presence and absence of natural enemies, the authors are able to infer whether a trade-off between defense and the costs of mutualism in the absence of enemy pressure exists. Defensive symbioses are an important research topic that had its initial "momentum" a decade ago, so the timing for such a systematic review is very appropriate.

      We sincerely thank Reviewer #2 for dedicating their time and effort to reviewing our manuscript. The suggestions are very insightful and will significantly contribute to improving our manuscript.

      Weaknesses:

      I think the manuscript could be improved by clarifying several sections, particularly the introduction and methods. The introduction section is too specific and heavily reliant on particular examples. In my view, the theoretical background of the study could be made clearer, and the knowledge gap identified more explicitly. A focus on how widespread defensive symbioses are, along with a brief, up-to-date review of the groups possessing such symbionts, would help. This lack of focus is also observed in the methods section, where more details are needed in many instances to better understand how data was collected and analyzed. Regarding the analyses, the multi-level analysis contains many moderators, but it's unclear why these moderators were included. While this may seem a minor issue, it highlights a disconnection between the analyses, the conceptual background, and the hypotheses tested. 

      We thank the reviewer for the suggestions, and we will try to make the introduction and the methods section clearer. 

      Another important weakness is that the analyses are too general, and much information is hidden and not immediately apparent. For instance, readers cannot easily identify which species of symbionts are studied (and the effects they have), or which natural enemies are involved. Although this information is found in the supplementary material, including it in the main body would significantly improve the manuscript.

      We agree, and we will implement this in our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The technology requires a halo-tagged derivation of the active compound, and the linked position will have a huge impact on the potential "target hits" of the molecules. Given that most of the active molecules lack structure-activity relationship information, it is very challenging to identify the optimal position of the halo tag linkage.

      We appreciate your insightful comment. While finding the optimal position to attach a chemical linker to a small molecule of interest is indeed a challenging but necessary step, this is a common difficulty across all target-ID methods, except for those that are modification-free, as we described in Discussion. However, modification-free approaches such as DARTS, CETSA, and TPP have their own limitations, such as low sensitivity and a high false-positive rate. Additionally, DARTS and SPROX are limited to use with cell lysates. Please refer to the introduction in our manuscript for more details on these approaches. On the other hand, synthesizing HTL derivatives is relatively straightforward compared to other modifications, and we provide helpful guidelines for chemical linker design, provided the optimal chemical moiety has been identified, which is crucial for target identification. We selected dasatinib and HCQ/CQ as model compounds because previous studies offered insights into their derivative synthesis. Our data also show that DH5 retains strong kinase inhibitory activity (Figure 4—figure supplement 2), and DC661-H1 demonstrates potent inhibition of autophagy (Figure 6—figure supplement 1). For novel compounds, conducting a thorough structure-activity relationship (SAR) study is essential to determine the optimal position for HTL derivative synthesis.

      (2) Although POST-IT works in zebrafish embryos, there is still a long way to go for the broad application of the technology in other animal models.

      Thank you for your constructive comment. Yes, there is still a long way to go in developing the POST-IT system for broader applications in other animal models, especially in mice. However, we hope that our study provides valuable insights and inspiration to scientists and experts for applying the POST-IT system in various models. We are also committed to further improving its applicability.

      (3) The authors identified SEPHS2 as a new potential target of dasatinib and further validated the direct binding of dasatinib with this protein. However, considering the super strong activity of dasatinib against c-Src (sub nanomolar IC50 value), it is hard to conclude the contribution of SEPHS2 binding (micromolar potency) to its antitumor activity.

      Thank you for your insightful comment. We agree that the anticancer activity of dasatinib primarily results from inhibiting tyrosine kinases such as SRC and ABL. However, SEPHS2 contains an “opal” termination codon, UGA, at the 60th amino acid residue, which codes for selenocysteine. Due to the technical challenge of expressing selenoproteins in E. coli, we mutated it to cysteine for expression in E. coli to avoid premature translation termination, as described in the Materials and Methods section. Although the purified recombinant SEPHS2 shows a Kd of about 10 µM for dasatinib, the binding affinity to endogenous SEPHS2 may be higher since selenocysteine is larger and more electronegative than cysteine. This presents an interesting area for future investigation. Furthermore, our study of dasatinib’s binding to SEPHS2 could help facilitate the development of new SEPHS2 inhibitors, potentially targeting the active site of SEPHS2.
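
      To put a Kd of about 10 µM in perspective, the following minimal sketch computes the fraction of protein bound at equilibrium under a simple one-site binding model, occupancy = [L] / ([L] + Kd). The model and the free drug concentrations are assumptions made for illustration only. Note also that an IC50 is not a Kd, so this does not directly compare SEPHS2 with c-Src.

```python
def fractional_occupancy(free_ligand, kd):
    """Fraction of protein bound for one-site equilibrium binding.

    free_ligand and kd must be in the same concentration units.
    """
    if free_ligand < 0 or kd <= 0:
        raise ValueError("free_ligand must be >= 0 and kd must be > 0")
    return free_ligand / (free_ligand + kd)


KD_SEPHS2_UM = 10.0  # approximate Kd reported above for recombinant SEPHS2

# Hypothetical free drug concentrations in micromolar (assumed values).
for conc_um in (0.1, 1.0, 10.0, 100.0):
    occ = fractional_occupancy(conc_um, KD_SEPHS2_UM)
    print(f"{conc_um:>6.1f} uM -> {occ:.1%} bound")
```

      Under these assumptions, half of the protein is bound only when the free drug concentration equals the Kd, so a tighter affinity for endogenous SEPHS2 would change the picture substantially.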

      Reviewer #3 (Public review):

      (1) Target Specificity: It is crucial for the authors to differentiate between the primary targets of the POST-IT system and those identified as side effects. This distinction is essential for assessing the specificity and utility of the technology.

      Thank you for your insightful comment. Drugs inevitably bind to various proteins with differing affinities, which can contribute to both side effects and beneficial outcomes. Typically, the primary targets exhibit high affinities. In this manuscript, we ranked the identified protein targets of DH5 based on affinity from mass spectrometry and p-values (Fig. 5A), and for DC661-H1, we used the SILAC ratio (Fig. 6A). We also individually assessed many drug-protein binding affinities using the MST assay, as well as in vitro and in cellulo assays, demonstrating their specificity. Moreover, we believe it is essential to identify as many protein targets as possible at physiological drug concentrations to better understand the drug’s side effects. Of course, further investigation is required to assess the roles and effects of these target proteins.

      (2) In Vivo Target Identification: The manuscript lacks detailed clarity on which specific targets were successfully identified in the in vivo experiments. Expanding on this information would provide a clearer view of the system's effectiveness and scope in complex biological settings.

      Thank you for your insightful comment regarding in vivo target identification. In this manuscript, we utilized a cell line as the primary method for in vivo target identification and validation after optimizing our system in test tubes. We successfully validated many of the targets identified using our POST-IT system (Figure 6—figure supplement 3). To demonstrate the proof of principle for in vivo application, we employed zebrafish embryos as an in vivo model, showing that endogenous SRC can be effectively pulled down by DH5 treatment (Fig. 7). While we could have explored the entire proteome to identify endogenous target proteins in zebrafish that bind to DH5 or dasatinib, we felt this would extend beyond our original scope, given that we have already demonstrated POST-IT’s ability to identify target proteins for dasatinib. Specific target identification and validation are crucial when using zebrafish for drug discovery. Additionally, we acknowledge that drugs likely interact with a range of protein targets in living organisms and may undergo metabolism and interactions within the circulatory system, which we address in our discussion.

      (3) Reproducibility and Scalability: Discussion on the reproducibility of the POST-IT system across various experimental setups and biological models, as well as its scalability for larger-scale drug discovery programs, would be beneficial.

      Thank you for the suggestion. While our system has shown high reproducibility in our experiments, further improving both reproducibility and scalability would be advantageous. One potential approach to address this is through the generation of stable-expressing cell lines and transgenic zebrafish lines, which we have discussed in the revised manuscript. Establishing stable cell lines with robust POST-IT expression could enhance scalability for drug discovery applications.

      (4) Quantitative Analysis: A more detailed quantitative analysis of the protein interactions identified by POST-IT, including statistical significance and comparative data against other technologies, would enhance the manuscript.

      Thank you for your suggestion. In our assessment of drug-protein affinity, we included Kd values as quantitative measures using MST assays. The protein targets of dasatinib identified through mass spectrometry are also accompanied by p-values for quantitative analysis (Fig. 5A), and the detailed procedures are described in the Material and methods section. While it is challenging to provide direct comparative data against other technologies, our system successfully identified many known target proteins for dasatinib, as well as SEPHS2 and VPS37C as new targets for dasatinib and for HCQ/CQ, respectively, which were not detected by other methods.

      (5) Technological Limitations: The authors should discuss any limitations or potential pitfalls of the POST-IT system, which would be crucial for future users and for guiding subsequent improvements.

      Thank you for your insightful suggestion. We agree that clearly defining the technological limitations is important. Therefore, we have expanded our original discussion on the limitations of our POST-IT system (Discussion section, paragraph 6).

      (6) Long-Term Stability and Activity: Information on the long-term stability and activity of the POST-IT components in different biological environments would ensure the reliability of the system in prolonged experiments.

      Yes, this is an important question. We did not notice any stability or toxicity issues with Halo-PafA and Pup substrates in HEK293T cells or zebrafish, which is an important factor for stable cell lines and transgenic zebrafish lines. However, HTL derivatives of the drug could be toxic or unstable due to the nature of the drug or its metabolism, which needs to be taken into account when designing experiments, and we have included this in the Discussion.

      (7) Comparison with Existing Technologies: A detailed comparison with existing proximity tagging and target identification technologies would help position POST-IT within the current landscape, highlighting its unique advantages and potential drawbacks.

      We appreciate your valuable feedback and agree that such comparisons are crucial. We have included a detailed overview and comparison of existing proximity-tagging systems and their related target identification technologies in the Introduction (lines 78-100) and Discussion (lines 391-412), highlighting their respective pros and cons. Additionally, we have expanded the discussion to further compare these technologies with our POST-IT system, addressing its advantages and limitations (lines 378-390, lines 448-467). We hope this provides sufficient context and information to effectively position POST-IT among the landscape of proximity-tagging target identification technologies.

      (8) Concerns Regarding Overexposed Bands: Several figures in the manuscript, specifically Figure 3A, 3B, 3C, 3F, 3G, Figure 4D, and the second panels in Figure 7C as well as some figures in the supplementary file, exhibit overexposed bands.

      We appreciate your astute observation regarding the overexposed bands and apologize for any confusion. The “overexposed” bands represent the unpupylated proteins, while the bands above them correspond to the pupylated proteins. We intended to clearly show both pupylated and unpupylated bands, although the latter are generally much weaker. We are currently working on further improving our POST-IT system to enhance pupylation efficiency.

      (9) Innovation Concern: There is a previous paper describing a similar approach: Liu Q, Zheng J, Sun W, Huo Y, Zhang L, Hao P, Wang H, Zhuang M. A proximity-tagging system to identify membrane protein-protein interactions. Nat Methods. 2018 Sep;15(9):715-722. doi: 10.1038/s41592-018-0100-5. Epub 2018 Aug 13. PMID: 30104635. It is crucial to explicitly address the novel aspects of POST-IT in contrast to this earlier work.

      Thank you for bringing this to our attention. Proximity-tagging systems like BioID, TurboID, NEDDylator, and PafA (Liu Q et al., Nat Methods 2018) were initially developed to study protein-protein interactions or identify protein interactomes, as these applications are of broader interest and generally easier to implement. However, applying proximity-tagging systems for small molecule target identification requires significant optimization. As described in the introduction (lines 78-100), target protein identification systems have since been developed using TurboID and NEDDylator (Tao AJ et al., Nat Commun 2023; Hill ZB et al., J Am Chem Soc 2016). It is conceivable that a PafA-based proximity-tagging system could also be adapted for target-ID, and other groups may pursue this approach in the future. Although the PafA-Pup system shows great promise for target-ID applications, extensive optimization was needed to enable its use for this purpose. Finally, we demonstrate that POST-IT offers distinct advantages over other proximity-tagging-based target-ID systems. For more details, please refer to the introduction and discussion sections.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1- Figure Supplement 1A: The Pup substrate "HB-Pup" is mentioned, but the main text or figure legend provides no introduction or description.

      We appreciate your astute observation. We have added a description in the main text and figure legend as follows: “…and used HB-Pup as a control, which contains 6×His and BCCP at the N terminus of Pup” in the main text (line 142) and “HB, TS, and SBP refer to 6×His and BCCP, twin-STII (Strep-tag II), and streptavidin binding peptide, respectively.” in the Figure 1-figure supplement 1A.

      (2) Figure 1 - Figure Supplement 3B: The authors used TS-sPupK61R as a substrate but did not explain why. The main text mentions that mutating sPup alone did not affect polypupylation, raising the question of why TS-sPupK61R was used in this figure. Furthermore, while the authors state that polypupylation becomes evident after 1 hour of incubation (more pronounced after 2 or 3 hours), the reactions here were conducted for only 30 minutes.

      Thank you for your question. Figure 1 - Figure Supplement 3B was conducted to test self-pupylation levels in the different Halo-PafA derivatives. For this purpose, we could use any Pup substrate such as SBP-sPup and SBPK4R-sPupK61R, instead of TS-sPup and TS-sPupK61R, as they do not show any differences in pupylation activity. We chose TS-sPup and TS-sPupK61R simply because any Pup substrates could be used for this purpose. Similarly, we did not need to incubate the reaction for a longer time to detect polypupylation, as our intention was to test “self-pupylation”. We demonstrated in Figure 1 – figure supplement 2 that polypupylation is dependent on the number or position of lysine residues in Pup substrate or tags. The results clearly showed that self-pupylation was almost completely abolished by the Halo8KR mutation. To clarify this, we added the following description in lines 168-169: “TS-sPup and TS-sPupK61R were chosen as sPup substrates for this experiment, although any Pup substrates could have been used. The levels of self-pupylation were assessed.”

      (3) Line 156: The statement that "the TS-tag completely abolished polypupylation in TS-sPup" is inaccurate. Using TSK8R-sPupK61R as the substrate, several bands appear, which likely represent Halo-PafA with varying degrees of polypupylation. Some bands also appear to correspond to those seen when using TS-sPup as a substrate. The authors should clarify how they distinguish between multipupylation and polypupylation in this case.

      We sincerely appreciate your insight into clarifying the distinction between multipupylation and polypupylation. Polypupylation refers to the addition of a new Pup onto a previously linked Pup on the target protein, akin to polyubiquitination. In contrast, multipupylation involves multiple single pupylations at different positions on the target proteins. Since pupylation occurs exclusively at lysine residues in tag-Pup substrates, mutating all lysine residues to arginine, as in TSK8R-sPupK61R, prevents the mutant tag-Pup from linking to another Pup. This means that only single pupylation can proceed with this type of mutant Pup substrate. If multiple pupylated bands are observed with this mutant substrate, it indicates “multipupylation” rather than “polypupylation”, as shown in Figure 1-figure supplement 2D. The same applies to the pupylation bands in Figure 1-figure supplement 2E and F, as sSBP-sPupK61R and SBPK4R-sPupK61R lack lysine residues. By comparing these multipupylation bands, it is also possible to distinguish them from polypupylation bands, which are marked by yellow arrows. However, after 2-3 pupylation bands, higher-order bands become increasingly difficult to distinguish.

      To clarify the mutation in the TS-tag, we revised the sentence in line 156 from “However, further mutations within the TS-tag completely abolished polypupylation in TS-sPup” to “However, further mutations of two lysine residues within the TS-tag, creating TSK8R-sPupK61R, completely abolished polypupylation in TS-sPup”. Additionally, we have inserted sentences in line 152 to define polypupylation and multipupylation, as described here.

      (4) Line 160: Similar to the above concern about line 156, the claim that SBPK4R and sSBP completely prevented polypupylation is unconvincing and requires more supporting evidence.

      Thank you for raising this concern. As mentioned above, both SBPK4R and sSBP lack lysine residues required for pupylation. As a result, these mutants can only undergo multiple single pupylations on the lysine residues of the target protein, which leads to “multipupylation”. In Figure 1-figure supplement 2E and F, pupylation bands by sSBP-sPupK61R or SBPK4R-sPupK61R do not display doublet bands (one from multipupylation and the other from polypupylation), as seen with SBP-sPup, marked by yellow arrows. Notably, Halo-PafA containing polypupylated branches migrates more slowly than one with an equal number of multipupylation events. To clarify this point, we have added the phrase “as shown in sSBP-sPupK61R and SBPK4R-sPupK61R” at the end of the sentence in line 160.

      (5) Lines 176-177: The authors claim that PafAS126A exhibited reduced polypupylation compared to PafA, but given that PafAS126A may reduce depupylase activity, how could it reduce polypupylation levels? Moreover, it is hard to find any data supporting this conclusion in Figure 1 - Figure Supplement 3B.

      We appreciate your insightful comment. At this point, we do not fully understand how the mutation that reduces depupylase activity also decreases polypupylation. It is possible that PafAS126A has a lower preference for pupylated Pup as a prey, which is required for polypupylation, since depupylase activity depends on recognizing pupylated Pup as a prey to remove it. Nonetheless, Halo-PafAS126A shows reduced levels of higher molecular weight bands compared to Halo-PafA, as shown in Figure 1-figure supplement 3B, while exhibiting increased pupylation in lower molecular weight bands, which represent either multipupylation or low-degree polypupylation. Since higher molecular weight bands (> 150 kD) are likely due to polypupylation, this result suggests reduced polypupylation and increased multipupylation in Halo-PafAS126A. To clarify this in the main text, we have added the following description in line 177: “as evidenced by the decreased levels of high molecular weight bands and an increase in low molecular weight bands”

      (6) POST-IT system in cellulo validation: The system was developed using the Halo-tag, yet the in-cell validation uses FRB and FKBP instead, without explaining this switch. This inconsistency makes the logic of the experiment unclear.

      We appreciate your insightful comment. The interaction between rapamycin and FRB or FKBP is known to be highly specific and robust, making this system useful in various biological contexts. Due to this property, rapamycin can induce interaction between two proteins when one is fused with FRB and the other with FKBP. Before testing or optimizing the POST-IT system in cells, we hypothesized that using the rapamycin-induced interaction between FRB and FKBP could introduce pupylation of the target protein, provided that PafA is fused with FRB or FKBP and the target protein is fused with the other. The results demonstrate that PafA can introduce pupylation of the target protein in a proximity-dependent manner via this chemically induced interaction. To further clarify this in the main text, we modified the original sentence in lines 214-216 as follows: “To mimic drug-target interaction-induced pupylation in live cells and assess the potential of PafA as a proximity-tagging system for target-ID, we incorporated the rapamycin-induced interaction between FRB and FKBP into our PL system, as this interaction between a small molecule and a protein is known to be highly specific and robust (Figure 3—figure supplement 1A).”

      (7) Line 209: The authors decided to use the SBP-tag for further studies due to better performance, but in Figure 3 - Figure supplement 1, they still used the unintroduced HB-Pup as the substrate, which is confusing and lacks explanation.

      Thank you for raising your question. The SBP-tag is not superior to the TS-tag in terms of pupylation activity. However, the TSK8R mutant cannot bind to Strep-Tactin beads, while the SBP mutants, SBPK4R and sSBP, can bind to streptavidin. Therefore, we chose the SBP-tag instead of the TS-tag for further studies as a Pup substrate in the POST-IT system, as we needed to pull down the target proteins. HB-Pup is consistently used as a control throughout various experiments, as it is the original Pup substrate. In Figure 3-figure supplement 1B and C, HB-Pup was used to test chemically induced pupylation by PafA. In these cases, the choice of Pup substrate was not critical. Furthermore, we compared HB-Pup and different SBP-sPup substrates in Figure 3-figure supplement 1D, where HB-Pup was used as a control or for comparison. Although pupylation bands with HB-Pup appear more robust, this substrate contains multiple lysine residues, leading to high levels of polypupylation. To make this clear, we modified the sentence in line 209 to “Therefore, we decided to use the SBP-tag as a Pup substrate in the POST-IT system for further studies.”

      (8) Line 220: Both SBP-sPup and SBPK4R-sPupK61R are described as exhibiting efficient pupylation, but the data show mostly self-pupylation and little to no pupylation of the target protein.

      Thank you for your concern. However, pupylation of the target protein is actually quite substantial, as the intensities of the free form and pupylated proteins are relatively similar, as shown in the upper panel of Figure 3-figure supplement 1D. Self-pupylation is always much higher than target pupylation, because PafA constantly pupylates itself, whereas pupylation of the target protein occurs only through interaction. Furthermore, V5-FRB-mKate2-PafA contains many lysine residues, which increases the levels of self-pupylation.

      (9) Lines 222-224: The authors chose SBPK4R-sPupK61R to avoid polypupylation, although SBP-sPup did not cause detectable polypupylation. Neither substrate caused pupylation of the target protein, so the rationale behind this choice is unclear.

      Thank you for raising your question. Similar to the above comment (#8), please refer to the pupylation bands of the target protein, as shown in the upper panel of Figure 3-figure supplement 1D. The pupylation band of the target protein is quite remarkable, as the intensities of the free form and pupylated proteins are comparable. Additionally, there are no multiple pupylation bands in either case, except for one additional weak multipupylation band, indicating no polypupylation by SBP-sPup, which does not have K-to-R mutations. Of course, SBPK4R-sPupK61R can only undergo single pupylation, as it does not contain lysine residues. Although we did not observe polypupylation by SBP-sPup in this experimental condition, it is possible that SBP-sPup may cause polypupylation under different experimental conditions or with other target proteins. Since SBPK4R-sPupK61R exhibits comparable pupylation of the target protein at least in this experiment setting as SBP-sPup, we selected SBPK4R-sPupK61R as the Pup substrate for POST-IT system to avoid any potential polypupylation that could be caused by SBP-sPup in other cases. We believe that polypupylation can introduce bias into the analysis and hinder the comprehensive discovery of additional target proteins for small molecules.

      (10) Line 224: The authors conclude that rapamycin greatly reduced self-pupylation, but the supporting data are unclear.

      Thank you for your constructive comments on our manuscript. Please refer to the lower panel of Figure 3-figure supplement 1D. When using either SBPK4R-sPupK61R or SBP-sPup, rapamycin treatment results in reduced levels of self-pupylation compared to the no-treatment control. However, we did not observe this reduction with HB-Pup and do not know the reason. To clarify this in the main text, we added the following description to the end of the sentence: “when using either SBPK4R-sPupK61R or SBP-sPup, as shown in the lower panel of Figure 3—figure supplement 1D”

      (11) Line 234: The authors selected an 18-amino acid linker, but given that linkers longer than 10 amino acids enhance labeling, this choice should be explained.

      Thank you for raising your question. In fact, a linker of 10 amino acids (aa) or longer is likely to behave similarly. We chose an 18 aa linker instead of a 40 aa linker primarily for the convenience of cloning and to reduce the potential for DNA sequence recombination associated with longer repeats. Additionally, a longer, flexible linker may behave like an intrinsically disordered protein (Harmon et al., 2017), which can lead to unwanted protein-protein interactions or phase separation. To elaborate on this, we added the following sentences after the sentence in line 233-235: “We chose the 18-amino acid linker instead of the 40-amino acid linker for easier cloning and to lower the risk of DNA recombination from longer repeats. Additionally, a longer, flexible linker may behave like an intrinsically disordered protein (Harmon et al., 2017), an unwanted feature for target-ID.”

      (12) S126A and K172R mutations: The authors claim that these mutations additively enhanced pupylation under cellular conditions, but in Figure 3B, the band intensities appear similar for the wild-type and mutant versions.

      Thank you for raising your concern. Although a single pupylation band appears similar among the three different Halo-PafA proteins, multipupylation bands are slightly but noticeably increased by the S126A and K172R mutations compared to Halo8KR-PafA. Since we used SBPK4R-sPupK61R as a Pup substrate, all higher molecular weight bands result from multipupylation rather than polypupylation. This illustrates why it is preferable to use SBPK4R-sPupK61R over SBP-sPup, as the pupylation bands with SBP-sPup are mixtures of poly- and multipupylation, making it difficult to assess levels of target labeling. To clarify this in the main text, we added the following description after the sentence in line 236: “as the higher molecular weight multipupylation bands are slightly but noticeably increased with these mutations compared to Halo8KR-PafA”

      (13) Line 263: The authors selected DH5 for further experiments due to its efficiency, but the data suggest that the performance of DH1 to DH5 is similar.

      We appreciate your question about the different dasatinib HTL derivatives. However, our data clearly show that DH2-5 derivatives bind significantly more effectively to Halo-PafA in vitro and in live cells compared to DH1 (Figure 4A and B). Additionally, the DH2-5 derivatives result in dramatically increased pupylation of the target protein in vitro and noticeable enhancement in live cells (Figure 4C and D). Among DH2 to DH5, there is no obvious difference in binding to Halo-PafA or pupylation of the target protein. Therefore, we chose DH5, as we believe that the longer linker in DH5 may facilitate the binding of a more diverse range of target proteins to dasatinib, enabling the discovery of additional target proteins.

      (14) Line 309: The authors introduce HCQ and CQ as important drugs but then investigate the mechanism using DC661 without introducing or justifying the choice of this compound.

      Thank you for raising this point. We explained our reason for choosing DC661, a dimeric form of CQ, rather than CQ itself for the synthesis of an HTL derivative in line 310: “assuming that a dimer would enhance binding affinity as previously described.” Dimeric forms of drugs or small molecules, such as testosterone dimers, estrogen dimers, and numerous anticancer drug dimers, have often been developed to enhance drug effects (Paquin A et al., Molecules 2021). Similarly, dimeric forms of HCQ/CQ have been introduced and shown to be more potent (Hrycyna CA et al., ACS Chem Biol 2014; Rebecca VW et al., Cancer Discovery 2019). We expected that using a dimeric form would offer a higher probability of identifying target proteins for HCQ/CQ.

      (15) The authors suggest that multipupylation levels were enhanced but do not explain whether this might benefit the system or introduce other issues. Clarifying this point would provide valuable insight for potential users of this system.

      Thank you for your thoughtful suggestion. Polypupylation likely leads to biased enrichment of a limited set of target proteins, and its levels may not correlate with the binding affinity of target proteins to the small molecule of interest, features that can negatively impact target-ID. In contrast, multipupylation may be correlated with binding affinity or interaction frequency, as we observed increased levels of multipupylation with higher Pup concentrations and longer incubation times. This suggests that target proteins with multiple lysines in proximity to PafA can be sequentially pupylated, starting with the most accessible lysine. However, if a target protein has only one accessible lysine, pupylation will occur only once, regardless of the protein’s affinity to the small molecule. In summary, while polypupylation may be a drawback for target-ID, multipupylation could be useful for both target-ID and understanding binding mode. To elaborate on this, we added the following additional explanation after the sentence in line 152: “, whereas multipupylation is more likely correlated with binding affinity or interaction frequency.”

      (16) The author should address whether the Halotag ligand modification of the drug alters the binding properties between the drug and targets. That may be causing artifact binding of the drug and other proteins.

      Thank you for your insightful comment. Yes, it is true that chemical modifications of the small molecule of interest, such as linker derivatization (e.g., HTL) or photo-affinity labeling, generally lead to reduced activity or affinity compared to the original molecule. Synthesizing a derivative is a common challenge across all target-ID methods, except for modification-free approaches, as we mentioned in the Discussion. However, modification-free methods like DARTS, CETSA, and TPP have their own limitations, including low sensitivity or high false positive rates. Identifying the optimal position for chemical modification on the small molecule of interest is critical. We chose dasatinib and HCQ/CQ as model compounds, because previous studies provided insights into their derivative synthesis. In addition, our data show that DH5 retains robust kinase inhibitory activity (Figure 4-figure supplement 2), and DC661-H1 exhibits potent autophagy inhibition (Figure 6-figure supplement 1). For novel compounds, a thorough structure-activity relationship study is essential to identify the optimal position for HTL derivative synthesis.

      (17) The author stated there is no observable toxicity in zebrafish without providing a detailed analysis or enough data. Further analysis of the expression of Halo-PafA and its substrate sPup influence on toxicity or side effects to the living cells or animals would be needed. It is important for in vivo applications.

      Thank you for your constructive suggestion. We have now included additional experimental data in Figure 7-figure supplement 1, showing no toxicity in zebrafish embryos expressing the POST-IT system. We assessed toxicity in two ways: by injecting the POST-IT DNA plasmid into one-cell-stage embryos for acute expression, and by using embryos from transgenic zebrafish expressing POST-IT under a heat-shock inducible promoter. Neither the injection nor the heat-shock activation of POST-IT expression resulted in any noticeable toxicity.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This important work presents two studies on predictive processes in subjects with and without tinnitus. The evidence supporting the authors' claims is compelling, as their second study serves as an independent replication of the first. Rigorous matching between study groups was performed, especially in the second study, increasing the probability that the identified differences in predictive processing can truly be attributed to the presence of tinnitus. This work will be of interest to researchers, especially neuroscientists, in the tinnitus field.

      We thank the editors at eLife very much for their favorable assessment of our manuscript. Based on the reviewer's comments, we aimed to further improve our manuscript so that it is a valuable addition to the tinnitus research field.

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      This study aimed to test experimentally a theoretical framework that aims to explain the perception of tinnitus, i.e., the perception of a phantom sound in the absence of external stimuli, through differences in auditory predictive coding patterns. To this aim, the researchers compared the neural activity preceding and following the perception of a sound using MEG in two different studies. The sounds could be highly predictable or random, depending on the experimental condition. They revealed that individuals with tinnitus and controls had different anticipatory predictions. This finding is a major step in characterizing the top-down mechanisms underlying sound perception in individuals with tinnitus.

      Strengths:

      This article uses an elegant, well-constructed paradigm to assess the neural dynamics underlying auditory prediction. The findings presented in the first experiment were partially replicated in the second experiment, which included 80 participants. This large number of participants for an MEG study ensures very good statistical power and a strong level of evidence. The authors used advanced analysis techniques - Multivariate Pattern Analysis (MVPA) and classifier weights projection - to determine the neural patterns underlying the anticipation and perception of a sound for individuals with or without tinnitus. The authors evidenced different auditory prediction patterns associated with tinnitus. Overall, the conclusions of this paper are well supported, and the limitations of the study are clearly addressed and discussed.

      Weaknesses:

      Even though the authors took care of matching the participants in age and sex, the control could be more precise. Tinnitus is associated with various comorbidities, such as hearing loss, anxiety, depression, or sleep disorders. The authors assessed individuals' hearing thresholds with a pure tone audiogram, but they did not take into account the high frequencies (6 kHz to 16 kHz) in the patient/control matching. Moreover, other hearing dysfunctions, such as speech-in-noise deficits or hyperacusis, could have been taken into account to reinforce their claim that the observed predictive pattern was not linked to hearing deficits. Mental health and sleep disorders could also have been considered more precisely, as they were accounted for only indirectly with the score of the 10-item mini-TQ questionnaire evaluating tinnitus distress. Lastly, testing the links between the individuals' scores in auditory prediction and tinnitus characteristics, such as pitch, loudness, duration, and occurrence (how often it is perceived during the day), would have been highly informative.

      Thank you very much for your careful evaluation of our manuscript. We agree with you that our study design has some limitations such as the assessment of higher frequencies, comorbidities, and tinnitus characteristics. In our discussion, we aimed to acknowledge these issues for future research to improve this study design and gain more insights into neural tinnitus processes.

      See e.g.:

      Line 946-949:

      “Additionally, we rigorously controlled for hearing loss in Study 2, however, pure-tone audiometric testing was solely performed up to 8kHz and we were therefore not able to draw conclusions regarding hearing impairments in higher frequencies and their influence on the effects.”

      Line 949-954:

      “Moreover, we did not screen our participants for hyperacusis. This hypersensitivity to mild sounds is widely correlated with the sensation of tinnitus and underlying neural mechanisms are potentially intertwined with tinnitus processes (Schilling et al., 2023; Yukhnovich et al., 2023; Zheng, 2020). Screening for hyperacusis in future work can therefore reveal more details on participant characteristics influencing predictive processing.”

      Line 955-958:

      “In both studies, tinnitus distress was not correlated with the reported prediction effects. Nevertheless, tinnitus can also be characterized by other features such as its loudness, pitch or duration which were not included in the experimental assessment.”

      Line 958-963:

      “Additionally, we solely used a short version of the Mini-TQ (Goebel and Hiller, 1992) in Study 2, which did not allow us to relate prediction scores to subscales like sleep disturbances which potentially influence cognitive functioning and thus predictive processing. Next to sleeping disorders and distress, tinnitus is often also accompanied by psychological comorbidities such as depression or anxiety (Langguth, 2011) which are potential confounds of the results.”

      Comments on revisions:

      Thank you for your responses. There are a few remaining points that, if addressed, could further enhance the manuscript:

      - While the manuscript acknowledges the limitation of not matching groups on hearing thresholds in Study 1, a deeper analysis of participants' hearing abilities and their impact on MEG results, similar to that conducted in Study 2, would be valuable. Specifically, including a linear model that considers all frequencies, group membership, and their interactions could highlight differences across groups. Additionally, examining the effect of high-frequency hearing loss on prediction scores, as performed in Study 2, would strengthen the analysis, particularly given the trend noted (line 719). Such an addition could make a significant contribution to the literature by exploring how hearing abilities may influence prediction patterns.

      We appreciate your feedback and agree with you that it is a crucial question how hearing abilities influence prediction patterns in tinnitus. However, as hearing status was not assessed in the control group in study 1, we are unfortunately not able to include linear models to investigate differences across groups in this sample. This led us to the implementation of study 2 with a comprehensive hearing assessment to investigate group differences. We highlighted this issue in our methods section.

      Line 170-172:

      “As pure-tone audiometric testing was not included for the control subjects, group comparisons between hearing thresholds were not feasible.”

      - The connection with the hippocampal regions (line 864) remains somewhat unclear. While the inclusion of the Paquette reference appropriately links temporal region activity with tinnitus, it does not fully support the statement: "An increased focus on hippocampal regions, e.g., in fMRI, patient, or animal studies, could be a worthwhile complement to our MEG work, given the outstanding relevance of medial temporal areas in the formation of associations in statistical learning paradigms"

      Thank you for your constructive input. This section is purely speculative, and we do not aim to provide strong claims or expected results but solely point out potential future research directions.

      - Authors should add a comparison of participants mini-TQ scores on both studies

      We appreciate your input and have added a comparison of mini-TQ scores between samples. For study 1, all subscales were assessed; however, we computed the comparison solely on the items of the mini-TQ to increase comparability. The results were not significant, i.e., tinnitus distress values did not differ between studies.

      Line 629-632:

      “We additionally compared tinnitus distress values assessed by the mini-TQ (Goebel and Hiller, 1992) between study 1 and study 2 to detect potential differences between the samples, however, results of the Welch’s t-test were not significant with t(30.7)=1.27, p=.214.”
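
      The non-integer degrees of freedom reported above, t(30.7), are characteristic of Welch's test, which does not assume equal variances and estimates the degrees of freedom with the Welch-Satterthwaite formula. The following is a minimal sketch of that computation, not the authors' analysis script; the function name `welch_t` is hypothetical, and any input scores would have to come from the study data, which are not reproduced here.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test.

    Hypothetical helper for illustration; inputs are two lists of scores.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df
```

      When both samples have the same size and variance, the estimated degrees of freedom reduce to those of the classical two-sample test, 2(n - 1); unequal variances or sample sizes give a smaller, generally non-integer value.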

      - Authors should add significant level on Fig 6.B as in Fig 3.C, and a n.s on Fig 6.D

      Thank you very much for your input. We added significance levels to Figure 6B and an n.s. label to Figure 6D.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary of the work: In this work, Fruchard et al. study the enzyme Tgt and how it modifies guanine in tRNAs to queuosine (Q), essential for Vibrio cholerae's growth under aminoglycoside stress. Q's role in codon decoding efficiency and its proteomic effects during antibiotic exposure is examined, revealing Q modification impacts tyrosine codon decoding and influences RsxA translation, affecting the SoxR oxidative stress response. The research proposes Q modification's regulation under environmental cues reprograms the translation of genes with tyrosine codon bias, including DNA repair factors, crucial for bacterial antibiotic response.

      The experiments are well-designed and conducted and the conclusions, for the most part, are well supported by the data. However, a few clarifications will significantly strengthen the manuscript.

      Thank you.

      Major:

      Figure S4 A-D. These growth curves are important data and should be presented in the main figures. Moreover, given that it is not possible to make a rsxA mutant, I wonder if it would be possible to connect rsx and tgt using the following experiment: expression of tgt results in resistance to TOB (in B), while expression of only rsx lower resistance to TOB (in D). Then simultaneous overexpression of both tgt/rsx in the WT strain should have either no effect on TOB resistance or increased resistance, relative to the WT. Perhaps the authors have done this, and if so, the data should be included as it will significantly strengthen their model.

      We thank the reviewer for this suggestion. We tried to overexpress both tgt and rsxA simultaneously. However, this appears to be toxic, as cells form small colonies and cannot grow well in liquid. We think that the presence of 2 plasmids and the corresponding selection antibiotics amplifies the toxicity of overexpressing rsxA, and even tgt. In fact, tgt overexpression in WT is already slightly deleterious in the absence of tobramycin (figure 1B).

      Figure S4 - Is there a rationale for why it is possible to make rsx mutants in E. coli, but not in V. cholerae? For example, does E. coli have a second gene/protein that is redundant in function to rsxA, while V. cholerae does not? I think your data hint at this, since in the right panel growth data, your double mutant does not fully rescue back to rsx single mutant levels, suggesting another factor in tgt mutant also acts to lower resistance to TOB. If so, perhaps a line or two in text will be helpful for readers.

      This point raised by the referee is an interesting one that we have also asked ourselves on multiple occasions. In fact, the Rsx operon is linked with oxidative stress and respiration. Vibrio cholerae and E. coli differ in the genes involved in these pathways. V. cholerae lacks the cyo/nuo respiratory complex genes and does not encode a Suf operon. Moreover, deletion of the anaerobic respiration Frd pathway leads to a strong decrease in V. cholerae growth even in aerobic conditions (10.1128/spectrum.01730-23). We have also previously seen general differences between the 2 species in their response to stress (10.1128/AAC.01549-10) and in the way they deal with ROS (10.1371/journal.pgen.1003421). Therefore, we think that the fact that rsx is essential in V. cholerae and not in E. coli could be due either to the presence of an additional redundant pathway in E. coli, as suggested by the referee, or to more general differences in respiration and treatment of ROS. We thank the referee for highlighting this and we have now included a comment about this in the manuscript.

      - For growth curves in Figure 2 and relative comparisons like in Figure 5D and Figure S4 (and others in the paper), statistics and error bars, along with replicate information should be provided.

      We had mentioned this in the methods section; we have now also added the specific information to the figure legends.

      - Figure 6A - Is the transcript fold change in linear or log? If linear, then tgt expression should not be classified as being upregulated in TOB. It is barely up by ~2-fold with TOB- 0.6....which is a mild phenotype, at best.

      We think that a 2-fold change in tgt expression can be sufficient to change tRNA modification levels. We agree that this is a mild induction and have thus changed “increase” to “mildly increase” in the results.

      - Line 779- 780: "This indicates that sub-MIC TOB possibly induces tgt expression through the stringent response activation." To me, the data presented in this figure, do not support this statement. The experiment is indirect.

      We agree and have rephrased: “Tobramycin may induce tgt expression through stringent response activation or through an independent pathway.”

      - Figure 3B and D. - These samples only have tobramycin, correct? The legend says both carbenicillin and tobramycin.

      The legend is correct: the samples also contain carbenicillin, because here we are testing growth with 2 synonymous beta-lactamase genes in the presence of beta-lactams.

      - Figure 5. The color schemes in bars do not match up with the color scheme in cartoons below panels B and C. That makes it confusing to read. Please fix.

      Fixed.

      - A lot of abbreviations have been used. This makes reading a bit cumbersome. Ideally, less abbreviations will be used.

      Fixed

      Reviewer #2 (Public Review):

      Fruchard et al. investigate the role of the queuosine (Q) modification of the tRNA (Q-tRNA) in the human pathogen Vibrio cholerae. First, the authors state that the absence of Q-modified tRNAs (tgt mutant) increases the translation of TAT codons and proteins with a high TAT codon bias. Second, the absence of Q increases rsxA translation, because rsxA gene has a high TAT codon bias. Third, increased RsxA in the absence of Q inhibits SoxR response, reducing resistance towards the antibiotic tobramycin (TOB). Authors also predict in silico which genes harbor a higher TAT bias and found that among them are some involved in DNA repair, experimentally observing that a tgt mutant is more resistant to UV than the wt strain. It is worth noting that authors employ a wide variety of techniques, both experimental and bioinformatic. However, some aspects of the work need to be clarified or reevaluated.

      (1) The statement that the absence of Q increases the translation of TAT codons and proteins encoded by TAT-enriched genes presents the following problems that should be addressed:

      (1.1) The increase in TAT codon translation in the absence of Q is not supported by proteomics, since there was no detected statistical difference for TAT codon usage in proteins differentially expressed. Furthermore, there are some problems regarding the statistics of proteomics. Some proteins shown in Table S1 have adjusted p-values higher than their p-values, which makes no sense. Maybe there is a mistake in the adjusted p-value calculation.

      We appreciate the reviewer’s thorough examination of our findings. In our study, we employed an adaptive Benjamini-Hochberg (BH) procedure to control the false discovery rate in our list of selected proteins, as explained in the Data Analysis subsection of the Proteomics MS and analysis section of our Materials and Methods. The classical BH procedure (10.1111/j.2517-6161.1995.tb02031.x) calculates the adjusted p-value for the i-th ranked p-value as min over 𝑗 ≥ 𝑖 of 𝑚 × 𝑝(𝑗) / 𝑗, where 𝑝(𝑗) is the j-th ranked p-value and 𝑚 is the number of tests (e.g. number of proteins) (see 10.1021/acs.jproteome.7b00170 for details). Since 𝑚/𝑗 ≥ 1 and 𝑝(𝑗) ≥ 𝑝(𝑖) for 𝑗 ≥ 𝑖, it follows that 𝑚 × 𝑝(𝑗) / 𝑗 ≥ 𝑝(𝑖) for 𝑗 ≥ 𝑖, resulting in adjusted p-values being greater than or equal to the original p-values. Therefore, contrary to the reviewer's comment, it is a mathematical property that the adjusted p-value is at least as large as the original p-value when using the classical Benjamini-Hochberg procedure.

      However, we want to underline that we used an “adaptive” BH procedure, which calculates the adjusted p-value for the i-th ranked p-value as min over 𝑗 ≥ 𝑖 of 𝜋0 × 𝑚 × 𝑝(𝑗) / 𝑗, where 𝜋0 is an estimate of the proportion of true null hypotheses (see 10.1021/acs.jproteome.7b00170 for details). Indeed, the classical BH procedure assumes that 𝜋0 = 1, which is a strong assumption in the context of MS-based proteomics. Consequently, the mathematical property that the adjusted p-value is greater than or equal to the original p-value does not always hold in our approach (it also depends on the 𝜋0 parameter).
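
      The relationship between raw and adjusted p-values described above can be checked numerically. The following is a minimal sketch, not the authors' analysis code: the function name `bh_adjust` and its `pi0` argument are illustrative, and the p-values in the usage note are made up for the demonstration.

```python
def bh_adjust(pvalues, pi0=1.0):
    """Benjamini-Hochberg adjusted p-values.

    pi0 = 1.0 gives the classical procedure; pi0 < 1 (an assumed estimate
    of the proportion of true null hypotheses) gives the adaptive variant.
    """
    m = len(pvalues)
    # Indices of the p-values in ascending order; ranks run from 1 to m.
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    # Walk from the largest rank down so the running minimum covers all j >= i.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pi0 * m * pvalues[idx] / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted
```

      With `pi0=1.0`, every adjusted value is at least as large as its raw p-value. With an assumed `pi0` of 0.5, `bh_adjust([0.01, 0.02, 0.03, 0.04], pi0=0.5)` returns adjusted values that fall below the two largest raw p-values, which is the situation the reviewer noticed in Table S1.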

      In addition, it is not common to assume that proteins that are quantitatively present in one condition and absent in another are differentially abundant proteins. Proteomics data software typically addresses this issue and applies some corrections. It would be advisable to review that.

      We thank the reviewer for highlighting this point. Indeed, some software imputes a small random value to replace missing values and then produces statistics based on these imputed data (10.1038/nmeth.3901). However, the validity and relevance of generating statistics in the absence of actual data is questionable.

      There are no universally accepted guidelines for handling this situation, and we believe it is more logical to set these values aside as potential interesting proteins. It is well-established that intensity values are often missing due to the detection limits of the spectrometer, suggesting that the missing values observed in several replicates of a condition are actually due to low values (see 10.1093/bioinformatics/btp362 and 10.1093/bioinformatics/bts193 for instance). It is thus logical to consider the associated proteins as potentially differentially abundant when comparing their complete absence in all replicates of one condition to their presence in several replicates of another condition.

      (1.2) Problems with the interpretation of Ribo-seq data (Figure 4D). On the one hand, the Ribo-seq data should be corrected (normalized) with the RNA-seq data in each of the conditions to obtain ribosome profiling data, since some genes could have more transcription in some of the conditions studied. In other articles in which this technique is used (such as in Tuorto et al., EMBO J. 2018; doi: 10.15252/embj.201899777), it is interpreted that those positions at which the ribosome moves most slowly (and which are therefore less efficiently translated) are the most abundant. Assuming this interpretation, according to the hypothesis proposed in this work, the fragments enriched in TAT codons should have been less abundant in the absence of Q-tRNA (tgt mutant) in the Ribo-seq experiment. However, what is observed is that TAT-enriched fragments are more abundant in the tgt mutant, and yet the Ribo-seq results are interpreted as RNA-seq, stating that this is because the genes corresponding to those sequences have greater expression in the absence of Q.

      As recommended by the reviewer, we normalized the RiboSeq data with the RNAseq data to account for potential RNA variations. The updated Figure 4 demonstrates that this normalization does not alter our findings, confirming that variations at the RNAseq level do not contradict changes at the translational level. 
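
      Normalizing RiboSeq signal by RNAseq signal is commonly expressed as a change in translation efficiency, i.e. the ratio of ribosome footprint abundance to mRNA abundance, compared between conditions on a log2 scale. The sketch below illustrates that general idea only; it is not the pipeline used for the updated Figure 4, and the function name `log2_te_change`, the pseudocount, and the assumption that inputs are already library-size-normalized counts are all illustrative choices.

```python
import math


def log2_te_change(ribo_mut, rna_mut, ribo_wt, rna_wt, pseudocount=1.0):
    """log2 change in translation efficiency (mutant vs. wild type) for one gene.

    Inputs are assumed to be library-size-normalized counts. The pseudocount
    (an assumption of this sketch) avoids division by zero for silent genes.
    """
    te_mut = (ribo_mut + pseudocount) / (rna_mut + pseudocount)
    te_wt = (ribo_wt + pseudocount) / (rna_wt + pseudocount)
    return math.log2(te_mut / te_wt)
```

      A gene whose footprints and mRNA both double in the mutant returns a value near 0, so it would not be classified as translationally "up", whereas a gene whose footprints double at constant mRNA level returns a value near 1.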

      The reviewer's observation that pauses at TAT codons would lead to ribosome accumulation and subsequent categorization as "up" genes is accurate. We must emphasize, however, that this category of “up genes” is probably quite diverse. The effect of ribosome stalling at TAT codons on total mRNA ribosome occupancy is likely highly variable, depending on the location of the TAT codon(s) within the CDS and the gene's expression level. We therefore think that genes in the "Up" category mainly correspond to genes that are more translated because the impact of pausing at TAT codons is probably not strong enough. Note that unlike what is usually done in bacterial riboseq experiments, we did not use any antibiotics to artificially freeze the ribosomes.

      On the other hand, it would be interesting to calculate the mean of the protein levels encoded by the transcripts with high and low ribosome profiling data.

      While this is a common request, we believe that comparing RiboSeq and proteomics data is not particularly informative. RiboSeq data directly measures translation, while proteomics provides information about protein abundance at steady state, reflecting the balance between protein synthesis and degradation. Furthermore, the number of proteins detectable by mass spectrometry is significantly smaller than the number of genes quantified by RiboSeq. Given these factors, there is often a low correlation between translation and protein abundance, making a direct comparison less relevant.

      (1.3) This statement is contrary to most previously reported studies on this topic in eukaryotes and bacteria, in which ribosome profiling experiments, among others, indicate that translation of TAT codons is slower (or unaffected) than translation of the TAC codons, and the same phenomenon is observed for the rest of the NAC/T codons. This is completely opposed to the results shown in Figure 4. However, the results of these studies are either not mentioned or not discussed in this work. Some examples of articles that should be discussed in this work:

      - "Queuosine-modified tRNAs confer nutritional control of protein translation" (Tuorto et al., 2018; 10.15252/embj.201899777)

      - "Preferential import of queuosine-modified tRNAs into Trypanosoma brucei mitochondrion is critical for organellar protein synthesis" (Kulkarni et al., 2021; doi: 10.1093/nar/gkab567)

      - "Queuosine-tRNA promotes sex-dependent learning and memory formation by maintaining codon-biased translation elongation speed" (Cirzi et al., 2023; 10.15252/embj.2022112507)

      - "Glycosylated queuosines in tRNAs optimize translational rate and post-embryonic growth" (Zhao et al., 2023; 10.1016/j.cell.2023.10.026)

      - "tRNA queuosine modification is involved in biofilm formation and virulence in bacteria" (Diaz-Rullo and Gonzalez-Pastor, 2023; doi: 10.1093/nar/gkad667). In this work, the authors indicate that Q-tRNA increases NAT codon translation in most bacterial species. Could the regulation of TAT codon-enriched proteins by Q-tRNAs in V. cholerae be an exception? In addition, the authors use a bioinformatic method similar to the one used in this work to identify genes enriched in NAT codons, and to find the biological processes that involve the genes whose expression is affected by Q-tRNAs (as discussed for the phenotype of UV resistance). It would be worth discussing all of this.

      Thank you for the detailed suggestions. We agree that this discussion was missing, and this comment gives us the chance to address it in the revised version of the manuscript.

      Regarding the references suggested by the referee, four of these papers were not mentioned in our manuscript. They were published while our manuscript was under review, and we realize that we had not cited them in the latest version. We thank the referee for highlighting this and have now included a discussion of these studies.

      We included the following in the discussion:

      “However, the opposite codon preference was shown in E. coli {Diaz-Rullo, 2023 #1888}. In eukaryotes also, several recent studies indicate slower translation of U-ending codons in the absence of Q34 {Cirzi, 2023 #1887;Kulkarni, 2021 #1886;Tuorto, 2018 #1268}. It is important to note here that in V. cholerae ∆tgt, increased decoding of U-ending codons is observed only with tyrosine, and not with the other three NAC/U codons (Histidine, Aspartate, Asparagine). This is interesting because it suggests that what we observe with tyrosine may not adhere to a general rule about the decoding efficiency of U- or C-ending codons, but instead seems to be specific to Tyr tRNAs, at least in the context of V. cholerae. Exceptions may also exist in other organisms. For example, in human cells, queuosine increases efficiency of decoding for U-ending codons and slows decoding of C-ending codons except for AAC {Zhao, 2023 #1889}. In this case, the exception is for tRNA Asparagine. Moreover, in mammalian cells {Tuorto, 2018 #1268}, ribosome pausing at U-ending codons is strongly seen for Asp, His and Asn, but less with Tyr. In Trypanosoma {Kulkarni, 2021 #1886}, reporters with a combination of the 4 NAC/NAU codons for Asp, Asn, Tyr, His have been tested, showing slow translation at the U-ending version of the reporter in the absence of Q, but the effect on individual codons (e.g. Tyr only) is not tested. In mice {Cirzi, 2023 #1887}, ribosome slowdown is seen for the Asn, Asp, His U-ending codons but not for the Tyr U-ending codon. In summary, Q generally increases efficiency of U-ending codons in multiple organisms, but there appear to be additional unknown parameters which affect tyrosine UAU decoding, at least in V. cholerae. Additional factors such as mRNA secondary structures or mistranslation may also contribute to the better translation of UAU versions of tested genes. Mistranslation could be an important factor. If codon decoding fidelity impacts decoding speed, then mistranslation could also contribute to decoding efficiency of Tyr UAU/UAC codons and proteome composition.”

      (1.4) It is proposed that the stress produced by the TOB antibiotic causes greater translation of genes enriched in TAT codons. 

      Actually, it is the opposite: in the presence of TOB, tgt would be induced in the WT, leading to more Q on tRNA-Tyr and less translation of TAT.

      On the one hand, it is shown that the GFP-TAT version (gene enriched in TAT codons) and the RsxATAT-GFP protein (native gene naturally enriched in TAT) are expressed more, compared to their versions enriched in TAC in a tgt mutant than in a wt, in the presence of TOB (Fig. 5C).

      Figure 5C shows relative fluorescence, i.e., the change in fluorescence in delta-tgt compared to WT. The TAT versions are therefore not necessarily more expressed, but “more increased”.
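
      The distinction between absolute and relative fluorescence can be made concrete with a short sketch. All numbers below are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from Figure 5C, and the function name is an assumption.

```python
def relative_fluorescence(mutant: float, wild_type: float) -> float:
    """Fluorescence in the mutant expressed relative to the wild type."""
    if wild_type <= 0:
        raise ValueError("wild-type fluorescence must be positive")
    return mutant / wild_type


# Hypothetical absolute values (arbitrary units), for illustration only.
tat = relative_fluorescence(mutant=300.0, wild_type=100.0)
tac = relative_fluorescence(mutant=800.0, wild_type=800.0)
```

      In this made-up case the TAT construct is "more increased" in the mutant (ratio 3.0 versus 1.0) even though its absolute fluorescence (300) remains lower than that of the TAC construct (800).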

      However, in the absence of TOB, and in a wt context, although the two versions of GFP have a similar expression level (Fig. S3D), the same does not occur with RsxA, whose RsxA-TAT form (the native one) is expressed significantly more than the RsxA-TAC version (Fig. S3A). How can it be explained that in a wt context, in which there is also tRNA Q-modification, a gene naturally enriched in TAT is translated better than the same gene enriched in TAC?

      We thank the referee for this question based on careful assessment of our data. We agree, there appears to be significantly more RsxA-TAT in WT than RsxA-TAC. This could be due to other effects such as secondary structure formation on mRNA when the wt RsxA is recoded with TAC codons. This does not hinder the conclusion that the translation of the TAT version is increased in delta-tgt compared to WT.  

      It would be expected that in the presence of Q-tRNAs the two versions would be translated equally (as happens with GFP) or even the TAT version would be less translated. On the other hand, in the presence of TOB the fluorescence of WT GFP(TAT) is higher than the fluorescence of WT GFP(TAC) (Figure S3E) (mean fluorescence data for RsxA-GFP version in the presence of TOB is not shown). These results may indicate that the apparent better translation of TAT versions could be due to indirect effects rather than to TAT codon translation.

      This is now mentioned in the manuscript:

      “We cannot exclude, however, that additional factors such as mRNA secondary structures also contribute to the better translation of UAU versions of tested genes.”

      (2) Another problem is related to the already known role of Q in prevention of stop codon readthrough, which is not discussed at all in the work. In the absence of Q, stop codon readthrough is increased. In addition, it is known that aminoglycosides (such as tobramycin) also increase stop codon readthrough ("Stop codon context influences genome-wide stimulation of termination codon readthrough by aminoglycosides"; Wangen and Green, 2020; 10.7554/eLife.52611). Absence of Q and presence of aminoglycosides can be synergistic, producing devastating increases in stop codon readthrough and a large alteration of global gene expression. All of this needs to be discussed in the work. Moreover, it is known that stop codon readthrough can alter gene expression, and that mRNA sequence context influences the likelihood of stop codon readthrough. Thus, this process could also affect the expression of the recoded GFP and RsxA versions.

      We included the following in the revised version of the manuscript (results):

      “Q modification impacts decoding fidelity in V. cholerae.

      To test whether a defect in Q34 modification influences the fidelity of translation in the presence and absence of tobramycin, previously developed reporter tools were used (Fabret & Namy, 2021) to measure stop codon readthrough in V. cholerae ∆tgt and wild-type strains. The system consists of vectors containing readthrough-promoting signals inserted between the lacZ and luc sequences, encoding β-galactosidase and luciferase, respectively. Luciferase activity reflects the readthrough efficiency, while β-galactosidase activity serves as an internal control of expression level, integrating a number of possible sources of variability (plasmid copy number, transcriptional activity, mRNA stability, and translation rate). We found increased readthrough at stop codons UAA and to a lesser extent at UAG for ∆tgt, and this increase was amplified for UAG in the presence of tobramycin (Fig. S2, stop readthrough). In the case of UAA, tobramycin appears to decrease readthrough; this may be artefactual, due to the toxic effect of tobramycin on ∆tgt.

      Mistranslation at specific codons can also impact protein synthesis. To further investigate mistranslation levels by tRNATyr in WT and ∆tgt, we designed a set of gfp mutants where the codon for the catalytic tyrosine required for fluorescence (TAT at position 66) was substituted by near-cognate codons (Fig. S2). Results suggest that in this sequence context, particularly in the presence of tobramycin, non-modified tRNATyr mistakenly decodes Asp GAC, His CAC and also Ser UCC, Ala GCU, Gly GGU, Leu CUU and Val GUC codons, suggesting that Q34 increases the fidelity of tRNATyr.

      In parallel, we replaced Tyr103 of the β-lactamase described above, with Asp codons GAT or GAC. The expression of the resulting mutant β-lactamase is expected to yield a carbenicillin sensitive phenotype. In this system, increased tyrosine misincorporation (more mistakes) by tRNATyr at the mutated Asp codon, will lead to increased synthesis of active β-lactamase, which can be evaluated by carbenicillin tolerance tests. As such, amino-acid misincorporation leads here to phenotypic (transient) tolerance, while genetic reversion mutations result in resistance (growth on carbenicillin). The rationale is summarized in Fig. 3C. When the Tyr103 codon was replaced with either Asp codons, we observe increased β-lactamase tolerance (Fig. 3D, left), suggesting increased misincorporation of tyrosine by tRNATyr at Asp codons in the absence of Q, again suggesting that Q34 prevents misdecoding of Asp codons by tRNATyr.

      In order to test any effect on an additional tRNA modified by Tgt, namely tRNAAsp, we mutated the Asp129 (GAT) codon of the β-lactamase. When Asp129 was mutated to Tyr TAT (Fig. 3D, right), we observe reduced tolerance in ∆tgt, but not when it was mutated to Tyr TAC, suggesting less misincorporation of aspartate by tRNAAsp at the Tyr UAU codon in the absence of Q. In summary, absence of Q34 increases misdecoding by tRNATyr at Asp codons, but decreases misdecoding by tRNAAsp at Tyr UAU. 

      This supports the fact that tRNA Q34 modification is involved in translation fidelity during antibiotic stress, and that the effects can be different on different tRNAs, e.g. tRNATyr and tRNAAsp tested here.”

      Added figures: Figure S2, Figure 3CD
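
      The dual-reporter readout described in the quoted passage above (luciferase activity normalized by β-galactosidase activity) can be sketched as follows. Expressing the result as a percentage of an in-frame control construct without a stop codon is an assumption made for this example, as are the function and parameter names; the quoted text only states that β-galactosidase serves as the internal control.

```python
def readthrough_percent(
    luc_test: float,
    bgal_test: float,
    luc_ctrl: float,
    bgal_ctrl: float,
) -> float:
    """Readthrough efficiency of a lacZ-stop-luc reporter, in percent.

    The luciferase/beta-galactosidase ratio of the test construct is
    divided by the same ratio from an in-frame control (assumed here
    to represent 100% readthrough).
    """
    if min(bgal_test, bgal_ctrl, luc_ctrl) <= 0:
        raise ValueError("control activities must be positive")
    return 100.0 * (luc_test / bgal_test) / (luc_ctrl / bgal_ctrl)
```

      Dividing by β-galactosidase activity first is what cancels out differences in plasmid copy number, transcription, and mRNA stability between strains or conditions.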

      (3) The statement about that the TOB resistance depends on RsxA translation, which is related to the presence of Q, also presents some problems:

      (3.1) It is observed that the absence of tgt produces a growth defect in V. cholerae when exposed to TOB (Figure 1A), and it is stated that this is mediated by an increase in the translation of RsxA, because its gene is TAT enriched. However, in Figure S4F, it is shown that the same phenotype is observed in E. coli, but its rsxA gene is not enriched in TAT codons. Therefore, the growth defect observed in the tgt mutant in the presence of TOB may not be due to the increase in the translation of TAT codons of the rsxA gene in the absence of Q. This phenotype is very interesting, but it may be related to another molecular process regulated by Q. Maybe the role of Q in preventing stop codon readthrough is important in this process, reducing cellular stress in the presence of TOB and growing better.

      Fig. S4F (now Figure 5D) shows that rsxA can be toxic during growth in the presence of tobramycin, but it does not show that rsxA translation is increased in E. coli delta-tgt. However, we agree with the referee that there are probably additional processes regulated by Q which are also involved in the response to TOB stress. We had already mentioned this briefly in the discussion (“Note that, our results do not exclude the involvement of additional Q-regulated MoTTs in the response to sub-MIC TOB, since Q modification leads to reprogramming of the whole proteome.”), and we now discuss it further as follows:

      “As a consequence, transcripts with tyrosine codon usage bias are differentially translated. One such transcript codes for RsxA, an anti-SoxR factor. SoxR controls a regulon involved in oxidative stress response and sub-MIC aminoglycosides trigger oxidative stress in V. cholerae{Baharoglu, 2013 #720}, pointing to an involvement of oxidative stress response in the response to sub-MIC tobramycin stress.

      A link between Q34 and oxidative stress has also been previously found in eukaryotic organisms {Nagaraja, 2021 #1466}. Note that our results do not exclude the involvement of additional Q-regulated translation of other transcripts in the response to tobramycin. Q34 modification leads to reprogramming of the whole proteome, not only for other transcripts with codon usage bias, but also through an impact on the levels of stop codon readthrough and mistranslation at specific codons, as supported by our data.”

      (3.2) All experiments related to the effect of Q on the translation of TAT codons have been performed with the tgt mutant strain. Considering that the authors have a pSEVA-tgt plasmid to overexpress this gene, they would have to show whether tgt overexpression in a wt strain produces a decrease in the translation of proteins encoded by TAT-enriched genes such as RsxA. This experiment would allow them to conclude that Q reduces RsxA levels, increasing resistance to TOB.

      We agree that this would be interesting to test. However, as can be seen in Figure 1B, delta-tgt pSEVA-tgt (complemented strain) grows better than WT pSEVA-tgt (tgt overexpression). In fact, overexpression of tgt negatively impacts cell growth and yields smaller colonies, especially when cells carry a second plasmid (e.g. with gfp constructs). We have also seen this with overexpression of other RNA modification genes in the lab (unpublished). We believe that the expression of tgt is finely tuned, and since overexpression affects fitness, it is generally difficult to conduct experiments with overexpression plasmids for RNA modification genes. Nevertheless, we have done the experiment (with slow-growing bacteria), and when we normalize expression of gfp in the presence of the tgt-overexpressing plasmid to the condition with no plasmid, we see little (1.5-fold) or no effect of tgt overexpression on fluorescence (see graph below). This is probably due to a toxic effect of overexpression, and we do not believe these results are biologically relevant.

      Author response image 1.

      (3.3) On the other hand, Fig. 1B shows that when the wt and tgt strains compete, both overexpressing tgt, the tgt mutant strain grows better in the presence of TOB. This result is not very well understood, since according to the hypothesis proposed, the absence of modification by Q of the tRNA would increase the translation of genes enriched in TAT, therefore, a strain with a higher proportion of Q-modified tRNAs as in the case of the wt strain overexpressing tgt would express the rsxA gene less than the tgt strain overexpressing tgt and would therefore grow better in the presence of TOB. For all these reasons, it would be necessary to evaluate the effect of tgt overexpression on the translation of RsxA.

      See our answer above about negative effect of tgt overexpression.

      (3.4) According to Figure 1I, the overexpression of tRNA-Tyr(GUA) caused a better growth of tgt mutant in comparison to WT. If the growth defect observed in tgt mutant in the presence of TOB is due to a better translation of the TAT codons of rsxA gene, the overexpression of tRNA-Tyr(GUA) in the tgt mutant should have resulted in even better RsxA translation and worse growth, not the opposite result.

      We agree; we think that rsxA is not the only factor responsible for the growth defect of tgt in the presence of TOB (as now further discussed in the discussion). Overexpression of tRNA-Tyr possibly changes the equilibrium between the decoding of TAC vs TAT and may restore translation of TAC-enriched genes. As also suggested by Reviewer #3, we have measured decoding reporters for TAT/TAC while overexpressing tRNA-Tyr. This is now added to the results in fig S2C and the following:

      “We also tested decoding reporters for TAT/TAC in WT and ∆tgt overexpressing tRNATyr in trans (Fig. S1C). The presence of the plasmid (empty p0) amplified differences between the two strains with decreased decoding of TAC (and increased TAT, as expected) in ∆tgt compared to WT. Overexpression of tRNATyrGUA did not significantly impact decoding of TAT and increased decoding of TAC, as expected. Since overexpression of tRNATyrGUA rescues ∆tgt in tobramycin (Fig. 1I) and facilitates TAC decoding, this suggests that issues with TAC codon decoding contribute to the fitness defect observed in ∆tgt upon growth with tobramycin. Overexpression of tRNATyrAUA increased decoding of TAT in WT but did not change it in ∆tgt where it is already high. Unexpectedly, overexpression of tRNATyrAUA also increased decoding of TAC in WT. Thus, overexpression of tRNATyrAUA possibly changes the equilibrium between the decoding of TAC vs TAT and may restore translation of TAC enriched transcripts.” 

      Added figure: figure S1C

      (4) It cannot be stated that DNA repair is more efficient in the tgt mutant of V. cholerae, as indicated in the text of the article and in Fig 7. The authors only observe that the tgt mutant is more resistant to UV radiation and it is suggested that the reason may be TAT bias of DNA repair genes. To validate the hypothesis that UV resistance is increased because DNA repair genes are TAT biased, it would be necessary to check if DNA repair is affected by Q. UV not only produces DNA damage, but also oxidative stress. Therefore, maybe this phenotype is due to the increase in proteins related to oxidative stress controlled by RsxA, such as the superoxide dismutase encoded by sodA. It is also stated that these repair genes were found up for the tgt mutant in the Ribo-seq data, with unchanged transcription levels. Again, it is necessary to clarify this interpretation of the Ribo-seq data, since the fact that they are more represented in a tgt mutant perhaps means that translation is slower in those transcripts. Has it been observed in proteomics (wt vs tgt in the absence of TOB) whether these proteins involved in repair are more expressed in a tgt mutant?

      We agree that our results do not directly show that DNA repair is more efficient, only that delta-tgt responds better to UV. This has been modified in the manuscript. Regarding oxidative stress, we did not see a better or worse response of delta-tgt to H2O2. Moreover, since we see a better response of delta-tgt to UV only in V. cholerae and not in E. coli, we did not favor the hypothesis of a response to oxidative stress. In proteomics, we do not detect changes for DNA repair genes except for RuvA, which is more abundant in delta-tgt. We have toned down the statement about DNA repair in the paper.

      (5) The authors demonstrate that in E. coli the tgt mutant does not show greater resistance to UV radiation (Fig. 7D), unlike what happens in V. cholerae. It should be discussed that in previous works it has been observed that overexpression in E. coli of the tgt gene or the queF gene (Q biosynthesis) is involved in greater resistance to UV radiation (Morgante et al., Environ Microbiol, 2015 doi: 10.1111/1462-2920.12505; and Díaz-Rullo et al., Front Microbiol. 2021 doi: 10.3389/fmicb.2021.723874). As an explanation, it was proposed (Diaz-Rullo and Gonzalez-Pastor, NAR 2023 doi: 10.1093/nar/gkad667) that the observed increase in the capacity to form biofilms in strains that overexpress genes related to Q modification of tRNA would be related to this greater resistance to UV radiation.

      We now mention the previous observations suggesting a link between tgt and UV. We thank the referee for the reference, which we had overlooked. Note that in our experiments, all cultures are in planktonic form and are not allowed to form biofilms. We thus prefer not to discuss biofilm-linked processes in this study.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript the authors begin with the interesting phenotype of sub-inhibitory concentrations of the aminoglycoside tobramycin proving toxic to a knockout of the tRNA-guanine transglycosylase (Tgt) of the important human pathogen, Vibrio cholerae. Tgt is important for incorporating queuosine (Q) in place of guanosine at the wobble position of tRNAs with GUN anticodons. The authors go on to define a mechanism of action where environmental stressors control expression of tgt to control translational decoding of particularly tyrosine codons, skewing the balance from TAC towards TAT decoding in the absence of the enzyme. The authors use advanced proteomics and ribosome profiling to reveal that the loss of tgt results in increased translation of proteins like RsxA and a cohort of DNA repair factors, whose genes harbor an excess of TAT codons in many cases. These findings are bolstered by a series of molecular reporters, mass spectrometry, and tRNA overexpression strains to provide support for a model where Tgt serves as a molecular pivot point to reprogram translational output in response to stress.

      Strengths:

      The manuscript has many strengths. The authors use a variety of strains, assays, and advanced techniques to discover a mechanism of action for Tgt in mediating tolerance to sub-inhibitory concentrations of tobramycin. They observe a clear phenotype for a tRNA modification in facilitating reprogramming of the translational response, and the manuscript certainly has value in defining how microbes tolerate antibiotics.

      We thank the referee for their time and comments. 

      Weaknesses:

      The conclusions of the manuscript are mostly very well-supported by the data, but in some places control experiments or peripheral findings cloud precise conclusions. Some additional clarification, discussion, or even experimental extension could be useful in strengthening these areas.

      (1) The authors have created and used a variety of relevant molecular tools. In some cases, using these tools in additional assays as controls would be helpful. For example, testing for compensation of the observed phenotypes by overexpression of the Tyrosine tRNA(GUA) in Figure 2A with the 6xTAT strain, Figure 5C with the rsxA-GFP fusion, and/or Figure 7B with UV stress would provide additional information on the ability of tRNA overexpression to compensate for the defect in these situations.

      Thank you for the suggestions. Since overexpression of tRNA tyr is not expected to decrease decoding of TAT, we do not necessarily expect any effect for UV and rsxA expression. Overexpression of tRNA_GUA restores fitness of delta-tgt in TOB, but this is probably independent of RsxA. As ref2 also suggested above, we included in the discussion that the effect seen in delta-tgt with TOB is not only due to RsxA expression but also additional processes. However, these suggestions are interesting and we performed the following experiments in order to have an answer for these questions: 

      - “testing for compensation of the observed phenotypes by overexpression of the Tyrosine tRNA(GUA) in Figure 2A with the 6xTAT strain”: 

      This is now included in figure S2C and results as follows: 

      “We also tested decoding reporters for TAT/TAC in WT and ∆tgt overexpressing tRNA-Tyr in trans (Fig. S1C). The presence of the plasmid amplified differences between the two strains with decreased decoding of TAC (and increased TAT, as expected) in ∆tgt with empty plasmid compared to WT. Overexpression of tRNA_TyrGUA did not significantly impact decoding of TAT and increased decoding of TAC as expected. Since overexpression of tRNA_TyrGUA rescues ∆tgt in tobramycin (Fig. 1I) and facilitates TAC decoding, this suggests that issues with TAC codon decoding contribute to the fitness defect observed in ∆tgt upon growth with tobramycin. Overexpression of tRNA_TyrAUA increased decoding of TAT in WT but did not change it in ∆tgt where it is already high. Interestingly, overexpression of TyrAUA also increased decoding of TAC in WT. Thus, overexpression of tRNA_TyrAUA possibly changes the equilibrium between the decoding of TAC vs TAT and may restore translation of TAC enriched transcripts. “  

      - Figure 5C with the rsxA-GFP fusion:

      When we overexpress tRNA_GUA, rsxA fluorescence is 2-fold higher in delta-tgt compared to wt. However, the fluorescence is highly decreased compared to the condition with no tRNA overexpression. While we are not sure whether this apparent decrease is a technical issue or not (e.g. due to the presence of additional plasmid), we prefer not to further explore this in this manuscript. Note that we could not obtain delta-tgt strain carrying both plasmids expressing tRNA_GUA and rsxA, suggesting toxic overproduction of rsxA in this context.

      Author response image 2.

      - Figure 7B with UV stress: 

      Here again, delta-tgt overexpressing tRNA_GUA is still more UV resistant than WT overexpressing tRNA_GUA.

      Author response image 3.

      (2) The authors present a clear story with a reprogramming towards TAT codons in the knockout strain, particularly regarding tobramycin treatment. The control experiments often hint at other codons also contributing to the observed phenotypes (e.g., His or Asp), yet these effects are mostly ignored in the discussion. It would be helpful to discuss these findings at a minimum in the discussion section, or possibly experimentally address the role of His or Asp by overexpression of these tRNAs together with Tyrosine tRNA(GUA) in an experiment like that of Figure 1I to see if a more "wild type" phenotype would present. In fact, the synergy of Tyr, His, and/or Asp codons likely helps to explain the effects observed with the DNA repair genes in later experiments.

      We thank the referee for the suggestion. We agree that there could be synergies between these codons, and that is probably why the proteomics data do not clearly reflect tyrosine codon usage bias. This is now further discussed in the Ideas and Speculation section.

      Moreover, we have added Figure S3G and the following result:

      “Since not all TAT-biased proteins are found to be enriched in ∆tgt proteomics data, the sequence context surrounding TAT codons could affect their decoding. To illustrate this, we inserted after the gfp start codon various tyrosine-containing sequences displayed by rsxA (Fig. S3G). The native tyrosines were all TAT codons; our synthetic constructs were either TAT or TAC, while keeping the remaining sequence unchanged. We observe that the production of GFP carrying the TEYTATLLL sequence from RsxA is increased in Δtgt compared to WT, while it is unchanged with TEYTACLLL. However, production of the GFP with the sequences LYTATRLL/LYTACRLL and EYTATLR/ EYTACLR was not affected (or even decreased for the latter) by the absence of tgt. Overall, our results demonstrate that RsxA is upregulated in the ∆tgt strain at the translational level, and that proteins with a codon usage bias towards tyrosine TAT are prone to be more efficiently translated in the absence of Q modification, but this is also dependent on the sequence context.”
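
      The notion of a tyrosine codon usage bias towards TAT can be illustrated with a minimal sketch that computes, for one coding sequence, the fraction of tyrosine codons that are TAT rather than TAC. The function name and the choice of this simple fraction as the bias measure are assumptions for illustration; the metric actually used in the manuscript is not specified in this response.

```python
from typing import Optional


def tat_fraction(cds: str) -> Optional[float]:
    """Fraction of tyrosine codons (TAT or TAC) that are TAT.

    `cds` is an in-frame coding sequence (DNA or RNA letters).
    Returns None when the sequence contains no tyrosine codon.
    """
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Split into codons in frame 0 so that out-of-frame "TAT" is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    tat = codons.count("TAT")
    tac = codons.count("TAC")
    total = tat + tac
    return tat / total if total else None
```

      A measure of this kind captures codon choice only; as the quoted result indicates, the sequence context around each tyrosine codon also matters and is not reflected in such a fraction.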

      (3) Regarding Figure 6D, the APB northern blot feels like an afterthought. It was loaded with different amounts of RNA as input and some samples are repeated three times, but Δcrp only once. Collectively, it makes this experiment very difficult to assess.

      A different amount of RNA was used only for ∆tgt in which we have only one band because of the absence of modification. For all the other conditions, the same amount of RNA was used (0.9 µg). Additional replicates of crp were in an additional gel but only a representative gel was shown in the manuscript. This is now specified in the legend.

      We also attach below the picture of the gel with total RNA (SYBR Gold labelling of total RNA), where it can be seen that the lanes contain an equivalent quantity of RNA, except for ∆tgt.

      Author response image 4.

      Minor Points:

      (3) Fig S2B, do the authors have a hypothesis why the Asp and Phe tRNAs lead to a growth decrease in the untreated samples? It appears like Phe(GAA) partially compensates for the defect.

      Yes, we agree. Unfortunately, at this stage we do not have a satisfactory answer. This would be interesting to study further, but it is beyond the scope of the present study.

      (5) Lines 655 to 660 seem more appropriate as speculation in the discussion rather than as a conclusion in the results, where no direct experiments are performed. The authors might take advantage of the "Ideas and Speculation" section that eLife allows.

      Thank you very much for this suggestion, we added this section to the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor.

      - Figure 6 - Fonts on several mutants is different size/type. fixed

      - What is the Pm promoter. Please expand and give enough details so reader can follow. Especially as it is less used in V. cholerae (typical being pBAD or pTAC promoters). done

      - Spacing where references are inserted should be checked. done

      - Line 860-863 - "V. cholerae's response to sub-MIC antibiotic stress is transposable to other Gram-negative pathogens". This reads awkwardly. Consider rephrasing. done

      - Figure 7 - Text in A and C is very small and is very hard to read. Font for tgt is different.

      Fixed. Tgt is in italics.

      Reviewer #2 (Recommendations For The Authors):

      As specified in the public review, more evidence would be necessary to affirm that tRNAs not modified by Q have a greater preference for translating TAT codons, since there are several previous studies in which it is shown that Q-tRNAs have a greater preference for NAT codons (including TAT). For example, it is suggested to explore what happens with other recoded genes (enriched in TAT or TAC) if there is a high level of Q-tRNAs (overexpression of tgt in a wt context). It is also necessary to clarify how to interpret the Ribo-seq results, which apparently is different from how they have been interpreted in other studies.

      Please see above our responses and changes made to the manuscript.

      Minor corrections

      In Figure 8, replace "Epitranscriptomic adapation to stress" with "Epitranscriptomic adaptation to stress".

      Fixed, thank you for noticing!

      Reviewer #3 (Recommendations For The Authors):

      (1) Lines 48-50, and 110 to 112, the authors have a nice mechanism and story, yet the lines mentioned feel very qualified (e.g., "possibly", "plausibly") and lead to the abstract hiding the value and major conclusions of the study. The authors could consider to revise or even remove these lines to focus on the take-home message in the abstract and end of introduction/discussion. 

      Thank you for this comment, we modified the text.  

      (2) Additional description for the samples in the results section for Figure 1 would be helpful to the reader.

      Done

      (3) Figure S1, the line of experiments with rluF is interesting, but in the end the choice seems a little random. Have the authors assessed knockouts of other modifications on the ASL for effects? Since the modification is not well characterized in V. cholerae according to the authors, it might make sense to save this for a future paper.

      We removed S1, as we agree that this experiment does not really add anything to the paper.

      (4) Line 334 and 353 are redundant.

      Fixed

      (5) It is likely beyond the scope of the study, but it would strengthen the paper to repeat Figure 3 with His and/or Asp based on the findings of 2C and 4E to better understand the contribution of His and Asp to Q biology.

      We repeated Figure 3 with Asp. Based on Fig 2C (less efficient decoding of GAC in delta-tgt in TOB) and 4E (positive GAT codon bias in proteins up in Ribo-seq in delta-tgt TOB), we would expect that beta-lactamase with Asp GAC would be less efficiently decoded than GAT in delta-tgt.

      This was added to the manuscript

      “Like Tyr103, Asp129 was shown to be important for resistance to β-lactams (Doucet et al., 2004; Escobar et al., 1994; Jacob et al., 1990). When we replaced the native Asp129 GAT with the synonymous codon Asp129 GAC, the GAC version did not appear to produce functional β-lactamase in ∆tgt (Fig. 3B), suggesting increased mistranslation or inefficient decoding of the GAC codon by tRNAAsp in the absence of Q. Decoding of GAT codon was also affected in ∆tgt in the presence of tobramycin.”

      Added figure: Figure 3B

      (6) The authors could consider replacing 5D with S4A-D, which is easier to understand in our opinion.

      Done

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This article identifies ADGRA3 as a candidate GPCR for mediating beige fat development. The authors use human expression data from Human Protein Atlas and Gtex databases and combine this with experiments performed in mice and a murine cell line. They refer to a GPCR bioactivity screening tool PRESTO-Salsa, with which it was found that Hesperetin activates ADGRA3. From their experiments, authors conclude that Hesperetin activates ADGRA3, inducing a Gs-PKA-CREB axis resulting in adipose thermogenesis.

      Strengths:

      The authors analyze human data from public databases and perform functional studies in mouse models. They identify a new GPCR with a role in thermogenic activation of adipocytes.

      Considerations:

      Selection of ADGRA3 as a candidate GPCR relevant for mediating beiging in humans:

      The authors identify GPCRs that are expressed more highly in murine iBAT compared to iWAT in response to cold and assess which of these GPCRs are expressed in human subcutaneous or visceral adipocytes. Although this strategy will identify GPCRs that are expressed at higher levels in brown fat compared to beige and thus possibly more active in thermogenic function, the relevance in choosing GPCRs that also are expressed in unstimulated human white adipocytes should be considered. Thermogenic activity is not normally present in human white adipocytes. It would have strengthened the GPCR selection if the authors instead had assessed the intersection with human brown adipocytes that were activated with norepinephrine.

      We appreciate your constructive feedback and believe that by adopting this refined strategy, we will strengthen our selection of GPCRs related to adipose thermogenesis in other ongoing studies. We look forward to continuing our research in this area and contributing to the understanding of adipose thermogenesis and its potential therapeutic applications. Thank you once again for your valuable input. 

      Strategy to investigate the role of ADGRA3 in WAT beiging:

      Having identified ADGRA3 as their candidate receptor, the authors investigated the receptor in mouse models, the murine inguinal adipocyte cell line 3T3 and in human subcutaneous adipose progenitors (HAdsc) differentiated in vitro. Calling the human cells "beige" is a stretch as these cells are derived from a white adipose depot. The authors do observe regulation in UCP1 and abundance of mitochondria following modification of ADGRA3 in the cells. However, in future studies, it should be considered if the receptor rather plays a role in differentiation per se, and perhaps not specifically in thermogenic differentiation/activity.

      Regarding the reviewer's suggestion to consider whether ADGRA3 plays a role in differentiation per se, rather than specifically in thermogenic differentiation/activity, we acknowledge that this is an important consideration. Our current studies have focused on the role of ADGRA3 in regulating UCP1 expression and mitochondrial abundance, which are hallmarks of adipose thermogenic activity. However, we recognize that ADGRA3 may also have broader roles in adipocyte differentiation and function that are not limited to thermogenesis.

      To address this point, in future studies, we plan to conduct additional experiments to investigate the potential role of ADGRA3 in adipocyte differentiation, including its effects on the expression of markers of adipocyte differentiation and its impact on adipocyte metabolism and function. These studies will provide further insights into the mechanisms by which ADGRA3 regulates adipocyte biology.

      According to the Human Protein Atlas and Gtex databases, ADGRA3 is not only expressed in adipocytes, but also in other tissues and cell types. The authors address this by measuring the expression in a panel of these tissues, demonstrating a knockdown not only in the adipose tissue, but also in the liver and less pronounced in the muscle (Figure S2). It should thus be emphasized that the decreased TG levels in serum and liver in the mice might in fact depend on Adgra3 overexpression in the liver. Even though this might not have been the purpose of the experiment, it is important to highlight this as it could serve as hypothesis building for future studies of the function of this receptor.

      Thank you for your thoughtful comments and feedback. We appreciate the insight provided by the Human Protein Atlas and Gtex databases regarding the tissue distribution of ADGRA3. We fully acknowledge that the decreased TG levels observed in both the serum and liver of the mice might be linked to the overexpression of Adgra3 in the liver.

      Although this was not the primary objective of our experiment, we agree that this observation is worth highlighting as it could serve as a basis for future hypothesis-driven research on the functional role of ADGRA3 in different tissues. In light of your comments, we emphasized this potential link between Adgra3 overexpression in the liver and reduced TG levels in discussion, as follows.

      “…the precise mechanisms underlying the influence of on adipose thermogenesis. Furthermore, it is crucial to highlight that the observed decrease in TG levels in both serum and liver (Figure 4-figure supplement 2C-D) might be attributed to the significant increase in Adgra3 expression in the liver, which is a consequence of the nanoparticle-mediated overexpression of Adgra3. While the exact mechanism remains to be fully elucidated, this correlation suggests a potential link between Adgra3 overexpression in the liver and reduced TG levels in the serum. We will employ more sophisticated models in subsequent studies to further…”

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Zhao et al. explored the function of adhesion G protein-coupled receptor A3 (ADGRA3) in thermogenic fat biology.

      Strengths:

      Through both in vivo and in vitro studies, the authors found that gain of function of ADGRA3 leads to browning of white fat and ameliorates insulin resistance.

      Weaknesses:

      There are several lines of weak methodologies such as using 3T3-L1 adipocytes and intraperitoneal (i.p.) injection of virus. Moreover, as the authors stated that ADGRA3 is constitutively active, how could the authors then identify a chemical ligand?

      Comments on revised version:

      The revised manuscript by Zhao et al. has limited improvement. The authors refused to perform revised experiments using primary cultures even though two reviewers pointed out the same weakness (3T3-L1 adipocytes are unsuitable). Using infrared thermography to measure body temperature is also problematic.

      Thanks for your comments. We regret that human adipocytes induced from human adipose-derived stem cells (hADSCs) were not recognized as primary cultures by multiple reviewers. Therefore, we have included relevant experimental results of mouse primary adipocytes induced from stromal vascular fraction (SVF) in Figure 8E-H as a supplement. The thermal imaging device was used to measure the temperature of BAT, while the body temperature was measured at 9:00 using a rectal probe connected to a digital thermometer.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This paper presents a data processing pipeline to discover causal interactions from time-lapse imaging data, and convincingly illustrates it on a challenging application for the analysis of tumor-on-chip ecosystem data. The core of the discovery module is the authors' original tMIIC method, which is shown in the supplementary material to compare favourably to two state-of-the-art methods on synthetic temporal data on a 15-node network.

      Strengths:

      This paper tackles the problem of learning causal interactions from temporal data, which is an open problem in the presence of latent variables. The core of the authors' tMIIC method is nicely presented in connection to Granger-Schreiber causality and to the novel graphical conditions used to infer latent variables, which are based on a theorem about transfer entropy. tMIIC compares favourably to PC and PCMCI+ methods using different kernels on synthetic datasets generated from a network of 15 nodes. A full application to tumor-on-chip cellular ecosystem data, including cancer cells, immune cells, cancer-associated fibroblasts, endothelial cells and anticancer drugs, is presented, with convincing inference results with respect to both known and novel effects between those components and their contact.

      The code and dataset are available online for the reproducibility of the results.

      We thank Reviewer #1 for highlighting the main results and strengths of our paper, as well as for his/her recommendations below to further improve the manuscript.

      Weaknesses:

      The references to “state-of-the-art methods” concerning the inference of causal networks should be more precise by giving citations in the main text, and better discussed in general terms, both in the first section and in the section presenting CausalXtract. It is only in the legends of the supplementary figures that we get this information. Of course, comparison on one's own synthetic datasets can always be criticized, but this is rather due to the absence of a common benchmark, and I would recommend that the authors explicitly propose their datasets as a benchmark to the community.

      Following Reviewer #1’s suggestion, we now compare tMIIC’s performance to other state-of-the-art causal discovery methods for time series data in the main text and in a new Figure 2. This Figure 2 also highlights the relation between graph-based causal discovery methods for time series data and Granger-Schreiber temporal causality, as discussed in more detail in Methods (Theorem 1).

      We also agree about the importance of sharing benchmark datasets with the community. This is the reason why we provide the dynamical equations of the 15-node benchmarks in Supplementary Tables 1 & 2, so that anyone can generate equivalent time series datasets of any desired length.

      Reviewer #2 (Public review):

      Summary:

      The authors propose a methodology to perform causal (temporal) discovery. The approach appears to be robust and is tested in two different scenarios: one related to live-cell imaging data, and another using synthetic (mathematically defined) time series data. They compare the performance of their findings against another well-known method by using metrics like F-score, precision and recall.
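
      For readers less familiar with these metrics, the sketch below scores a predicted edge set against a known ground-truth graph. It is a minimal illustration, not the evaluation code used in the paper: the function name `edge_scores` and the toy edge lists are hypothetical, and edges are assumed to be directed (source, target) pairs that count as correct only when both endpoints and the direction match.

```python
def edge_scores(true_edges, predicted_edges):
    """Return (precision, recall, F-score) for a set of predicted directed edges."""
    true_set = set(true_edges)
    pred_set = set(predicted_edges)
    true_positives = len(true_set & pred_set)

    # Precision: fraction of predicted edges that are in the true graph.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    # Recall: fraction of true edges that were recovered.
    recall = true_positives / len(true_set) if true_set else 0.0
    # F-score: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example (hypothetical graph): one edge is predicted with the wrong direction.
true_edges = [("X1", "X2"), ("X2", "X3"), ("X3", "X4")]
predicted_edges = [("X1", "X2"), ("X2", "X3"), ("X4", "X3")]
precision, recall, f_score = edge_scores(true_edges, predicted_edges)
print(precision, recall, f_score)
```

      Counting a reversed edge as both a false positive and a false negative is one possible convention; a benchmark that scores only the skeleton of the graph would treat it as correct.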

      Strengths:

      Performance and robustness. The text is clear and concise. The authors provide the code for review.

      We thank Reviewer #2 for his/her positive assessment of our work and the suggestions below to improve the manuscript.

      Weaknesses:

      One concern could be the applicability of the method in other areas like climate, economy. For those areas, public data are available and might be interesting to test how the method performs with this kind of data.

      While our main expertise concerns the analysis of biological and biomedical data, we agree that tMIIC (which is included in the MIIC R package) could in principle be applied to other areas, such as climate and economics.

      We have not included benchmarks on such diverse types of datasets in the present manuscript, which focuses on CausalXtract’s pipeline for the analysis and causal interpretation of live-cell time-lapse imaging data from complex cellular systems.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      We thank Reviewer 1 for their helpful comments and hope that the changes made to the revised manuscript have addressed their points.

      This study presents a novel application of the inverted encoding (i.e., decoding) approach to detect the correlates of crossmodal integration in the human EEG (electrophysiological) signal. The method is successfully applied to data from a group of 41 participants, performing a spatial localization task on auditory, visual, and audiovisual events. The analyses clearly show a behavioural superiority for audio-visual localization. Like previous studies, the results when using traditional univariate ERP analyses were inconclusive, showing once more the need for alternative, more sophisticated approaches. Instead, the principal approach of this study, harnessing the multivariate nature of the signal, captured clear signs of super-additive responses, considered by many as the hallmark of multisensory integration. Unfortunately, the manuscript lacks many important details in the descriptions of the methodology and analytical pipeline. Although some of these details can eventually be retrieved from the scripts that accompany this paper, the main text should be self-contained and sufficient to gain a clear understanding of what was done. (A list of some of these is included in the comments to the authors). Nevertheless, I believe the main weakness of this work is that the positive results obtained and reported in the results section are conditioned upon eye movements. When artifacts due to eye movements are removed, then the outcomes are no longer significant. 

      Therefore, whether the authors finally achieved the aims and showed that this method of analysis is truly a reliable way to assess crossmodal integration, does not stand on firm ground. The worst-case scenario is that the results are entirely accounted for by patterns of eye movements in the different conditions. In the best-case scenario, the method might truly work, but further experiments (and/or analyses) would be required to confirm the claims in a conclusive fashion.

      One first step toward this goal would be, perhaps, to facilitate the understanding of results in context by reporting both the uncorrected and corrected analyses in the main results section. Second, one could try to support the argument given in the discussion, pointing out the origin of the super-additive effects in posterior electrode sites, by also modelling frontal electrode clusters and showing they aren't informative as to the effect of interest.

      We performed several additional analyses to address concerns that our main result was caused by different eye movement patterns between conditions. We re-ran our key analyses using activity exclusively from frontal electrodes, which revealed poorer decoding performance than that from posterior electrodes. If eye movements were driving the non-linear enhancement in the audiovisual condition, we would expect stronger decoding using sensors closer to the source, i.e., the extraocular muscles. We also computed the correlations between average eye position and stimulus position for each condition to evaluate whether participants made larger eye movements in the audiovisual condition, which might have contributed to better decoding results. Though we did find evidence for eye movements toward stimuli, the degree of movement did not significantly differ between conditions.

      Furthermore, we note that the analysis using a stricter eye movement criterion, acknowledged in the Discussion section of the original manuscript, resulted in very similar results to the original analysis. There was significantly better decoding in the AV condition (as measured by d') than the MLE prediction, but this difference did not survive cluster correction. The most likely explanation for this is that the strict eye movement criterion combined with our conservative measure of (mass-based) cluster correction led to reduced power to detect true differences between conditions. Taken together with the additional analyses described in the revised manuscript and supplementary materials, the results show that eye movements are unlikely to account for differences between the multisensory and unisensory conditions. Instead, our decoding results likely reflect nonlinear neural integration between audio and visual sensory information.

      “Any experimental design that varies stimulus location needs to consider the potential contribution of eye movements. We computed correlations between participants’ average eye position and each stimulus position between the three sensory conditions (auditory, visual and audiovisual; Figure S1) and found evidence that participants made eye movements toward stimuli. A re-analysis of the data with a very strict eye-movement criterion (i.e., removing trials with eye movements >1.875º) revealed that the super-additive enhancement in decoding accuracy no longer survived cluster correction, suggesting that our results may be impacted by the consistent motor activity of saccades towards presented stimuli. Further investigation, however, suggests this is unlikely. Though the correlations were significantly different from 0, they were not significantly different from each other. If consistent saccades to audiovisual stimuli were responsible for the nonlinear multisensory benefit we observed, we would expect to find a higher positive correlation between horizontal eye position and stimulus location in the audiovisual condition than in the auditory or visual conditions. Interestingly, eye movements corresponded more to stimulus location in the auditory and audiovisual conditions than in the visual condition, indicating that it was the presence of a sound, rather than a visual stimulus, that drove small eye movements. This could indicate that participants inadvertently moved their eyes when localising the origin of sounds. We also re-ran our analyses using the activity measured from the frontal electrodes alone (Figure S2). If the source of the nonlinear decoding accuracy in the audiovisual condition was due to muscular activity produced by eye movements, there should be better decoding accuracy from sensors closer to the source. 
      Instead, we found that decoding accuracy of stimulus location from the frontal electrodes (peak d' = 0.08) was less than half that of decoding accuracy from the more posterior electrodes (peak d' = 0.18). These results suggest that the source of neural activity containing information about stimulus position was located over occipito-parietal areas, consistent with our topographical analyses (inset of Figure 3).”

      The univariate ERP analyses use an outdated contrast, AV <> A + V, to capture multisensory integration. A number of authors have pointed out the potential problem of double baseline subtraction when using this contrast, and have recommended a number of solutions, experimental and analytical. See for example: [1] and [2].

      (1) Teder-Salejarvi, W. A., McDonald, J. J., Di Russo, F., & Hillyard, S. A. (2002). Cognitive Brain Research, 14, 106-114. 

      (2) Talsma, D., & Woldorff, M. G. (2005). Journal of cognitive neuroscience, 17(7), 1098-1114.

      We thank the reviewer for raising this point. Comparing ERPs across different sensory conditions requires careful analytic choices to discern genuine sensory interactions within the signal. The AV <> (A + V) contrast has often been used to detect multisensory integration, though any non-signal related activity (i.e. anticipatory waves; Talsma & Woldorff, 2005) or pre-processing manipulation (e.g. baseline subtraction; Teder-Sälejärvi et al., 2002) will be doubled in (A + V) but not in AV. Critically, we did not apply a baseline correction during preprocessing and thus our results are not at risk of double-baseline subtraction in (A + V). Additionally, we temporally jittered the presentation of our stimuli to mitigate the potential influence of consistent overlapping ERP waves (Talsma & Woldorff, 2005).

      The results section should provide the neurometric curve/s used to extract the slopes of the sensitivity plot (Figure 2B). 

      We thank the reviewer for raising this point of clarification. The sensitivity plots for Figures 2B and 2C were extracted from the behavioural performance of the behavioural and EEG tasks, respectively. The sensitivity plot for Figure 2B was extracted from individual psychometric curves, whereas the d’ values for Figure 2C were calculated from the behavioural data for the EEG task. This information has been clarified in the manuscript.
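
      As an illustration of how a d’ value can be obtained from responses in a two-choice task such as the left/right judgement in the EEG session, the sketch below uses the standard signal-detection definition, d’ = z(hit rate) − z(false-alarm rate). This is an assumption about the general approach rather than a description of the exact computation in the manuscript; the function name, the trial counts and the log-linear correction (which keeps rates away from 0 and 1) are all hypothetical choices.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity from response counts.

    Here a 'hit' could be a "right" response to a right-dominant sequence and a
    'false alarm' a "right" response to a left-dominant sequence (an assumed mapping).
    """
    # Log-linear correction: add 0.5 to each count so rates never reach 0 or 1,
    # where the inverse normal CDF is undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one participant in one condition.
print(round(d_prime(hits=80, misses=20, false_alarms=30, correct_rejections=70), 2))
```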

      “Figure 1. Behavioural performance is improved for audiovisual stimuli. A) Average accuracy of responses across participants in the behavioural session at each stimulus location for each stimulus condition, fitted to a psychometric curve. Steeper curves indicate greater sensitivity in identifying stimulus location. B) Average sensitivity across participants in the behavioural task, estimated from psychometric curves, for each stimulus condition. The red cross indicates estimated performance assuming optimal (MLE) integration of unisensory cues. C) Average behavioural sensitivity across participants in the EEG session for each stimulus condition. Error bars indicate ±1 SEM.”

      The encoding model was fitted for each electrode individually; I wonder if important information contained as combinations of (individually non-significant) electrodes was then lost in this process and if the authors consider that this is relevant. 

      Although the encoding model was fitted for each electrode individually for the topographic maps (Figure 4B), in all other analyses the encoding model was fitted across a selection of electrodes (see final inset of Figure 3). As this electrode set was used for all other neural analyses, our model would allow for the detection of important information contained in the neural patterns across electrodes. This information has been clarified in the manuscript.

      “Thus, for all subsequent analyses we only included signals from the central-temporal, parietal-occipital, occipital and inion sensors for computing the inverse model (see final inset of Figure 2). As the model was fitted for multiple electrodes, subtle patterns of neural information contained within combinations of sensors could be detected.”

      Neurobehavioral correlations could benefit from outlier rejection and the use of robust correlation statistics. 

      We thank the reviewer for raising this issue. Note, however, that the correlations we report are resistant to the influence of outliers because we used Spearman’s rho [1] (as opposed to Pearson’s). This information has been communicated in the manuscript.

      (1) Wilcox, R.R. (2016), Comparing dependent robust correlations. British Journal of Mathematical & Statistical Psychology, 69(3), 215-224. https://doi.org/10.1111/bmsp.12069

      “Neurobehavioural correlations. As behavioural and neural data violated assumptions of normality, we calculated rank-order correlations (Spearman’s rho) between the average decoding sensitivity for each participant from 150-250 ms poststimulus onset and behavioural performance on the EEG task. As Spearman’s rho is resistant to outliers (Wilcox, 2016), we did not perform outlier rejection.”
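
      The point about rank-order correlation can be made concrete with a small sketch. Spearman’s rho is Pearson’s correlation computed on ranks, so an extreme value contributes only through its rank, not its magnitude. The helper names and the toy data below are hypothetical, and the implementation is a plain-Python illustration rather than the analysis code used in the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranked data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: a monotonic relationship in which the last behavioural value is extreme.
decoding = [0.05, 0.10, 0.12, 0.20, 0.22]
behaviour = [0.8, 1.1, 1.3, 1.6, 9.0]
print(spearman_rho(decoding, behaviour), pearson(decoding, behaviour))
```

      In this toy case the ranks agree perfectly, so rho stays at its maximum, while Pearson’s r is pulled below 1 by the extreme value.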

      “Wilcox, R.R. (2016), Comparing dependent robust correlations. British Journal of Mathematical & Statistical Psychology, 69(3), 215-224. https://doi.org/10.1111/bmsp.12069”

      Many details that are important for the reader to evaluate the evidence and to understand the methods and analyses aren't given; this is a non-exhaustive list:  

      We thank the reviewer for highlighting these missing details. We have updated the manuscript where necessary to ensure the methods and analyses are fully detailed and replicable.

      - specific parameters of the stimuli and performance levels. Just saying "similarly difficult" or "marginally higher volume" is not enough to understand exactly what was done.  

      “The perceived source location of auditory stimuli was manipulated via changes to interaural level and timing (Whitworth & Jeffress, 1961; Wightman & Kistler, 1992). The precise timing of when each speaker delivered an auditory stimulus was calculated from the following formula:

      where x and z are the horizontal and forward distances in metres between the ears and the source of the sound on the display, respectively, r is the head radius, and s is the speed of sound. We used a constant approximate head radius of 8 cm for all participants. r was added to x for the left speaker and subtracted for the right speaker to produce the interaural time difference. For ±15° source locations, interaural timing difference was 1.7 ms. To simulate the decrease in sound intensity as a function of distance, we calculated interaural level differences for the left and right speakers by dividing the sounds by the left and right distance vectors. Finally, we resampled the sound using linear interpolation based on the calculations of the interaural level and timing differences. This process was used to calculate the soundwaves played by the left and right speakers for each of the possible stimulus locations on the display. The maximum interaural level difference between speakers was 0.14 A for ±15° auditory locations, and 0.07 A for ±7.5°.”
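
      One possible reading of the variable definitions above can be sketched as follows. The sketch assumes that each channel's delay is the straight-line path length divided by the speed of sound, with the head radius r added to x for the left speaker and subtracted for the right speaker, as the text describes. The exact form of the formula, the function names and the value used for the speed of sound are assumptions and may differ from the authors' implementation.

```python
from math import hypot

HEAD_RADIUS_M = 0.08        # constant approximate head radius given in the text
SPEED_OF_SOUND_M_S = 343.0  # assumed value; not stated in the text


def channel_delays(x, z, r=HEAD_RADIUS_M, s=SPEED_OF_SOUND_M_S):
    """Return (left_delay, right_delay) in seconds for a source at (x, z) metres.

    x is the horizontal distance and z the forward distance, as defined in the text.
    """
    left_delay = hypot(x + r, z) / s   # r added to x for the left speaker
    right_delay = hypot(x - r, z) / s  # r subtracted from x for the right speaker
    return left_delay, right_delay


def timing_difference(x, z, r=HEAD_RADIUS_M, s=SPEED_OF_SOUND_M_S):
    """Left-minus-right delay in seconds under the assumed path-length formula."""
    left_delay, right_delay = channel_delays(x, z, r, s)
    return left_delay - right_delay


# Hypothetical geometry: source 0.2 m to the right at a forward distance of 0.8 m.
print(timing_difference(x=0.2, z=0.8))
```

      Under this reading a centred source (x = 0) produces no timing difference, and the sign of the difference follows the side of the source.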

      - were stimulus parameters adjusted individually or as a group? Which method was followed?  

      To clarify, stimulus parameters (frequency, size, luminance, volume, location, etc.) were manipulated throughout pilot testing only. Parameters were adjusted to achieve similar pilot behavioural results between the auditory and visual conditions. For the experiment proper, parameters remained constant for both tasks and were the same for all participants.

      “During pilot testing, stimulus features (size, luminance, volume, frequency etc.) were manipulated to make visual and auditory stimuli similarly difficult to spatially localize. These values were held constant in the main experiment.”

      - specify which response buttons were used.

      “Participants were presented with two consecutive stimuli and tasked with indicating, via button press, whether the first (‘1’ number-pad key) or second (‘2’ number-pad key) interval contained the more leftward stimulus.”

      “At the end of each sequence, participants were tasked with indicating, via button press, whether more presentations appeared on the right (‘right’ arrow key) or the left (‘left’ arrow key) of the display.”

      - no information is given as to how many trials per condition remained on average, for analysis.  

      The average number of remaining trials per condition after eye-movement analysis is now included in the Methods section of the revised manuscript.

      “We removed trials with substantial eye movements (>3.75º away from fixation) from the analyses. After the removal of eye movements, on average 2365 (SD = 56.94), 2346 (SD = 152.87) and 2350 (SD = 132.47) trials remained for auditory, visual and audiovisual conditions, respectively, from the original 2400 per condition.”
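
      A trial-rejection rule of this kind can be sketched as below. The sketch assumes that gaze samples are available per trial as (horizontal, vertical) offsets from fixation in degrees and that a trial is dropped if any sample exceeds the threshold; the data layout, the function names and the use of radial distance are hypothetical, since the quoted text does not specify how deviation was measured.

```python
from math import hypot


def exceeds_threshold(gaze_samples, threshold_deg):
    """True if any gaze sample lies further than threshold_deg from fixation."""
    return any(hypot(dx, dy) > threshold_deg for dx, dy in gaze_samples)


def reject_eye_movement_trials(trials, threshold_deg=3.75):
    """Keep only trials whose gaze stayed within threshold_deg of fixation.

    `trials` maps a trial identifier to a list of (dx, dy) offsets in degrees.
    """
    return {
        trial_id: samples
        for trial_id, samples in trials.items()
        if not exceeds_threshold(samples, threshold_deg)
    }


# Hypothetical gaze traces for three trials.
trials = {
    "trial_1": [(0.1, 0.0), (0.4, -0.2)],
    "trial_2": [(0.2, 0.1), (4.1, 0.3)],   # exceeds 3.75 degrees
    "trial_3": [(2.0, 0.5), (1.9, 0.4)],   # exceeds 1.875 but not 3.75
}
print(sorted(reject_eye_movement_trials(trials)))
print(sorted(reject_eye_movement_trials(trials, threshold_deg=1.875)))
```

      Passing the stricter 1.875º criterion mentioned in the Discussion excerpt removes more trials, which is the trade-off with statistical power that the authors describe.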

      - no information is given on the specifics of participant exclusion criteria. (even if the attrition rate was surprisingly high, for such an easy task).  

      The behavioural session also served as a screening task. Although the task instructions were straightforward, perceptual discrimination was not easy due to the ambiguity of the stimuli. Auditory localization is not very precise, and the visual stimuli were brief, dim, and diffuse. The behavioural results reflect the difficulty of the task. Attrition rate was high as participants who scored below 60% correct in any condition were deemed unable to accurately perform the task, were not invited to complete the subsequent EEG session, and were omitted from the analyses. We have included the specific criteria in the manuscript.

      “Participants were first required to complete a behavioural session with above 60% accuracy in all conditions to qualify for the EEG session (see Behavioural session for details).”

      - EEG pre-processing: what filter was used? How was artifact rejection done? (no parameters are reported); How were bad channels interpolated?  

      We used a 0.25 Hz high-pass filter to remove baseline drifts, but no low-pass filter. In line with recent studies on the undesirable influence of EEG preprocessing on ERPs [1], we opted to avoid channel interpolation and artifact rejection. This was erroneously reported in the manuscript and has now been clarified. For the sake of clarity, here we demonstrate that a reanalysis of data using channel interpolation and artifact rejection returned the same pattern of results.

      (1) Delorme, A. (2023). EEG is better left alone. Scientific Reports, 13, 2372. https://doi.org/10.1038/s41598-023-27528-0
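
For readers unfamiliar with drift removal, the following is a minimal sketch of a 0.25 Hz high-pass filter. The response does not state the filter design or the sampling rate, so the first-order RC form, the function name `highpass_rc`, and the 1000 Hz default are assumptions made purely for illustration.

```python
import math


def highpass_rc(samples, cutoff_hz=0.25, sample_rate_hz=1000.0):
    """Causal first-order (RC-style) high-pass filter.

    cutoff_hz matches the 0.25 Hz value in the response; the filter order
    and sample_rate_hz are assumptions, not reported parameters.
    """
    if not samples:
        return []
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # Pass sample-to-sample changes, let slow offsets decay towards zero.
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out
```

Applied to a constant offset, the output decays towards zero, which is the behaviour expected of a filter that removes baseline drift.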

      - specific electrode locations must be given or shown in a plot (just "primarily represented in posterior electrodes" is not sufficiently informative).  

      A diagram of the electrodes used in all analyses is included within Figure 3, and we have drawn readers’ attention to this in the revised manuscript.

      “Thus, for all subsequent analyses we only included signals from the central-temporal, parietal-occipital, occipital and inion sensors for computing the inverse model (see final inset of Figure 2).” 

      - ERP analysis: which channels were used? What is the specific cluster correction method?

      We used a conservative mass-based cluster correction from Pernet et al. (2015) - this information has been clarified in the manuscript.

      “A conservative mass-based cluster correction was applied to account for spurious differences across time (Pernet et al., 2015).” 

      “Pernet, C. R., Latinus, M., Nichols, T. E., & Rousselet, G. A. (2015). Cluster-based computational methods for mass univariate analyses of event-related brain potentials/fields: A simulation study. Journal of Neuroscience Methods, 250, 85-93. https://doi.org/10.1016/j.jneumeth.2014.08.003”

      - results: descriptive stats on performance must be given (instead of saying "participants performed well").  

      The mean and standard deviation of participants’ performance for each condition in the behavioural and EEG experiments are now explicitly mentioned in the manuscript.

      “A quantification of the behavioural sensitivity (i.e., steepness of the curves) revealed significantly higher sensitivity for the audiovisual stimuli (M = .04, SD = .02) than for the auditory stimuli alone (M = .03, SD = .01; Z = -3.09, p = .002), and than for the visual stimuli alone (M = .02, SD = .01; Z = -5.28, p = 1.288e-7; Figure 1B). Sensitivity for auditory stimuli was also significantly higher than sensitivity for visual stimuli (Z = 2.02, p = .044).” 

      “We found a similar pattern of results to those in the behavioural session; sensitivity for audiovisual stimuli (M = .85, SD = .33) was significantly higher than for auditory (M = .69, SD = .41; Z = -2.27, p = .023) and visual stimuli alone (M = .61, SD = .29; Z = -3.52, p = 4.345e-4), but not significantly different from the MLE prediction (Z = -1.07, p = .285).” 

      - sensitivity in the behavioural and EEG sessions is said to be different, but no comparison is given. It is not even the same stimulus set across the two tasks...  

      This relationship was noted as a potential explanation for the higher sensitivities obtained in the EEG task, and was not intended to stand up to statistical scrutiny. We agree it makes little sense to compare statistically between the EEG and behavioural results as they were obtained from different tasks. We would like to clarify, however, that the stimuli used in the two tasks were the same, with the exception that in the EEG task the stimuli were presented from 5 locations versus 8 in the behavioural task. To avoid potential confusion, we have removed the offending sentence from the manuscript:

      Reviewer 2:

      Their measure of neural responses is derived from the decoder responses, and this takes account of the reliability of the sensory representations - the d' statistics - which is an excellent thing. It also means if I understand their analysis correctly (it could bear clarifying - see below), that they can generate from it a prediction of the performance expected if an optimal decision is made combining the neural signals from the individual modalities. I believe this is the familiar root sum of squares d' calculation (or very similar). Their decoding of the audiovisual responses comfortably exceeds this prediction and forms part of the evidence for their claims. 

      Yet, superadditivity - including that in evidence in the principle of inverse effectiveness - more typically quantifies the excess over the sum of proportions correct in each modality. Their MLE d' statistic can already predict this form of superadditivity. Therefore, the superadditivity they report here is not the same form of superadditivity that is usually referred to in behavioural studies. It is in fact a stiffer definition. What their analysis tests is that decoding performance exceeds what would be expected from an optimally weighted linear integration of the unisensory information. As this is not the common definition it is difficult to relate to behavioral superadditivity reported in much literature (of percentage correct). This distinction is not at all clear from the manuscript.

      But the real puzzle is here: The behavioural data for this task do not exceed the optimal statistical decision predicted by signal detection theory (the MLE d'). Yet, the EEG data would suggest that the neural processing is exceeding it. So why, if the neural processing is there to yield better performance, is it not reflected in the behaviour? I cannot explain this, but it strikes me that the behaviour and neural signals are for some reason not reflecting the same processing.

      Be explicit and discuss this mismatch they observe between behaviour and neural responses. 

      Thank you, we agree that it is worth expanding on the observed disconnect between MSI in behaviour and neural signals. We have included an additional paragraph in the Discussion of the revised manuscript. Despite the mismatch, we believe the behavioural and neural responses still reflect the same underlying processing, but at different levels of sensitivity. The behavioural result likely reflects a coarse down-sampling of the precision in location representation, and is thus less likely to reflect subtle MSI enhancements.

      “An interesting aspect of our results is the apparent mismatch between the behavioural and neural responses. While the behavioural results meet the optimal statistical threshold predicted by MLE, the decoding analyses suggest that the neural response exceeds it. Though non-linear neural responses and statistically optimal behavioural responses are reliable phenomena in multisensory integration (Alais & Burr, 2004; Ernst & Banks, 2002; Stanford & Stein, 2007), the question remains – if neural super-additivity exists to improve behavioural performance, why is it not reflected in behavioural responses? A possible explanation for this neurobehavioural discrepancy is the large difference in timing between sensory processing and behavioural responses. A motor response would typically occur some time after the neural response to a sensory stimulus (e.g., 70-200 ms), with subsequent neural processes between perception and action that introduce noise (Heekeren et al., 2008) and may obscure super-additive perceptual sensitivity. In the current experiment, participants reported either the distribution of 20 serially presented stimuli (EEG session) or compared the positions of two stimuli (behavioural session), whereas the decoder attempts to recover the location of every presented stimulus. While stimulus location could be represented with higher fidelity in multisensory relative to unisensory conditions, this would not necessarily result in better performance on a binary behavioural task in which multiple temporally separated stimuli are compared. One must also consider the inherent differences in how super-additivity is measured at the neural and behavioural levels. Neural super-additivity should manifest in responses to each individual stimulus. In contrast, behavioural super-additivity is often reported as proportion correct, which can only emerge between conditions after being averaged across multiple trials. 
      The former is a biological phenomenon, while the latter is an analytical construct. In our experiment, we recorded neural responses for every presentation of a stimulus, but behavioural responses were only obtained after multiple stimulus presentations. Thus, the failure to find super-additivity in behavioural responses might be due to their operationalisation, with between-condition comparisons lacking sufficient sensitivity to detect super-additive sensory improvements. Future work should focus on experimental designs that can reveal super-additive responses in behaviour.”

      Re-work the introduction to explain more clearly the relationship between the behavioural superadditivities they review, the MLE model, and the superadditivity it actually tests. 

      We agree it is worth discussing how super-additivity is operationalised across neural and behavioural measures. However, we do not believe the behavioural studies we reviewed claimed super-additive behavioural enhancements. While MLE is often used as a behavioural marker of successful integration, it is not necessarily used as evidence for super-additivity within the behavioural response, as it relies on linear operations. 

      “It is important to consider the differences in how super-additivity is classified between neural and behavioural measures. At the level of single neurons, superadditivity is defined as a non-linear response enhancement, with the multisensory response exceeding the sum of the unisensory responses. In behaviour, meanwhile, it has been observed that the performance improvement from combining two senses is close to what is expected from optimal integration of information across the senses (Alais & Burr, 2004; Stanford & Stein, 2007). Critically, behavioural enhancement of this kind does not require non-linearity in the neural response, but can arise from a reliability-weighted average of sensory information. In short, behavioural performance that conforms to MLE is not necessarily indicative of neural super-additivity, and the MLE model can be considered a linear baseline for multisensory integration.”
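
To make the linear baseline concrete, the following is a minimal sketch of a reliability-weighted average in the standard maximum-likelihood formulation, assuming independent Gaussian noise on each cue. The function name `mle_combine` and its arguments are illustrative assumptions and are not taken from the manuscript.

```python
def mle_combine(mu_a, sigma_a, mu_v, sigma_v):
    """Combine an auditory and a visual estimate by inverse-variance weighting.

    Returns the combined estimate and its standard deviation. Assumes the two
    cues carry independent Gaussian noise (a modelling assumption).
    """
    rel_a = 1.0 / sigma_a ** 2  # reliability = inverse variance
    rel_v = 1.0 / sigma_v ** 2
    w_a = rel_a / (rel_a + rel_v)
    w_v = 1.0 - w_a
    mu_av = w_a * mu_a + w_v * mu_v  # linear, weighted sum of the cues
    sigma_av = (1.0 / (rel_a + rel_v)) ** 0.5
    return mu_av, sigma_av
```

The combined estimate is a weighted sum of the unisensory estimates, so any gain in precision it predicts is linear. Only performance beyond this prediction would count as super-additive under the definition used in the response.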

      Regarding the auditory stimulus, this reviewer notes that interaural time differences are unlikely to survive free field presentation.

      Despite the free field presentation, in both the pilot test and the study proper participants were able to localize auditory stimuli significantly above chance. 

      "However, other studies have found super-additive enhancements to the amplitude of sensory event-related potentials (ERPs) for audiovisual stimuli (Molholm et al., 2002; Talsma et al., 2007), especially when considering the influence of stimulus intensity (Senkowski et al., 2011)." - this makes it obvious that there are some studies which show superadditivity. It would have been good to provide a little more depth here - as to what distinguished those studies that reported positive effects from those that did not.

      We have provided further detail on how super-additivity appears to manifest in neural measures.

      “In EEG, meanwhile, the evoked response to an audiovisual stimulus typically conforms to a sub-additive principle (Cappe et al., 2010; Fort et al., 2002; Giard & Peronnet, 1999; Murray et al., 2016; Puce et al., 2007; Stekelenburg & Vroomen, 2007; Teder-Sälejärvi et al., 2002; Vroomen & Stekelenburg, 2010). However, when the principle of inverse effectiveness is considered and relatively weak stimuli are presented together, there has been some evidence for super-additive responses (Senkowski et al., 2011).”

      “While behavioural outcomes for multisensory stimuli can be predicted by MLE, and single neuron responses follow the principles of inverse effectiveness and super-additivity, among others (Rideaux et al., 2021), how audiovisual super-additivity manifests within populations of neurons is comparatively unclear given the mixed findings from relevant fMRI and EEG studies. This uncertainty may be due to biophysical limitations of human neuroimaging techniques, but it may also be related to the analytic approaches used to study these recordings. For instance, superadditive responses to audiovisual stimuli in EEG studies are often reported from very small electrode clusters (Molholm et al., 2002; Senkowski et al., 2011; Talsma et al., 2007), suggesting that neural super-additivity in humans may be highly specific. However, information encoded by the brain can be represented as increased activity in some areas, accompanied by decreased activity in others, so simplifying complex neural responses to the average rise and fall of activity in specific sensors may obscure relevant multivariate patterns of activity evoked by a stimulus.”

      P9. "(25-75 W, 6 Ω)." This is not important, but it is a strange way to cite the power handling of a loudspeaker. 

      “The loudspeakers had a power handling capacity of 25-75 W and a nominal impedance of 6 Ω.” 

      I am struggling to understand the auditory stimulus: 

      "Auditory stimuli were 100 ms clicks". Is this a 100-ms long train of clicks? A single pulse which is 100ms long would not sound like a click, but two clicks once filtered by the loudspeaker. Perhaps they mean 100us. 

      "..with a flat 850 Hz tone embedded within a decay envelope". Does this mean the tone is gated - i.e. turns on and off slowly? Or is it constant?

      We thank the reviewer for catching this. ‘Click’ may not be the most apt way of defining the auditory stimulus. It was a 100 ms square wave tone with decay, i.e., with an onset at maximal volume before fading gradually. Given that the length of the stimulus was 100 ms, the decay occurs quickly and provides a more ‘click-like’ percept than a pure tone. We have provided a representation of the sound below for further clarification. This represents the amplitude from the L and R speakers for maximally-left and maximally-right stimuli. We have added this clarification in the revised manuscript. 

      Author response image 1.

      “Auditory stimuli were 100 ms, 850 Hz tones with a decay function (sample rate = 44,100 Hz; volume = 60 dBA SPL, as measured at the ears).”
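
A stimulus of this kind can be sketched as follows, assuming an exponential decay envelope. The duration, frequency, square-wave carrier and sample rate come from the response; the envelope shape, the time constant `decay_tau_s` and the function name `make_tone` are hypothetical, since the decay function itself is not specified.

```python
import math

SAMPLE_RATE_HZ = 44100  # reported sample rate


def make_tone(duration_s=0.1, freq_hz=850.0, decay_tau_s=0.02):
    """Return a decaying 850 Hz square-wave tone as a list of floats in [-1, 1].

    The exponential envelope and decay_tau_s are assumptions for illustration.
    """
    n_samples = round(duration_s * SAMPLE_RATE_HZ)
    samples = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE_HZ
        # Square-wave carrier: sign of a sine at the tone frequency.
        carrier = 1.0 if math.sin(2.0 * math.pi * freq_hz * t) >= 0.0 else -1.0
        # Onset at maximal amplitude, fading over the course of the stimulus.
        samples.append(carrier * math.exp(-t / decay_tau_s))
    return samples
```

With a short time constant, most of the energy falls in the first few tens of milliseconds, which is consistent with the 'click-like' percept described in the response.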

      P10. "Stimulus modality was either auditory, visual, or audiovisual. Trials were blocked with short (~2 min) breaks between conditions".

      Presumably the blocks were randomised across participants.

      Condition order was not randomised across participants, but counterbalanced. This has been clarified in the manuscript.

      “Stimulus modality was auditory, visual or audiovisual, presented in separate blocks with short breaks (~2 min) between conditions (see Figure 6A for an example trial). The order of conditions was counterbalanced across participants.” 

      P15. Feels like there is a step not described here: "The d' of the auditory and visual conditions can be used to estimate the predicted 'optimal' sensitivity of audiovisual signals as calculated through MLE." Do they mean sqrt[ (d'A)^2 + (d'V)^2] ? If it is so simple then it may as well be made explicit here. A quick calculation from eyeballing Figures 2B and 2C suggests this is the case.

      We thank the reviewer for raising this point of clarification. Yes, the ‘optimal’ audiovisual sensitivity was calculated as the hypotenuse of the auditory and visual sensitivities. This calculation has been made explicit in the revised manuscript.

      “The d’ from the auditory and visual conditions can be used to estimate the predicted ‘optimal’ sensitivity to audiovisual signals as calculated through the following formula:

      $$d'_{AV} = \sqrt{(d'_{A})^2 + (d'_{V})^2}$$”
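
The root-sum-of-squares prediction confirmed in the response above can be sketched as follows. The function names are hypothetical; only the hypotenuse rule itself comes from the exchange with the reviewer.

```python
import math


def predicted_av_dprime(d_auditory, d_visual):
    """MLE-predicted audiovisual d' as the hypotenuse of the unisensory d' values."""
    return math.hypot(d_auditory, d_visual)


def exceeds_mle_prediction(d_audiovisual, d_auditory, d_visual):
    """True when the observed audiovisual d' is above the linear (MLE) prediction."""
    return d_audiovisual > predicted_av_dprime(d_auditory, d_visual)
```

For example, unisensory sensitivities of 3 and 4 give a predicted audiovisual sensitivity of 5, so an observed value has to exceed 5 to be classed as super-additive under this definition. A full analysis would test that difference statistically rather than by direct comparison.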

      "The perceived source location of auditory stimuli was manipulated via changes to interaural intensity and timing (Whitworth & Jeffress, 1961; Wightman & Kistler, 1992)." The stimuli were delivered by a pair of loudspeakers, and the incident sound at each ear would be a product of both speakers. And - if there were a time delay between the two speakers, then both ears could potentially receive separate pulses one after the other at different delays. Did they record this audio stimulus with manikin? If not, it would be very difficult to know what it was at the ears. I don't doubt that if they altered the relative volume of the loudspeakers then some directionality would be perceived but I cannot see how the interaural level and timing differences could be matched - as if the sound were from a single source. I doubt that this invalidates their results, but to present this as if it provided matched spatial and timing cues is wrong, and I cannot work out how they can attribute an azimuthal location to the sound. For replication purposes, it would be useful to know how far apart the loudspeakers were and what the timing and level differences actually were.

      The behavioural tasks each had evenly distributed ‘source locations’ on the horizontal azimuth of the computer display (8 for the behavioural session, 5 for the EEG session). We manipulated the perceived location of auditory stimuli through interaural time delays and interaural level differences. By first measuring the forward (z) and horizontal (x) distance of each source location to each ear, the method worked by calculating what the time-course of a sound wave should be at the location of the ear given the sound wave at the source. Then, for each source location, we can calculate the time delay between speakers given the vectors of x and z, the speed of sound and the width of the head.  As the intensity of sound drops inversely with the square of the distance, we can divide the sound wave by the distance for each source location to provide the interaural level difference. Though we did not record the auditory stimulus with a manikin, our behavioural analyses show that participants were able to detect the directions of auditory stimuli from our manipulations, even to a degree that significantly exceeded the localisation accuracy for visual stimuli (for the behavioural session task). This information has been clarified in the manuscript.

      “Auditory stimuli were played through two loudspeakers placed either side of the display (80 cm apart for the behavioural session, 58 cm apart for the EEG session).” 

      “The perceived source location of auditory stimuli was manipulated via changes to interaural level and timing (Whitworth & Jeffress, 1961; Wightman & Kistler, 1992). The precise timing of when each speaker delivered an auditory stimulus was calculated from the following formula:

      where x and z are the horizontal and forward distances in metres between the ears and the source of the sound on the display, respectively, r is the head radius, and s is the speed of sound. We used a constant approximate head radius of 8 cm for all participants. r was added to x for the left speaker and subtracted for the right speaker to produce the interaural time difference. For ±15° source locations, interaural timing difference was 1.7 ms. To simulate the decrease in sound intensity as a function of distance, we calculated interaural level differences for the left and right speakers by dividing the sounds by the left and right distance vectors. Finally, we resampled the sound using linear interpolation based on the calculations of the interaural level and timing differences. This process was used to calculate the soundwaves played by the left and right speakers for each of the possible stimulus locations on the display. The maximum interaural level difference between speakers was 0.14 A for ±15° auditory locations, and 0.07 A for ±7.5°.”
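
The formula itself is not reproduced here, so the following is only a sketch of one reading of the verbal description: each channel's delay is the straight-line distance to the corresponding ear divided by the speed of sound, and each channel's gain is the reciprocal of that distance. The function name, the sign convention (positive x to the right), and the speed of sound of 343 m/s are assumptions. The sketch is not intended to reproduce the reported timing or level values.

```python
import math

HEAD_RADIUS_M = 0.08          # constant head radius stated in the response
SPEED_OF_SOUND_M_S = 343.0    # assumed value; not stated in the response


def speaker_delays_and_gains(x, z, r=HEAD_RADIUS_M, s=SPEED_OF_SOUND_M_S):
    """Per-channel delay (seconds) and gain for a source at horizontal offset x
    and forward distance z (both in metres, x positive to the right)."""
    dist_left = math.hypot(x + r, z)   # r added to x for the left channel
    dist_right = math.hypot(x - r, z)  # r subtracted from x for the right channel
    return {
        "delay_left": dist_left / s,
        "delay_right": dist_right / s,
        "gain_left": 1.0 / dist_left,    # sound divided by distance
        "gain_right": 1.0 / dist_right,
    }
```

Under these assumptions, a source on the midline yields identical delays and gains for both channels, while a source to the right yields a later and quieter left channel.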

      I am confused about this statement: "A quantification of the behavioural sensitivity (i.e., steepness of the curves) revealed significantly greater sensitivity for the audiovisual stimuli than for the auditory stimuli alone (Z = -3.09, p = .002)," It is not clear from the methods how they attributed sound source angle to the sounds. Conceivably they know the angle of the loudspeakers, and this would provide an outer bound on the perceived location of the sound for extreme interaural level differences (although free field interaural timing cues can create a wider sound field). 

      Our analysis of behavioural sensitivity was dependent on the set ‘source locations’ that were used to calculate the position of auditory and audiovisual stimuli. In the behavioural task, participants judged the position of the target stimulus relative to a central stimulus. Thus, for each source location, we recorded how often participants correctly discriminated between presentations. The quoted analysis acknowledges that participants were more sensitive to audiovisual stimuli than auditory stimuli in the context of this task. A full explanation of how source location was implemented for auditory stimuli has been added to the manuscript.

      It would be very nice to see some of the "channel" activity - to get a feel for the representation used by the decoder. 

      We have included responses for the five channels as a Supplemental Figure.

      Figure 6 appears to show that there is some agreement between behaviour and neural responses - for the audiovisual case alone. The positive correlation of behavioural and decoding sensitivity appears to be driven by one outlier - who could not perform the audiovisual task (and indeed presumably any of them). Furthermore, if we were to simply Bonferroni-correct for the three comparisons, this would become non-significant. It is also puzzling why the unisensory behaviour and EEG do not correlate - which seems to again suggest a poor correspondence between them. Opposite to the claim made.

      We understand the reviewer’s concern here. We would like to note, however, that each correlation used unique data sets – that is, the behavioural and neural data for each separate condition. In this case, we believe a Bonferroni correction for multiple comparisons is too conservative, as no data set was compared more than once. Neither the behavioural nor the neural data were normally distributed, and both contained outliers. Rather than reduce power through outlier rejection, we opted to test correlations using Spearman’s rho, which is resistant to outliers (1). It is also worth noting that, without outlier rejection, the audiovisual correlation (p = .003) would survive a Bonferroni correction for 3 comparisons. The nonsignificant correlation in the auditory and visual conditions might be due to the weaker responses elicited by unisensory stimuli, with the reduced signal-to-noise ratio obscuring potential correlations. Audiovisual stimuli elicited more precise responses both behaviourally and neurally, increasing the power to detect a correlation.

      (1) Wilcox, R.R. (2016), Comparing dependent robust correlations. British Journal of Mathematical & Statistical Psychology, 69(3), 215-224. https://doi.org/10.1111/bmsp.12069
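
The arithmetic behind the Bonferroni statement, and the rank-based nature of Spearman's rho, can be sketched as follows. The helper names and the default alpha of .05 are assumptions; only p = .003 and the three comparisons come from the response.

```python
def bonferroni_survives(p_value, n_comparisons, alpha=0.05):
    """True when p_value is below the Bonferroni-adjusted threshold."""
    return p_value < alpha / n_comparisons


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

With three comparisons the adjusted threshold is .05 / 3 ≈ .0167, so p = .003 remains significant. Because rho depends only on ranks, a single extreme score influences it no more than its rank position allows, which is the sense in which the measure is resistant to outliers.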

      “We also found a significant positive correlation between participants’ behavioural judgements in the EEG session and decoding sensitivity for audiovisual stimuli. This result suggests that participants who were better at identifying stimulus location also had more reliably distinct patterns of neural activity. The lack of neurobehavioural correlation in the unisensory conditions might suggest a poor correspondence between the different tasks, perhaps indicative of the differences between behavioural and neural measures explained previously. However, multisensory stimuli have consistently been found to elicit stronger neural responses than unisensory stimuli (Meredith & Stein, 1983; Puce et al., 2007; Senkowski et al., 2011; Vroomen & Stekelenburg, 2010), which has been associated with behavioural performance (Frens & Van Opstal, 1998; Wang et al., 2008). Thus, the weaker signal-to-noise ratio in unisensory conditions may prevent correlations from being detected.”

      Further changes:

      (1)   To improve clarity, we shifted the Methods section to after the Discussion. This change included updating the figure numbers to match the new order (Figure 1 becomes Figure 6, Figure 2 becomes Figure 1, and so on).

      (2)   We also resolved an error on Figure 2 (previously Figure 3). The final graph (Difference between AV and A + V) displayed incorrect values on the Y axis.

      This has now been remedied.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aim to elucidate the diversity and gene expression patterns of marine plankton using innovative collection and sequencing methodologies. Their work investigates the taxonomic and functional profiles of planktonic communities, providing insights into their ecological roles and responses to environmental changes.

      Strengths:

      The methodology utilized in this study, particularly the combination of single-cell sequencing and advanced bioinformatics techniques, represents a significant advancement in the field of plankton research. The application of the Smart-seq2 protocol for cDNA synthesis, followed by rigorous quality control measures, ensures high-quality data generation. This comprehensive approach not only enhances the resolution of the obtained genetic information but also allows for a more detailed exploration of the diversity and functional potential of the phytoplankton community.

      One of the major strengths of this study is the rigorous methodological approach, including precise sampling techniques and robust data analysis protocols, which enhance the reliability of the results. The use of advanced sequencing technologies allows for a comprehensive assessment of gene expression, significantly contributing to our understanding of plankton diversity and its implications for marine ecosystems.

      Weaknesses:

      While the evidence presented is solid, there are areas where the analysis could be expanded. The authors could further explore the ecological interactions within plankton communities, which would provide a more holistic view of their functional roles. Additionally, a broader discussion of the implications of their findings for marine conservation efforts could enhance the manuscript's impact.

      The choice of both the plankton net and filter pore size during the plankton collection process is critical, as these factors directly impact the types of phytoplankton collected. The use of a 25 μm filter paper, in particular, may result in the omission of many eukaryotic phytoplankton species. This limitation, combined with the characteristics of the plankton net, could affect the comprehensiveness and accuracy of the results, potentially influencing the study's conclusions regarding phytoplankton diversity.

      The timing of fixation is crucial, as it directly affects whether the measured transcriptome accurately represents the organisms' actual transcriptional state in their native water environment. If fixation occurred a significant time after sample collection, the transcriptomic data may not reflect their true in situ transcriptional activity, which greatly reduces the relevance of this method.

      Thank you for your time, effort, and expertise.

      We agree that additional analyses could improve our understanding of the plankton communities sampled. We have conducted an array of alternative analyses that were not included in the current manuscript and plan to perform new analyses over the next few months as part of a deeper revision of the manuscript. We are especially interested in “providing a more holistic view of the functions” of individual plankton within the community.

      As for the protocol details, the pore size of the filter paper was chosen to focus on ~100 micron-sized organisms as a starting point: they are likely to contain more RNA than smaller organisms, making them well suited for an initial proof of concept of the methodology. That choice, however, is not particularly tightly constrained, therefore smaller plankton could be captured. This is supported by the lack of correlation, in our data, between organismal size and number of detected sequencing reads.

      Timing to cell death/fixation is a common question we receive not just in this manuscript but any RNA-Seq from primary samples. In this case, plankton were seen swimming until picking, and after picking each organism was deposited within two seconds into a lysis buffer for fixation. Therefore, we do not have reason to believe that the transcriptional activity sampled in the sequencing reads differs in any major way from the one in living plankton. Nonetheless, a study specifically testing the effect of time between ocean sampling and reverse transcription would provide more quantitative information on this point.

      Reviewer #2 (Public review):

      Summary:

      The paper introduces Ukiyo-e-Seq, a novel method integrating microscopy with single-cell transcriptomics to study individual, uncultured eukaryotic plankton cells. By combining microscopic imaging with transcriptomic analysis, the approach links plankton morphology to gene expression, enabling taxonomic identification and functional protein exploration. Ukiyo-e-Seq was tested on 66 microbial eukaryotic cells, revealing taxonomic diversity across four superkingdoms and allowing analysis of protein complexes and developmental genes in individual species. According to the authors, this method has the potential to advance single-cell marine biodiversity studies by addressing limitations in traditional taxonomy and metatranscriptomics, especially for rare or uncultured organisms.

      However, the study's conclusions are often weakly supported by data, particularly given that this is not the first study to combine microscopy and single-cell transcriptomics of eukaryotic plankton using Smart-seq2.

      Strengths:

      A notable strength is the authors' generation of several single-cell transcriptomes for the diatom Chaetoceros, which could benefit from greater focus rather than broadly addressing eukaryotic single cells.

      Weaknesses:

      The study lacks comparison with other single-cell transcriptomics studies and it was presented as the first study that combines imaging and single-cell transcriptomics (smart-seq2) of eukaryotic plankton while in fact it is not. The sampling methodology is not replicable as the authors used a tea strainer instead of standard plankton collection equipment to filter larger cells. Terminology throughout the paper is unconventional, such as "public and private contigs," "single-organism genomics," "highly expressed contigs," and "optical methods." Additionally, the authors did not specify which database was used for taxonomic assignments. These issues may stem from the authors' limited background in microbial ecology. Overall, the study has many drawbacks and it could benefit from complete rewriting and focusing mainly on single-cell transcriptomics of diatoms.

      Thank you for your time, effort, and expertise.

      There might be a bit of confusion between single-cell and single-organism sequencing, likely due to lack of clarity in our initial submission. In particular, in this manuscript no effort was spent trying to dissociate oligocellular plankton into individual cells before sequencing. While probably feasible, we expect that to be technically much harder than single-organism sequencing as performed here. The reviewer does not reference a published paper where combined imaging and RNA-Seq of individual uncultured plankton has been achieved, and we were unable to find one in the scientific literature. As stated in the manuscript, others have already performed some work on cultured plankton and single-organism sequencing (without matching images) of uncultured environmental microorganisms.

      The suggestion to focus on a smaller biological niche such as diatoms and adopt language more familiar to that specific community is well received. Indeed, given that organisms as diverse as fish larvae and diatoms could be profiled with Ukiyo-e-Seq, future studies could use the same method to address specific questions with a deeper and more narrow scope. However, this manuscript is demonstrating the feasibility of Ukiyo-e-Seq and its ability to produce usable data for a broad spectrum of organisms: part of the scientific audience might not have a specific interest in diatoms.

      The tea strainer was used for coarse pre-filtering: the exact pore size, geometry and factory tolerance on those measurements are inconsequential because each organism is later chosen (or not) based on a high-resolution microscopy image (or multiple, if fluorescence is considered). This really is a strength of Ukiyo-e-Seq over FACS or droplet-based sorters, which can only collect coarse optical information from each organism for (typically) less than 1 millisecond. In Ukiyo-e-Seq, while the actual decision to pick an individual is currently manual (by the operator of the picker), it can be automated in principle. For instance, one could build a machine learning model of plankton taxonomy based on a large collection of labelled images and use predictions from such a model to automatically drive the picker (e.g. focussing on diatoms), increasing throughput. Even in that case, however, the initial filtering stages using tea strainers, plankton nets, filter paper etc. would not be critical for the final selection of individuals as long as they are not too restrictive.

      The database used for taxonomic assignment was the NCBI non-redundant nucleotide database, accessed through the reference library provided by Kraken2 (nt).

      Reviewer #3 (Public review):

      Gatt et al. present a novel take on single-cell RNA-sequencing from complex planktonic samples, introducing an approach they aptly named Ukiyo-e-Seq. This work combines environmental sampling with cell picking, microscopic imaging, and Smart-seq2 single-cell RNA sequencing to profile uncultured eukaryotic plankton. Developing single-cell approaches for such ecosystems is critical, given the poor representation of many planktonic species in cultures and reference databases. This work could help bridge existing technological gaps between morphological and molecular studies of aquatic microeukaryotes.

      The authors argue that microscopy does not provide information on the biochemistry of species under consideration. At best, it provides taxonomic labeling of species within a sample, yet imaging fails to assess their metabolic state or to disentangle cryptic species. In a standard metatranscriptomic setup, the sequence pool is described by aligning assembled contigs with reference databases to obtain functional and taxonomic information. This complex community-level data is impossible to parse at the single-organism level. Moreover, by relying on reference datasets, a lot of potential information can be missed. The aim of the approach is to combine the strengths of both methods, generating single-cell transcriptomic data linked to individual plankton images.

      Strengths:

      Ukiyo-e-Seq generated a valuable dataset by combining imaging and transcriptomics for individual planktonic organisms from environmental samples. This multimodal approach has the potential to improve taxonomic predictions and functional insights at the single-organism level. This manuscript demonstrates the technical feasibility of such an approach. Data of this type is rare and thus represents a valuable resource to further advance single-cell sequencing of planktonic species from environmental samples.

      Weaknesses:

      (1) The merge-split strategy, where single-cell reads are pooled prior to assembly, is counterintuitive. Pooling obscures the single-organism resolution that single-cell methods aim to achieve. The approach might be useful for assembling low-coverage contigs, but risks masking unique expression profiles for transcripts unique to a given well. As an alternative, the authors could assemble each well independently to obtain well-specific transcriptomic bins. Assemblies could then be clustered based on sequence similarity, thereby imposing strict clustering parameters to maintain resolution, to create a common reference for downstream analysis if needed. In my opinion, better results would be obtained by implementing a per-well assembly and read mapping.

      (2) The focus on the top five most expressed contigs throughout the manuscript's data analysis is a limiting choice, as it excludes most contigs. In the preprint, we are presented with a very narrow view of the data. Visualising the entire range of assembled contigs would provide a better picture of the transcriptomic composition and diversity per well. It would be interesting to assess if the full information could be used to preliminarily bin transcriptomic sequences from individual wells, for example, by gathering all 'private' contigs with high read coverage in a single well. Does such a set represent a single complete eukaryotic transcriptome?

      (3) I missed a verification with (broad-scale) taxonomic assessments based on the associated microscopic images. In their goals, the authors state that a joint approach has the potential to discover new taxonomic biodiversity. I agree, and to me, this is what is exciting about the preprint, yet I miss an example or the right bioinformatic implementation to drive home this claim. Are there organisms in wells where poor taxonomic annotations, based on alignment to a reference database or the LCA approach implemented in Kraken2, would usually result in ignoring the species in classic metatranscriptomics? Can you advance the taxonomic annotation by referring back to the organisms' picture? Can manual assessment of taxonomy advance the results from the LCA approach?

      (4) The current use of AlphaFold to predict protein structures does not convincingly add to the study's core objectives.

      Overall, Ukiyo-e-Seq presents a promising method for studying single-cell diversity in environmental samples, though the bioinformatic pipeline requires refinement to support some of the claims made by the authors. Additionally, the manuscript would benefit from clarity and additional details in its methods and a more consistent approach to presenting results and summary statistics across all assembled contigs and all sampled wells, rather than focusing on selected wells.

      Thank you for your time and effort, and for your expertise on the matter.

      The suggestions to conduct additional bioinformatic analyses to explore more fully the criticality and potential of various design choices (e.g. meta-assembly) are well received. We have tried some of those ideas already (e.g. assembling individual wells) and we have considered but not yet conducted or polished others (e.g. a more thorough taxonomic verification). We will endeavour to carry out as many of those analyses as possible during the deeper revision process in the coming months.

      We used AlphaFold 3 to demonstrate that protein-protein interactions can be investigated within individual species. When two peptide sequences are detected within the same well, they are more likely to be interacting partners than two peptides detected in a metatranscriptomic study, because the compartmentalisation of reads into tens or hundreds of wells greatly reduces the search space of potential interaction partners (an all-against-all search has a baseline runtime complexity of n squared, where n is the number of peptide sequences identified).
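
      The reduction in search space can be illustrated with a short calculation. The sketch below is not part of the manuscript's pipeline; the function names and the peptide counts are hypothetical, chosen only to show how splitting peptides across wells shrinks the number of candidate pairs.

```python
from math import comb


def candidate_pairs(n_peptides: int) -> int:
    """Number of unordered peptide pairs among n peptides: n * (n - 1) / 2."""
    return comb(n_peptides, 2)


def pairs_within_wells(peptides_per_well: list[int]) -> int:
    """Candidate pairs when only peptides from the same well are compared."""
    return sum(candidate_pairs(n) for n in peptides_per_well)


# Hypothetical numbers: 1,000 peptides pooled together versus the same
# 1,000 peptides spread evenly over 100 wells (10 peptides per well).
pooled = candidate_pairs(1000)
per_well = pairs_within_wells([10] * 100)
print(pooled, per_well)
```

      In this toy setting the per-well search considers roughly one hundredth of the pairs that the pooled search does, and every retained pair comes from the same organism or well.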

      ----------

    1. Author response:

      The following is the authors’ response to the original reviews.

      We performed multiple new experiments and analyses in response to the reviewers’ concerns, and incorporated the results of these analyses in the main text, and in multiple substantially revised or new figures. Before embarking on a point-by-point reply to the reviewers’ concerns, we here briefly summarize our most important revisions.

      First, we addressed a concern shared by Reviewers #1-3 about a lack of information about our DNA sequences. To this end, we redesigned multiple figures (Figures 3, 4, 5, S8, S9, S10, S11, and S12) to include the DNA sequences of each tested promoter, the specific mutations that occurred in it, the resulting changes in position-weight-matrix (PWM) scores, and the spacing between promoter motifs. Second, Reviewers #1 and #2 raised concerns about a lack of validation of our computational predictions and the resulting incompleteness of the manuscript. To address this issue, we engineered 27 reporter constructs harboring specific mutations, and experimentally validated our computational predictions with them. Third, we expanded our analysis to study how a more complete repertoire of other sigma 70 promoter motifs such as the UP-element and the extended -10 / TGn motif affects gene expression driven by the promoters we study. Fourth, we addressed concerns by Reviewer #3 about the role of the Histone-like nucleoid-structuring protein (H-NS) in promoter emergence and evolution. We did this by performing both experiments and computational analyses, which are now shown in the newly added Figure 5. Fifth, to satisfy Reviewer #3’s concerns about missing details in the Discussion, we have rewritten this section, adding additional details and references. 

      We next describe these and many other changes in a point-by-point reply to each reviewer’s comments. In addition, we append a detailed list of changes to each section and figure to the end of this document.

      Reviewer #1 (Public Review):

      Summary:

      This study by Fuqua et al. studies the emergence of sigma70 promoters in bacterial genomes. While there have been several studies to explore how mutations lead to promoter activity, this is the first to explore this phenomenon in a wide variety of backgrounds, which notably contain a diverse assortment of local sigma70 motifs in variable configurations. By exploring how mutations affect promoter activity in such diverse backgrounds, they are able to identify a variety of anecdotal examples of gain/loss of promoter activity and propose several mechanisms for how these mutations interact within the local motif landscape. Ultimately, they show how different sequences have different probabilities of gaining/losing promoter activity and may do so through a variety of mechanisms.

      We thank Reviewer #1 for taking the time to read and provide critical feedback on our manuscript. Their summary is fundamentally correct.

      Major strengths and weaknesses of the methods and results:

      This study uses Sort-Seq to characterize promoter activity, which has been adopted by multiple groups and shown to be robust. Furthermore, they use a slightly altered protocol that allows measurements of bi-directional promoter activity. This combined with their pooling strategy allows them to characterize expressions of many different backgrounds in both directions in extremely high throughput which is impressive! A second key approach this study relies on is the identification of promoter motifs using position weight matrices (PWMs). While these methods are prone to false positives, the authors implement a systematic approach which is standard in the field. However, drawing these types of binary definitions (is this a motif? yes/no) should always come with the caveat that gene expression is a quantitative trait that we oversimplify when drawing boundaries.

      The point is well-taken. To clarify this and other issues, we have added a section on the limitations of our work to the Discussion. Within this section we include the following sentences (lines 675-680):

      “Additionally, future studies will be necessary to address the limitations of our own work. First, we use binary thresholding to determine i) the presence or absence of a motif, ii) whether a sequence has promoter activity or not, and iii) whether a part of a sequence is a hotspot or not. While chosen systematically, the thresholds we use for these decisions may cause us to miss subtle but important aspects of promoter evolution and emergence.”

      Their approach to randomly mutagenizing promoters allowed them to find many anecdotal examples of different types of evolutions that may occur to increase or decrease promoter activity. However, the lack of validation of these phenomena in more controlled backgrounds may require us to further scrutinize their results. That is, their explanations for why certain mutations lead or obviate promoter activity may be due to interactions with other elements in the 'messy' backgrounds, rather than what is proposed.

      Thank you for raising this important point. To address it, we have conducted extensive new validation experiments for the newest version of this manuscript. For the “anecdotal” examples you described, we created 27 reporter constructs harboring the precise mutation that leads to the loss or gain of gene expression, and validated its ability to drive gene expression. The results from these experiments are in Figures 3, 4, 5, and Supplemental Figures S8-S11, and are labeled with a ′ (prime) symbol.

      These experiments not only confirm the increases and decreases in fluorescence that our analysis had predicted. They also demonstrate, with the exception of two (out of 27) false-positive discoveries, that background mutations do not confound our analysis. We mention these two exceptions (lines 364-367):

      “In two of these hotspots, our validation experiments revealed no substantial difference in gene expression as a result of the hotspot mutation (Fig S8F′ and Fig S8J′). In both of these false positives, new -10 boxes emerge in locations without an upstream -35 box.”

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The authors express a key finding that the specific landscape of promoter motifs in a sequence affects the likelihood that local mutations create or destroy regulatory elements. The authors have described many examples, including several that are non-obvious, and show convincingly that different sequence backgrounds have different probabilities for gaining or losing promoter activity. While this overarching conclusion is supported by the manuscript, the proposed mechanisms for explaining changes in promoter activity are not sufficiently validated to be taken for absolute truth. There is not sufficient description of the strength of emergent promoter motifs or their specific spacings from existing motifs within the sequence. Furthermore, they do not define a systematic process by which mutations are assigned to different categories (e.g. box shifting, tandem motifs, etc.) which may imply that the specific examples are assigned based on which is most convenient for the narrative.

      To summarize, Reviewer #1 criticizes the following three aspects of our work in this comment. 1) The mechanisms we proposed are not sufficiently validated. 2) The description of motifs, spacing, and PWM scores are not shown. 3) How mutations are classified into different categories (i.e. box-shifting, tandem motifs, etc.) is not systematically defined. 

      These are all valid criticisms. In response, we performed an extensive set of follow-up experiments and analyses, and redesigned the majority of the figures. Here is a more detailed response to each criticism:

      (1) Proposed mechanisms for explaining changes in promoter activity are not sufficiently validated. We engineered 27 reporter constructs harboring the specific mutations in the parents that we had predicted to change promoter activity. For each, we compared their fluorescence levels with their wild-type counterpart. The results from these experiments are in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12, and are labeled with a ′ (prime) symbol.

      (2) No sufficient description of the strength of emergent promoter motifs or their specific spacings. We redesigned the figures to include the DNA sequences of the parent sequences, as well as the degenerate consensus sequences for each mutation. We additionally now highlight the specific motif sequences, their respective PWM scores, and by how much the score changes upon mutation. Finally, we annotated the spacing of motifs. These changes are in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12.

      We note that in many cases, high-scoring PWM hits for the same motif can overlap (i.e. two -10 motifs or two -35 motifs overlap). Additionally, the proximity of a -35 and -10 box does not guarantee that the two boxes are interacting. Together, these two facts can result in an ambiguity of the spacer size between two boxes. To avoid any reporting bias, we thus often report spacer sizes as a range (see Figure panels 4F, S8D, S8F-L, S9A, S9H, S10A, and S10E). The smallest spacer we annotate is in Figure 4F with 10 bp, and the largest is in Figure S8D with 26 bp. More “extreme” distances are not annotated, and we leave it to the reader to decide whether an interaction is present.

      (3) No systematic process by which mutations are assigned to different categories such as box shifting, tandem motifs, etc. We opted to reformulate these categories completely, because the phenotypic effect of a previously mentioned “tandem motif” was actually a byproduct of H-NS repression (see the newly added Figure S12).

      We also agree that the categories were ambiguous. We now introduce two terms: homo-gain and hetero-gain of -10 and -35 boxes. The manuscript now clearly defines these terms, and the relevant passage now reads as follows (lines 430-435): 

      “We found that these mutations frequently create new boxes overlapping those we had identified as part of a promoter (Fig S9). This occurs when mutations create a -10 box overlapping a -10 box, a -35 box overlapping a -35 box, a -10 box overlapping a -35 box, or a -35 box overlapping a -10 box. We call the resulting event a “homo-gain” when the new box is of the same type as the one it overlaps, and otherwise a “hetero-gain”. In either case, the creation of the new box does not always destroy the original box.”
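
      The definition quoted above can be expressed as a small classification rule. The following is a minimal sketch, not the code used in the manuscript: it assumes boxes are 6 bp hexamers described by a type and a 0-based start coordinate, and the function names are hypothetical.

```python
BOX_LENGTH = 6  # assumption: -10 and -35 boxes are treated as hexamers


def boxes_overlap(start_a: int, start_b: int, length: int = BOX_LENGTH) -> bool:
    """Two boxes of equal length overlap if their starts differ by less than the length."""
    return abs(start_a - start_b) < length


def classify_gain(new_box: tuple[str, int], existing_boxes: list[tuple[str, int]]) -> list[str]:
    """Label a newly created box relative to the boxes it overlaps.

    Each box is (kind, start) with kind "-10" or "-35". Returns a sorted list
    containing "homo-gain" and/or "hetero-gain", or an empty list when the new
    box overlaps no existing box.
    """
    new_kind, new_start = new_box
    labels = set()
    for kind, start in existing_boxes:
        if boxes_overlap(new_start, start):
            labels.add("homo-gain" if kind == new_kind else "hetero-gain")
    return sorted(labels)


# Hypothetical coordinates: a new -10 box two bases downstream of an existing -10 box.
print(classify_gain(("-10", 52), [("-10", 50), ("-35", 20)]))
```

      Returning a list rather than a single label keeps the case where a new box overlaps both a -10 and a -35 box explicit instead of silently choosing one category.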

      Impact of the work on the field, and the utility of the methods and data to the community:

      From this study, we are more aware of different types of ways promoters can evolve and devolve, but do not have a better ability to predict when mutations will lead to these effects. Recent work in the field of bacterial gene regulation has raised interest in bidirectional promoter regions. While the authors do not discuss how mutations that raise expression in one direction may affect another, they have created an expansive dataset that may enable other groups to study this interesting phenomenon. Also, their variation of the Sort-Seq protocol will be a valuable example for other groups who may be interested in studying bidirectional expression. Lastly, this study may be of interest to groups studying eukaryotic regulation as it can inform how the evolution of transcription factor binding sites influences short-range interactions with local regulator elements.

      Any additional context to understand the significance of the work:

      The task of computationally predicting whether a sequence drives promoter activity is difficult. By learning what types of mutations create or destroy promoters from this study, we are better equipped for this task.

      We thank Reviewer #1 again for their time and their thoughtful comments.

      Reviewer #2 (Public Review):

      Summary:

      Fuqua et al investigated the relationship between prokaryotic box motifs and the activation of promoter activity using a mutagenesis sequencing approach. From generating thousands of mutant daughter sequences from both active and non-active promoter sequences they were able to produce a fantastic dataset to investigate potential mechanisms for promoter activation. From these large numbers of mutated sequences, they were able to generate mutual information with gene expression to identify key mutations relating to the activation of promoter island sequences.

      We thank Reviewer #2 for reading and providing a thorough review of our manuscript. 

      Strengths:

      The data generated from this paper is an important resource to address this question of promoter activation. Being able to link the activation of gene expression to mutational changes in previously nonactive promoter regions is exciting and allows the potential to investigate evolutionary processes relating to gene regulation in a statistically robust manner. Alongside this, the method of identifying key mutations using mutual information in this paper is well done and should be standard in future studies for identifying regions of interest.

      Thank you for your kind words.

      Weaknesses:

      While the generation of the data is superb, the focus only on these mutational hotspots removes a lot of the information available to the authors to generate robust conclusions. For instance:

      (1) The linear regression in S5 used to demonstrate that the number of mutational hotspots correlates with the likelihood of a mutation causing promoter activation is driven by three extreme points.

      A fair criticism. In response, we have chosen to remove the analysis of this trend from the manuscript entirely. (Additionally, Pnew and mutual information calculations both relied on the fluorescence scores of daughter sequences, so the finding was circular in its logic.)

      (2) Many of the arguments also rely on the number of mutational hotspots being located near box motifs. The context-dependent likelihood of this occurring is not taken into account given that these sequences are inherently box motif rich. So, something like an enrichment test to identify how likely these hot spots are to form in or next to motifs.

      Another good point. To address it, we carried out a computational analysis where we randomly scrambled the nucleotides of each parent sequence while maintaining the coordinates for each mutual information “hotspot.” This scrambling results in significantly less overlap with hotspots and boxes. This analysis is now depicted in Figure 2C and described in lines 272-296.
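
      The logic of this scrambling control can be sketched as a permutation test. The code below is an illustration under stated assumptions, not the analysis in the manuscript: motif hits are found by matching the textbook consensus hexamers with at most one mismatch (a stand-in for the PWM scan with a significance threshold), hotspots are single 0-based positions, and all names are hypothetical.

```python
import random

# Stand-in for PWM-based detection: textbook consensus hexamers.
CONSENSUS = {"-10": "TATAAT", "-35": "TTGACA"}


def box_positions(seq: str, max_mismatches: int = 1) -> set[int]:
    """Positions covered by any hexamer within max_mismatches of a consensus."""
    covered = set()
    for motif in CONSENSUS.values():
        k = len(motif)
        for i in range(len(seq) - k + 1):
            mismatches = sum(a != b for a, b in zip(seq[i:i + k], motif))
            if mismatches <= max_mismatches:
                covered.update(range(i, i + k))
    return covered


def hotspot_overlap(seq: str, hotspots: list[int]) -> int:
    """Number of hotspot positions that fall inside a detected box."""
    covered = box_positions(seq)
    return sum(1 for h in hotspots if h in covered)


def scramble_test(seq: str, hotspots: list[int], n_shuffles: int = 1000, seed: int = 0):
    """Shuffle the nucleotides while keeping hotspot coordinates fixed.

    Returns the observed overlap and an empirical p-value: the fraction of
    scrambled sequences whose overlap is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = hotspot_overlap(seq, hotspots)
    bases = list(seq)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        if hotspot_overlap("".join(bases), hotspots) >= observed:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (n_shuffles + 1)
    return observed, p_value


# Hypothetical parent: a GC-rich background with one -10 consensus at positions 20-25.
parent = "GC" * 10 + "TATAAT" + "GC" * 10
observed, p_value = scramble_test(parent, hotspots=[20, 21, 22, 23, 24, 25])
print(observed, p_value)
```

      Scrambling preserves the nucleotide composition of each parent, so the null distribution reflects how often hotspots would coincide with boxes in a sequence of the same base content but without the native motif arrangement.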

      (3) The link between changes in expression and mutations in surrounding motifs is assessed with two-sided Mann Whitney U tests. This method assumes that the sequence motifs are independent of one another, but the hotspots of interest occur either in 0, 3, 4, or 5s in sequences. There is therefore no sequence where these hotspots can be independent and the correlation causation argument for motif change on expression is weakened.

      This is a fair criticism and a limitation of the MWU test. To better support our reasoning, we engineered 27 reporter constructs harboring the specific mutations in the parents that we had predicted to change promoter activity. For each, we compared their fluorescence levels with their wild-type counterpart. The results from these experiments are in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12 and are labeled with a ′ (prime) symbol.

      These experiments not only confirm the increases and decreases in fluorescence that our analysis had predicted. They also demonstrate, with the exception of two (out of 27) false-positive discoveries, that background mutations do not confound our analysis. We mention these two exceptions (lines 364-367):

      “In two of these hotspots, our validation experiments revealed no substantial difference in gene expression as a result of the hotspot mutation (Fig S8F′ and Fig S8J′). In both of these false positives, new -10 boxes emerge in locations without an upstream -35 box.”

      (4) The distance between -10 and -35 was mentioned briefly but not taken into account in the analysis.

      We have now included these spacer distances where appropriate. These changes are in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12.

      We note that in many cases, high-scoring PWM hits for the same motif can overlap (i.e. two -10 motifs or two -35 motifs overlap). Additionally, the proximity of a -35 and -10 box does not guarantee that the two boxes are interacting. Together, these two facts can result in an ambiguity of the spacer size between two boxes. To avoid any reporting bias, we thus often report spacer sizes as a range (see Figure panels 4F, S8D, S8F-L, S9A, S9H, S10A, and S10E). The smallest spacer we annotate is in Figure 4F with 10 bp, and the largest is in Figure S8D with 26 bp. More “extreme” distances are not annotated, and we leave it to the reader to decide whether an interaction is present.

      The authors propose mechanisms of promoter activation based on a few observations that are treated independently but occur concurrently. To address this using complementary approaches such as analysis focusing on identifying important motifs, using something like a glm lasso regression to identify significant motifs, and then combining with mutational hotspot information would be more robust.

      This is a great idea, and we pursued it as part of the revision. For each parent sequence, we mapped the locations of all -10 and -35 box motifs in the daughters, then reduced each sequence to a binary representation, either encoding or not encoding these motifs, also referred to as a “hot-encoded matrix.” We subsequently performed a Lasso regression between the hot-encoded matrices and the fluorescence scores of each daughter sequence. The regression then assigns a “weight” to each of the motifs in the daughters. The larger a motif’s weight is, the more the motif influences promoter activity. Author response image 1 describes our workflow.

      Author response image 1.

      We really wanted this analysis to work, but unfortunately, the computational model does not behave robustly, even when testing multiple values of the hyperparameter lambda (λ), which sets the strength of the regularization and thereby the trade-off between model bias and variance.
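
      For readers unfamiliar with the approach, the following is a generic sketch of a Lasso regression on a binary motif-presence matrix, written in plain Python with coordinate descent. It is not the code we ran: the objective shown, the function names, and the toy data are assumptions made for illustration only.

```python
def soft_threshold(value: float, lam: float) -> float:
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Minimise (1 / (2n)) * sum((y - b - X w)^2) + lam * sum(|w|) by coordinate descent.

    X is a list of rows of 0/1 motif indicators, y the fluorescence scores.
    Returns (weights, intercept); the intercept is not penalised.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = b + sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
        b = sum(y[i] - sum(X[i][k] * w[k] for k in range(p)) for i in range(n)) / n
    return w, b


# Toy data (hypothetical): motif 0 raises fluorescence by 2 a.u., motif 1 has no effect.
X = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 1], [1, 1], [0, 0]]
y = [3.0, 3.0, 1.0, 1.0, 3.0, 1.0, 3.0, 1.0]
weights, intercept = lasso_fit(X, y, lam=0.1)
print(weights, intercept)
```

      Larger values of λ push more weights to exactly zero, which is why the choice of λ matters so much for which motifs appear to contribute.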

      The regression assigns strong weights almost exclusively to -10 boxes, and assigns weak to even negative weights to -35 boxes. While initially exciting, these weights do not consistently align with the results from the 27 constructs with individual mutations that we tested experimentally. This ultimately suggests that the regression is overfitting the data.

      We do think a LASSO-regression approach can be applied to explore how individual motifs contribute to promoter activity. However, effectively implementing such a method would require a substantially more complex analysis. We respectfully believe that such an approach would distract from the current narrative, and would be more appropriate for a computational journal in a future study. 

      Because this analysis was inconclusive, we have not made it part of the revised manuscript. However, we hope that our 27 experimentally validated new constructs with individual mutations are sufficient to address the reviewer’s concerns regarding independent verification of our computational predictions.

      Other elements known to be involved in promoter activation including TGn or UP elements were not investigated or discussed.

      Thank you for highlighting this potentially important oversight. In response, we have performed two independent analyses to explore the role of TGn in promoter emergence in evolution. First, we computationally searched for -10 boxes with the bases TGn immediately upstream of them in the parent sequences, and found 18 of these “extended -10 boxes” in the parents (lines 143-145):

      “On average, each parent sequence contains ~5.32 -10 boxes and ~7.04 -35 boxes (Fig S1). 18 of these -10 boxes also include the TGn motif upstream of the hexamer.”

      However, only 20% of these boxes were found in parents with promoter activity (lines 182-185):

      “We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11-GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP).”

      Second, we computationally searched through all of the daughter sequences to identify new -10 boxes with TGn immediately upstream. We found 114 -10 boxes with the bases TGn upstream. However, only 5 new -10 boxes (2 with TGn) were associated with increasing fluorescence (lines 338-345):

      “On average, 39.5 and 39.4 new -10 and -35 boxes emerged at unique positions within the daughter sequences of each mutagenized parent (Fig 3A,B), with 1’562 and 1’576 new locations for -10 boxes and -35 boxes, respectively. ~22% (684/3’138) of these new boxes are spaced 15-20 bp away from their cognate box, and ~7.3% (114/1’562) of the new -10 boxes have the TGn motif upstream of them. However, only a mere five of the new -10 boxes and four of the new -35 boxes are significantly associated with increasing fluorescence by more than +0.5 a.u. (Fig 3C,D).”
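
      The search for extended -10 boxes amounts to checking the three bases immediately upstream of each -10 hexamer. The sketch below illustrates this check under stated assumptions: box start positions are 0-based and supplied by a separate motif scan, and the function names and example sequence are hypothetical rather than taken from our pipeline.

```python
def has_tgn_upstream(seq: str, box_start: int) -> bool:
    """True if the bases TG occupy the third and second positions upstream of the hexamer.

    With a 0-based hexamer start s, TGn spans seq[s-3:s]; the n at s-1 may be any base.
    """
    if box_start < 3:
        return False
    return seq[box_start - 3:box_start - 1].upper() == "TG"


def extended_minus10_boxes(seq: str, box_starts: list[int]) -> list[int]:
    """Start positions of the -10 boxes that carry an upstream TGn motif."""
    return [start for start in box_starts if has_tgn_upstream(seq, start)]


# Hypothetical sequence: TGc directly upstream of a TATAAT hexamer starting at index 6.
print(extended_minus10_boxes("AAATGCTATAATGG", [6]))
```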

      In addition, we now study the role of UP elements. This analysis showed that the UP element plays a negligible role in promoter emergence within our dataset. It is discussed in a new subsection of the results (lines 591-608).

      Collectively, these additional analyses suggest that the presence of TGn plus a -10 box is insufficient to create promoter activity, and that the UP element does not play a significant role in promoter emergence or evolution.

      Reviewer #3 (Public Review):

      Summary:

      Like many papers in the last 5-10 years, this work brings a computational approach to the study of promoters and transcription, but unfortunately disregards or misrepresents much of the existing literature and makes unwarranted claims of novelty. My main concerns with the current paper are outlined below although the problems are deeply embedded.

      We thank Reviewer #3 for taking the time to review this manuscript. We have made extensive changes to address their concerns about our work.

      Strengths:

      The data could be useful if interpreted properly, taking into account i) the role of translation ii) other promoter elements, and iii) the relevant literature.

      Weaknesses:

      (1) Incorrect assumptions and oversimplification of promoters.

      - There is a critical error on line 68 and Figure 1A. It is well established that the -35 element consensus is TTGACA but the authors state TTGAAA, which is also the sequence represented by the sequence logo shown and so presumably the PWM used. It is essential that the authors use the correct -35 motif/PWM/consensus. Likely, the authors have made this mistake because they have looked at DNA sequence logos generated from promoter alignments anchored by either the position of the -10 element or transcription start site (TSS), most likely the latter. The distance between the TSS and -10 varies. Fewer than half of E. coli promoters have the optimal 7 bp separation with distances of 8, 6, and 5 bp not being uncommon (PMID: 35241653). Furthermore, the distance between the -10 and -35 elements is also variable (16,17, and 18 bp spacings are all frequently found, PMID: 6310517). This means that alignments, used to generate sequence logos, have misaligned -35 hexamers. Consequently, the true consensus is not represented. If the alignment discrepancies are corrected, the true consensus emerges. This problem seems to permeate the whole study since this obviously incorrect consensus/motif has been used throughout to identify sequences that resemble -35 hexamers.

      We respectfully but strongly disagree that our analysis has misrepresented the true nature of -35 boxes. First, accounting for more A’s at position 5 in the PWM is not going to lead to a “critical error.” This is because positions 4-6 of the motif barely have any information content (bits) compared to positions 1-3 (see Fig 1A). This assertion is not just based on our own PWM, but based on ample precedent in the literature. In PMID 14529615, TTG is present in 38% of all -35 boxes, but ACA only in 8%. In PMID 29388765, with the -10 instance TATAAT, the -35 instance TTGCAA yields stronger promoters compared to the -35 instance TTGACA (See their Figure 3B).

      In PMID 29745856 (Figure 2), the most information content lies in positions 1-3, with the A and C at position 5 both nearly equally represented, as in our PWM. In PMID 33958766 (Figure 1) an experimentally-derived -35 box is even reduced to a “partial” -35 box which only includes positions 1 and 2, with consensus: TTnnnn.

      In addition, we did not derive the PWMs as the reviewer describes. The PWMs we use are based on computational predictions that are in excellent agreement with experimental results. Specifically, the PWMs we use are from PMID 29728462, which acquired 145 -10 and -35 box sequences from the top 3.3% of computationally predicted boxes from RegulonDB. See PMID 14529615 for the computational pipeline that was used to derive the PWMs, which independently aligns the -10 and -35 boxes to create the consensus sequences. The -35 PWM correlates significantly and strongly with an experimentally derived -35 box (see Figure S4 in the Supporting Information of Belliveau et al., PNAS 2017; Pearson correlation coefficient = 0.89). Within the 145 -35 boxes, the exact consensus sequence (TTGACA) that Reviewer #3 is concerned about is present 6 times in our matrix, and has a PWM score above the significance threshold. In other words, TTGACA is classified as a -35 box in our dataset.
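      The argument about information content per motif position can be illustrated with a minimal sketch. It assumes a uniform background and omits any small-sample correction; the counts below are hypothetical and are not the matrix from PMID 29728462.

```python
import math

def information_content(column_counts):
    """Information content (bits) of one motif position, assuming a uniform background."""
    total = sum(column_counts.values())
    entropy = 0.0
    for count in column_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    # 2 bits is the maximum for a four-letter alphabet
    return 2.0 - entropy

# Hypothetical base counts for three positions (illustration only)
toy_columns = {
    "fully conserved T": {"A": 0, "C": 0, "G": 0, "T": 100},
    "A and C equally likely": {"A": 50, "C": 50, "G": 0, "T": 0},
    "no preference": {"A": 25, "C": 25, "G": 25, "T": 25},
}

for label, counts in toy_columns.items():
    print(label, round(information_content(counts), 3))
```

      A position at which two bases are almost equally represented contributes far less information than a conserved position, so exchanging one of those bases for the other changes a PWM score only slightly.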

      We now provide DNA sequences for each of the figures to improve accessibility and reproducibility. A reader can now use any PWM or method they wish to interpret the data.

      - An uninformed person reading this paper would be led to believe that prokaryotic promoters have only two sequence elements: the -10 and -35 hexamers. This is because the authors completely ignore the role of the TG motif, UP element, and spacer region sequence. All of these can compensate for the lack of a strong -35 hexamer and it's known that appending such elements to a lone -10 sequence can create an active promoter (e.g. PMIDs 15118087, 21398630, 12907708, 16626282, 32297955). Very likely, some of the mutations, classified as not corresponding to a -10 or -35 element in Figure 2, target some of these other promoter motifs.

      Thank you for bringing this oversight to our attention. We have performed two independent analyses to explore the role of TGn in promoter emergence in evolution. First, we computationally searched for -10 boxes with the bases TGn immediately upstream of them in the parent sequences, and found 18 of these “extended -10 boxes” in the parents (lines 143-145):

      “On average, each parent sequence contains ~5.32 -10 boxes and ~7.04 -35 boxes (Fig S1). 18 of these -10 boxes also include the TGn motif upstream of the hexamer.”

      However, only 20% of these boxes were found in parents with promoter activity (lines 182-185):

      “We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP).”

      Second, we computationally searched through all of the daughter sequences to identify new -10 boxes with TGn immediately upstream. We found 114 -10 boxes with the bases TGn upstream. However, only 5 new -10 boxes (2 with TGn) were associated with increasing fluorescence (lines 338-345):

      “On average, 39.5 and 39.4 new -10 and -35 boxes emerged at unique positions within the daughter sequences of each mutagenized parent (Fig 3A,B), with 1’562 and 1’576 new locations for -10 boxes and -35 boxes, respectively. ~22% (684/3’138) of these new boxes are spaced 15-20 bp away from their cognate box, and ~7.3% (114/1’562) of the new -10 boxes have the TGn motif upstream of them. However, only a mere five of the new -10 boxes and four of the new -35 boxes are significantly associated with increasing fluorescence by more than +0.5 a.u. (Fig 3C,D).”

      In addition, we now study the role of UP elements. This analysis showed that the UP element plays a negligible role in promoter emergence within our dataset. It is discussed in a new subsection of the results (lines 591-608) and in the newly added Figure S13.

      Collectively, these additional analyses suggest that the presence of TGn plus a -10 box is insufficient to create promoter activity, and that the UP element does not play a significant role in promoter emergence or evolution.
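      The check for an extended -10 box described above can be sketched as follows. This is an illustration only: it assumes that the start positions of -10 boxes have already been predicted by some method, and the function name and example sequences are hypothetical.

```python
def has_tgn_upstream(sequence, box_start):
    """Return True if the three bases immediately upstream of a -10 box are T, G, and any base.

    box_start is the 0-based index of the first base of the -10 hexamer.
    """
    if box_start < 3:
        return False  # not enough upstream sequence to hold TGn
    upstream = sequence[box_start - 3:box_start].upper()
    return upstream.startswith("TG")

# Hypothetical sequences with a TATAAT hexamer starting at index 5
with_tgn = "CCTGCTATAATGG"
without_tgn = "CCAACTATAATGG"

print(has_tgn_upstream(with_tgn, 5))
print(has_tgn_upstream(without_tgn, 5))
```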

      - The model in Figure 4C is highly unlikely. There is no evidence in the literature that RNAP can hang on with one "arm" in this way. In particular, structural work has shown that sequence-specific interactions with the -10 element can only occur after the DNA has been unwound (PMID: 22136875). Further, -10 elements alone, even if a perfect match to the consensus, are non-functional for transcription. This is because RNAP needs to be directed to the -10 by other promoter elements, or transcription factors. Only once correctly positioned, can RNAP stabilise DNA opening and make sequence-specific contacts with the -10 hexamer. This makes the notion that RNAP may interact with the -10 alone, using only domain 2 of sigma, extremely unlikely.

      This is a valid criticism, and we thank the reviewer for catching this problem. In response, we have removed the model and pertinent figures throughout the entire manuscript.

      (2) Reinventing the language used to describe promoters and binding sites for regulators.

      - The authors needlessly complicate the narrative by using non-standard language. For example, on page 1 they define a motif as "a DNA sequence computationally predicted to be compatible with TF binding". They distinguish this from a binding site "because binding sites refer to a location where a TF binds the genome, rather than a DNA sequence". First, these definitions are needlessly complicated, why not just say "putative binding sites" and "known binding sites" respectively? Second, there is an obvious problem with the definitions; many "motifs" will also be "binding sites". In fact, by the time the authors state their definitions, they have already fallen foul of this conflation; in the prior paragraph they stated: "controlled by DNA sequences that encode motifs for TFs to bind". The same issue reappears throughout the paper.

      We agree that this was needlessly complicated. We now just refer to every sequence we study as a motif. A -10 box is a motif, a -35 box is a motif, a putative H-NS binding site is an H-NS motif, etc. The word “binding site” no longer occurs in the manuscript.

      - The authors also use the terms "regulatory" and "non-regulatory" DNA. These terms are not defined by the authors and make little sense. For instance, I assume the authors would describe promoter islands lacking transcriptional activity (itself an incorrect assumption, see below) as non-regulatory. However, as horizontally acquired sections of AT-rich DNA these will all be bound by H-NS and subject to gene silencing, both promoters for mRNA synthesis and spurious promoters inside genes that create untranslated RNAs. Hence, regulation is occurring.

      Another fair point. We have thus changed the terminology throughout to “promoter” and “non-promoter.”

      - Line 63: "In prokaryotes, the primary regulatory sequences are called promoters". Promoters are not generally considered regulatory. Rather, it is adjacent or overlapping sites for TFs that are regulatory. There is a good discussion of the topic here (PMID: 32665585). 

      We have rewritten this. The sentence now reads (lines 67-69):

      “A canonical prokaryotic promoter recruits the RNA polymerase subunit σ70 to transcribe downstream sequences (Burgess et al., 1969; Huerta and Collado-Vides, 2003; Paget and Helmann, 2003; van Hijum et al., 2009).”

      (3) The authors ignore the role of translation.

      - The authors' assay does not measure promoter activity alone, this can only be tested by measuring the amount of RNA produced. Rather, the assay used measures the combined outputs of transcription and translation. If the DNA fragments they have cloned contain promoters with no appropriately positioned Shine-Dalgarno sequence then the authors will not detect GFP or RFP production, even though the promoter could be making an RNA (likely to be prematurely terminated by Rho, due to a lack of translation). This is known for promoters in promoter islands (e.g. Figure 1 in PMID: 33958766).

      We agree that this is definitely a limitation of our study, which we had not discussed sufficiently. In response, we now discuss this limitation in a new section of the discussion (lines 680-686):

      “Second, we measure protein expression through fluorescence as a readout for promoter activity. This readout combines transcription and translation. This means that we cannot differentiate between transcriptional and post-transcriptional regulation, including phenomena such as premature RNA termination (Song et al., 2022; Uptain and Chamberlin, 1997), post-transcriptional modifications (Mohanty and Kushner, 2006), and RNA-folding from riboswitch-like sequences (Mandal and Breaker, 2004).”

      - In Figure S6 it appears that there is a strong bias for mutations resulting in RFP expression to be close to the 3' end of the fragment. Very likely, this occurs because this places the promoter closer to RFP and there are fewer opportunities for premature termination by Rho.

      The reviewer raises a very interesting possibility. To validate it, we have performed the following analysis. We took the RFP expression values from the 9’934 daughters with single mutations in all 25 parent sequences (P1-RFP, P2-RFP, … P25-RFP), and plotted the location of the single mutation (horizontal axis) against RFP expression (vertical axis) in Author response image 2. 

      Author response image 2.

      The distribution is uniform across the sequences, showing that distance from the RBS is not likely the reason for this observation. Since this analysis was uninformative with respect to distance from the RBS, we chose not to include it in the manuscript.

      (4) Ignoring or misrepresenting the literature.

      - As alluded to above, promoter islands are large sections of horizontally acquired, high AT-content, DNA. It is well known that such sequences are i) packed with promoters driving the expression of RNAs that aren't translated ii) silenced, albeit incompletely, by H-NS and iii) targeted by Rho which terminates untranslated RNA synthesis (PMIDs: 24449106, 28067866, 18487194). None of this is taken into account anywhere in the paper and it is highly likely that most, if not all, of the DNA sequences the authors have used contain promoters generating untranslated RNAs.

      Thank you for pointing out that our original submission was incomplete in this regard. We address these concerns by new analyses, including some new experiments. First, Rho-dependent termination is associated with the RUT motif, which is very rich in cytosines (PMID: 30845912). Given that our sequences have an AT-content of 65%-78%, canonical Rho-dependent termination is unlikely. Nevertheless, we computationally searched for Rho-dependent terminators using the available code from PMID: 30845912, but the algorithm did not identify any putative RUTs. Because this analysis was not informative, we did not include it in the paper.

      We analyzed the role of H-NS on promoter emergence and evolution within our dataset using both experimental and computational approaches. These additional analyses are now shown in the newly-added Figure 5 and the newly-added Figure S12. We found that H-NS represses P22-GFP and P12-RFP and affects the bidirectionality of P20. More specifically, to analyze the effects of H-NS, we first compared the fluorescence levels of parent sequences in a Δhns background vs the wild-type (DH5α) background in Figure 5A. We found 6 candidate H-NS targets, with P22-GFP and P12-RFP exhibiting the largest changes in fluorescence (lines 496-506):
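      For reference, the AT-content of a sequence is the fraction of its bases that are A or T. The sketch below is a minimal illustration with a hypothetical sequence; it is not taken from our analysis code.

```python
def at_content(sequence):
    """Fraction of bases in a DNA sequence that are A or T."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("A") + seq.count("T")) / len(seq)

# Hypothetical 10-bp sequence with 7 A/T bases
print(at_content("ATTAGCATTC"))
```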

      “We plot the fluorescence changes in Fig 5A as distributions for the 50 parents, where positive and negative values correspond to an increase or decrease in fluorescence in the Δhns background, respectively. Based on the null hypothesis that the parents are not regulated by H-NS, we classified outliers in these distributions (1.5 × the interquartile range) as H-NS-target candidates. We refer to these outliers as “candidates” because the fluorescence changes could also result from indirect trans-effects from the knockout (Mattioli et al., 2020; Metzger et al., 2016). This approach identified 6 candidates for H-NS targets (P2-GFP, P19-GFP, P20-GFP, P22-GFP, P12-RFP, and P20-RFP). For GFP, the largest change occurs in P22-GFP, increasing fluorescence ~1.6-fold in the mutant background (two-tailed t-test, p=1.16×10⁻⁸) (Fig 5B). For RFP, the largest change occurs in P12-RFP, increasing fluorescence ~0.5-fold in the mutant background (two-tailed t-test, p=4.33×10⁻¹⁰) (Fig 5B).”
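      An outlier classification of this kind can be sketched as follows. The sketch assumes the conventional Tukey rule (values more than 1.5 × IQR below the first quartile or above the third quartile) and the default quartile method of Python's statistics module; the parent names and fluorescence changes are hypothetical.

```python
import statistics

def iqr_outliers(changes, k=1.5):
    """Return the names whose value lies more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(list(changes.values()), n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [name for name, value in changes.items() if value < lower or value > upper]

# Hypothetical fluorescence changes (knockout minus wild type, a.u.)
toy_changes = {
    "parent_a": 0.0, "parent_b": 0.1, "parent_c": -0.1, "parent_d": 0.05,
    "parent_e": -0.05, "parent_f": 0.02, "parent_g": -0.02, "parent_h": 2.0,
}

print(iqr_outliers(toy_changes))
```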

      We also observed that the Δhns background affected the bidirectionality of P20 (lines 507-511):

      “We note that for template P20, which is a bidirectional promoter, GFP expression increases ~2.6-fold in the Δhns background (two-tailed t-test, p=1.59×10⁻⁶). Simultaneously, RFP expression decreases ~0.42-fold in the Δhns background (two-tailed t-test, p=4.77×10⁻⁴) (Fig S12A). These findings suggest that H-NS also modulates the directionality of P20’s bidirectional promoter through either cis- or trans-effects.”

      We then searched for regions where losing H-NS motifs in hotspots significantly changed fluorescence. We identified 3 motifs in P12-RFP and P22-GFP (lines 522-528):

      “For P22-GFP, a H-NS motif lies 77 bp upstream of the mapped promoter. Mutations which destroy this motif significantly increase fluorescence by +0.52 a.u. (two-tailed MWU test, q=1.07×10⁻³) (Fig 5E). For P12-RFP, one H-NS motif lies upstream of the mapped promoter’s -35 box, and the other upstream of the mapped promoter’s -10 box. Mutations that destroy these H-NS motifs significantly increase fluorescence by +0.53 and +0.51 a.u., respectively (two-tailed MWU test, q=3.28×10⁻⁴⁰ and q=4.42×10⁻⁵⁰) (Fig 5F,G). Based on these findings, we conclude that these motifs are bound by H-NS.”

      We are grateful for the suggestion to look at the role of H-NS in our dataset. Our analysis revealed a more plausible explanation for what we formerly referred to as a “Tandem Motif” in the original submission. Previously, we had shown that in P12-RFP, when a -35 box is created next to the promoter’s -35 box, or a -10 box next to the promoter’s -10 box, expression decreases. These new -10 and -35 boxes, however, also overlap with the two H-NS motifs in P12-RFP. We tested these exact point mutations in reporter plasmids and in the Δhns background, and found that the Δhns background rescues this loss in expression (see Figure S12). This analysis is in the newly added subsection: “The binding of H-NS changes when new -10 and -35 boxes are gained” and can be found at lines 529-563. We summarize the findings in a final paragraph of the section (lines 556-563):

      “To summarize, we present evidence that H-NS represses both P22-GFP and P12-RFP in cis. H-NS also modulates the bidirectionality of P20-GFP/RFP in cis or trans. In P22-GFP, the strongest H-NS motif lies upstream of the promoter. In P12-RFP, the strongest H-NS motifs lie upstream of the -10 and -35 boxes of the promoter. We note that there are 16 additional H-NS motifs surrounding the promoter in P12-RFP that may also regulate P12-RFP (Fig S12G). Mutations in two of these H-NS motifs can create additional -10 and -35 boxes that appear to lower expression. However, the effects of these mutations are insignificant in the absence of H-NS, suggesting that these mutations actually modulate H-NS binding.”

      We also agree that the majority of these sequences are likely driving the expression of many untranslated RNAs (see Purtov et al., 2014). We thus now define a promoter more carefully as follows (lines 113-119):

      “In this study, we define a promoter as a DNA sequence that drives the expression of a (fluorescent) protein whose expression level, measured by its fluorescence, is greater than a defined threshold. We use a threshold of 1.5 arbitrary units (a.u.) of fluorescence. This definition does not distinguish between transcription and translation. We chose it because protein expression is usually more important than RNA expression whenever natural selection acts on gene expression, because it is the primary phenotype visible to natural selection (Jiang et al., 2023).” 

      We also state this as a limitation of our study in the Discussion (lines 680-686):

      “Second, we measure protein expression through fluorescence as a readout for promoter activity. This readout combines transcription and translation. This means that we cannot differentiate between transcriptional and post-transcriptional regulation, including phenomena such as premature RNA termination (Song et al., 2022; Uptain and Chamberlin, 1997), post-transcriptional modifications (Mohanty and Kushner, 2006), and RNA-folding from riboswitch-like sequences (Mandal and Breaker, 2004).”

      - The authors state that GC content does not correlate with the emergence of new promoters. It is known that GC content does correlate to the emergence of new promoters because promoters are themselves AT-rich DNA sequences (e.g. see Figure 1 of PMID: 32297955). There are two reasons the authors see no correlation in this work. First, the DNA sequences they have used are already very AT-rich (between 65 % and 78 % AT-content). Second, they have only examined a small range of different AT-content DNA (i.e. between 65 % and 78 %). The effect of AT-content on promoter emergence is most clearly seen between AT-content of between around 40 % and 60 %. Above that level, the strong positive correlation plateaus.

      We respectfully disagree that this point is pertinent. The reviewer is referring to the likelihood that a sequence is a promoter, which indeed increases with AT-content, whereas we focus on the likelihood that a sequence becomes a promoter through DNA mutation. We note that if a DNA sequence is more AT-rich, then it is more likely to have -10 and -35 boxes, because their consensus sequences are also AT-rich. However, H-NS and other transcriptional repressors also bind to AT-rich sequences. This could also explain the saturation observed above 60% AT-content in PMID 32297955. Perhaps we can address this trend in future work.

      - Once these authors better include and connect their results to the previous literature, they can also add some discussion of how previous papers in recent years may have also missed some of this important context.

      We apologize for this oversight. We have rewritten the Discussion section to include the following points below. Many of the newly added references come from the group of David Grainger, who works on H-NS repression, bidirectional promoters, promoter emergence, promoter motifs, and spurious transcription in E. coli. More specifically:

      (1) The role of pervasive transcription and the likelihood of promoter emergence (lines 614-621):

      “Instead, we present evidence that promoter emergence is best predicted by the level of background transcription each non-promoter parent produces, a phenomenon also referred to as “pervasive transcription” (Kapranov et al., 2007).

      From an evolutionary perspective, this would suggest that sequences that produce such pervasive transcripts – including the promoter islands (Panyukov and Ozoline, 2013) and the antisense strand of existing promoters (Dornenburg et al., 2010; Warman et al., 2021), may have a proclivity for evolving de-novo promoters compared to other sequences (Kapranov et al., 2007; Wade and Grainger, 2014).”

      (2) How our results contradict the findings from Bykov et al., 2020 (lines 622-640):

      “A previous study randomly mutagenized the appY promoter island upstream of a GFP reporter, and isolated variants with increased and decreased GFP expression. The authors found that variants with higher GFP expression acquired mutations that 1) improve a -10 box to better match its consensus, and simultaneously 2) destroy other -10 and -35 boxes (Bykov et al., 2020). The authors concluded that additional -10 and -35 boxes repress expression driven by promoter islands. Our data challenge this conclusion in several ways. 

      First, we find that only ~13% of -10 and -35 boxes in promoter islands actually contribute to promoter activity. Extrapolating this percentage to the appY promoter island, ~87% (100% - 13%) of the motifs would not be contributing to its activity. Assuming the appY promoter island is not an outlier, this would insinuate that during random mutagenesis, these inert motifs might have accumulated mutations that do not change fluorescence. Indeed, Bykov et al. (Bykov et al., 2020) also found that a similar frequency of -10 and -35 boxes were destroyed in variants selected for lower GFP expression, which supports this argument. Second, we find no evidence that creating a -10 or -35 box lowers promoter activity in any of our 50 parent sequences. Third, we also find no evidence that destruction of a -10 or -35 box increases promoter activity without plausible alternative explanations, i.e. overlap of the destroyed box with a H-NS site, destruction of the promoter, or simultaneous creation of another motif as a result of the destruction. In sum, -10 and -35 boxes are not likely to repress promoter activity.”

      (3) How other sequence features besides the -10 and -35 boxes may influence promoter emergence and activity (lines 661-671):

      “These findings suggest that we are still underestimating the complexity of promoters. For instance, the -10 and -35 boxes, extended -10, and the UP-element may be one of many components underlying promoter architecture. Other components may include flanking sequences (Mitchell et al., 2003), which have been observed to play an important role in eukaryotic transcriptional regulation (Afek et al., 2014; Chiu et al., 2022; Farley et al., 2015; Gordân et al., 2013). Recent studies on E. coli promoters even characterize an AT-rich motif within the spacer sequence (Warman et al., 2020), and other studies use longer -10 and -35 box consensus sequences (Lagator et al., 2022). Another possibility is that there is much more transcriptional repression in the genome than anticipated (Singh et al., 2014). This would also coincide with the observed repression of H-NS in P22-GFP and P12-RFP, and accounts of H-NS repression in the full promoter island sequences (Purtov et al., 2014).”

      (4) The limits of our experimental methodology (lines 675-686):

      “Additionally, future studies will be necessary to address the limitations of our own work. First, we use binary thresholding to determine i) the presence or absence of a motif, ii) whether a sequence has promoter activity or not, and iii) whether a part of a sequence is a hotspot or not. While chosen systematically, the thresholds we use for these decisions may cause us to miss subtle but important aspects of promoter evolution and emergence. Second, we measure protein expression through fluorescence as a readout for promoter activity. This readout combines transcription and translation. This means that we cannot differentiate between transcriptional and post-transcriptional regulation, including phenomena such as premature RNA termination (Song et al., 2022; Uptain and Chamberlin, 1997), post-transcriptional modifications (Mohanty and Kushner, 2006), and RNA-folding from riboswitch-like sequences (Mandal and Breaker, 2004).”

      (5) An updated take-home message (lines 687-694):

      “Overall, our study demonstrates that -10 and -35 boxes neither prevent existing promoters from driving expression, nor do they prevent new promoters from emerging by mutation. It shows how mutations can create new -10 and -35 boxes near or on top of preexisting ones to modulate expression. However, randomly creating a new -10 or -35 box will rarely create a new promoter, even if the new box is appropriately spaced upstream or downstream of a cognate box. Ultimately our study demonstrates that promoter models need to be further scrutinized, and that using mutagenesis to create de-novo promoters can provide new insights into promoter regulatory logic.”

      (5) Lack of information about sequences used and mutations.

      - To properly assess the work any reader will need access to the sequences cloned at the start of the work, where known TSSs are within these sequences (ideally +/- H-NS, which will silence transcription in the chromosomal context but may not when the sequences are removed from their natural context and placed in a plasmid). Without this information, it is impossible to assess the validity of the authors' work.

      Thank you for raising this point. Please see Data S1 for the 25 template sequences (P1-P25) used in this study, and Data S2 for all of the daughter sequences.

      For brevity, we have addressed the reviewer’s request to look at the role of H-NS in their comment (4) “Ignoring or misrepresenting the literature.”

      We do not have information about the predicted transcription start sites (TSS) for the parent sequences because the program which identified them (Platprom) is no longer available. Regardless, having TSS coordinates would not validate or invalidate our findings, since we already know that the promoter islands produce short transcripts throughout their sequences, and we are primarily interested in promoters which can produce complete transcripts.

      - The authors do not account for the possibility that DNA sequences in the plasmid, on either side of the cloned DNA fragment, could resemble promoter elements. If this is the case, then mutations in the cloned DNA will create promoters by "pairing up" with the plasmid sequences. There is insufficient information about the DNA sequences cloned, the mutations identified, or the plasmid, to determine if this is the case. It is possible that this also accounts for mutational hotspots described in the paper.

      We agree that these are important points. To address the criticism that we provided insufficient information, we now redesigned all our figures to provide this information. Specifically, the figures now include the DNA sequences, their PWM predictions, and the exact mutations that lead to promoter activity. The figures with these changes are Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12. We now also provide more details about pMR1 in a new section of the methods (lines 740-748):

      “Plasmid MR1 (pMR1)

      The plasmid MR1 (pMR1) is a variant of the plasmid RV2 (pRV2) in which the kan resistance gene has been swapped with the cm resistance gene (Guazzaroni and Silva-Rocha, 2014). Plasmid pMR1 encodes the BBa_J34801 ribosomal binding site (RBS, AAAGAGGAGAAA) 6 bp upstream of the start codon for GFP(LVA). The plasmid also encodes a putative RBS (AAGGGAGG) (Cazemier et al., 1999) 5 bp upstream of the start codon for mCherry on the opposite strand.

      The plasmid additionally contains the low-to-medium copy number origin of replication p15A (Westmann et al., 2018).

      A map of the plasmid is available on the Github repository: https://github.com/tfuqua95/promoter_islands

      The reviewer also makes a valid point about promoter elements of the plasmid itself. We addressed it with the following new analyses. First we re-examined each of the examples where new -10 and -35 boxes are gained or lost, to see if any of these hotspots occur on the flanking ends of the parent sequences. We looked specifically at the ends because they could potentially interact with -10 and -35 box-like sequences on the plasmid to form a promoter. 

      Only one of these hotspots (out of 27) occurred at the end of the cloned sequences, and is thus a candidate for the phenomenon the reviewer hypothesized. This hotspot occurs in P9-GFP, where gaining a -10 box at the left flank increases expression (see Figure S8E-F’). There is indeed a -35 box 22-23 bp upstream of this -10 box on the plasmid, which could potentially affect promoter activity. 

      We tested the GFP expression of a construct harboring the point mutation which creates this -10 box on the left flank of P9-GFP. However, there was no significant difference in fluorescence between this construct and the wild-type P9-GFP (see Figure S8E-F’). Thus, this -35 box on pMR1 is not likely creating a new promoter.

      (6) Overselling the conclusions.

      Line 420: The paper claims to have generated important new insights into promoters. At the same time, the main conclusion is that "Our study demonstrates that mutations to -10 and -35 boxes motifs are the primary paths to create new promoters and to modulate the activity of existing promoters". This isn't new or unexpected. People have been doing experiments showing this for decades. Of course, mutations that make or destroy promoter elements create and destroy promoters. How could it be any other way?

      In hindsight, we agree that the original conclusion was not very novel. Our new conclusion is that -10 and -35 boxes do not repress transcription, and that our current promoter models, even with the additional motifs like the UP-element and the extended -10, are insufficient to understand promoters (lines 687-694):

      “Overall, our study demonstrates that -10 and -35 boxes neither prevent existing promoters from driving expression, nor do they prevent new promoters from emerging by mutation. It shows how mutations can create new -10 and -35 boxes near or on top of preexisting ones to modulate expression. However, randomly creating a new -10 or -35 box will rarely create a new promoter, even if the new box is appropriately spaced upstream or downstream of a cognate box. Ultimately our study demonstrates that promoter models need to be further scrutinized, and that using mutagenesis to create de-novo promoters can provide new insights into promoter regulatory logic.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I would like to start by thanking the authors for presenting an interesting and well-written article for review. This paper is a welcome addition to the field, addressing modern questions in the longstanding area of bacterial gene regulation. It is both enlightening and inspiring. While I do have suggestions, I hope these are not perceived as a lack of optimism for the work.

      Thank you for your kind words and suggestions, and for providing an astute and constructive review. We feel that the manuscript has greatly improved with your suggested changes.

      ABSTRACT:

      Line 11: The sentence, "It is possible that these motifs influence..." Could be rewritten to be clearer as it is the most important point of the manuscript. It is not obvious that you're talking about how the local landscape of motifs affects the probability of promoters evolving/devolving in this location.

      We have changed the sentence to read, “Here, we ask whether the presence of such motifs in different genetic sequences influences promoter evolution and emergence.”

      INTRODUCTION:

      Line 68: Is the -35 consensus motif not TTGACA? Here it is listed as TTGAAA.

      Corrected from TTGAAA to TTGACA.

      RESULTS:

      Line 92-94. In finding that the. The main takeaway from this work is that different sequences have different likelihoods of mutations creating promoters and so I believe this claim could be explored deeper with more quantitative information. Could the authors supplement this claim by including? Could you look at whether there is a correlation between the baseline expression of a parent sequence and Pnew? I expect even the inactive sequences to have some variability in measured expression.

      Thank you for this great idea. We followed up on it by plotting the baseline parent sequence fluorescence scores against Pnew. You are indeed correct: Pnew increases with baseline expression following a sigmoid function, and this is now shown in Figure 1E. To report our new observations, we have added the following section to the Results (lines 219-232):

      “Although mutating each of the 40 non-promoter parent sequences could create promoter activity, the likelihood Pnew that a mutant has promoter activity, varies dramatically among parents. For each non-promoter parent, Fig 1D shows the percentage of active daughter sequences. The median Pnew is 0.046 (std. ± 0.078), meaning that ~4.6% of all mutants have promoter activity. The lowest Pnew is 0.002 (P25-GFP) and the highest 0.41 (P8-RFP), a 205-fold difference.

      We hypothesized that these large differences in Pnew could be explained by minute differences in the fluorescence scores of each parent, particularly if its score was below 1.5 a.u. Plotting the fluorescence scores of each parent (N=50) and their respective Pnew values as a scatterplot (Fig 1E), we can fit these values to a sigmoid curve (see methods). This finding helps to explain why P8-RFP has a high Pnew (0.41) and P25-GFP a low Pnew (0.002), as their fluorescence scores are 1.380 and 1.009 a.u., respectively. The fact that the inflection point of the fitted curve is at 1.51 a.u. further justifies our use of 1.5 a.u. as a cutoff for promoter and non-promoter activity.”
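
      The relationship described above can be sketched as a logistic curve. This is a minimal illustration that assumes a standard three-parameter logistic form; the exact function used for the fit is given in the manuscript's methods and may differ. Only the inflection point (1.51 a.u.) and the two parent scores come from the text. The names `sigmoid`, `PLATEAU`, and `STEEPNESS` are hypothetical, and the plateau and steepness values are placeholders rather than fitted parameters.

```python
import math


def sigmoid(score: float, plateau: float, midpoint: float, steepness: float) -> float:
    """Logistic curve: predicted Pnew for a parent fluorescence score (a.u.)."""
    return plateau / (1.0 + math.exp(-steepness * (score - midpoint)))


# The midpoint (inflection point) of 1.51 a.u. is taken from the response text.
# PLATEAU and STEEPNESS are hypothetical placeholders, not fitted values.
MIDPOINT = 1.51
PLATEAU = 1.0
STEEPNESS = 10.0

# At the inflection point a logistic curve sits at exactly half of its plateau.
half_max = sigmoid(MIDPOINT, PLATEAU, MIDPOINT, STEEPNESS)

# The curve rises monotonically, so a higher baseline score predicts a higher Pnew.
low = sigmoid(1.009, PLATEAU, MIDPOINT, STEEPNESS)   # score reported for P25-GFP
high = sigmoid(1.380, PLATEAU, MIDPOINT, STEEPNESS)  # score reported for P8-RFP
print(half_max, low < high)
```

      Because the placeholder parameters are not fitted, the sketch reproduces only the qualitative ordering (P8-RFP above P25-GFP), not the reported Pnew values.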

      Another potentially interesting analysis would be to see if k-mer content is correlated with Pnew. That is, determine the abundance of all hexamers in the sequence and see if Pnew is correlated with the number of hexamers present that is one nucleotide distance away from the consensus motifs (such as TcGACA or TAcAAT).

      We performed the suggested analysis by searching for k-mers that correlate with Pnew and found that no k-mer significantly correlates with Pnew (lines 240-248):

      “We then asked whether any k-mers ranging from 1-6 bp correlated with the non-promoter Pnew values (5,460 possible k-mers). 718 of these 1-6 bp k-mers are present 3 or more times in at least one non-promoter parent. We calculated a linear regression between the frequency of these 718 k-mers and each Pnew value, and adjusted the p-values to respective q-values (Benjamini-Hochberg correction, FDR=0.05). This analysis revealed six k-mers: CTTC, GTTG, ACTTC, GTTGA, AACTTC, TAACTT which correlate with Pnew. However, these correlations are heavily influenced by an outlying Pnew value of 0.41 (P8-RFP) (Fig S5C-H), and upon removing P8-RFP from the analysis, no k-mer significantly correlates with Pnew (data not shown).”
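
      The counting and multiple-testing steps in this analysis can be sketched as follows. This is an illustrative outline, not the authors' pipeline: it confirms the figure of 5,460 possible k-mers, counts overlapping k-mer occurrences in a sequence (whether the original analysis counted overlapping occurrences is an assumption), and applies a plain Benjamini-Hochberg adjustment. All function names are hypothetical, and the p-values in the example are made up.

```python
def count_possible_kmers(max_k: int = 6) -> int:
    """Number of distinct DNA k-mers with 1 <= k <= max_k."""
    return sum(4 ** k for k in range(1, max_k + 1))


def kmer_counts(seq: str, max_k: int = 6) -> dict[str, int]:
    """Count every k-mer (overlapping occurrences included) of length 1..max_k."""
    counts: dict[str, int] = {}
    for k in range(1, max_k + 1):
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


print(count_possible_kmers())                          # 4 + 16 + 64 + 256 + 1024 + 4096
print(kmer_counts("CTTC", max_k=2))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))   # made-up p-values
```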

      Line 152-157: How did you define the thresholds for 'active' or 'inactive'? It is not clear in the methods how this distinction was made.

      We have more clearly defined these thresholds in the text. A sequence with promoter activity has a fluorescence score greater than or equal to 1.5 a.u. (lines 168-172):

      “We declared a daughter sequence to have promoter activity or to be a promoter if its score was greater than or equal to 1.5 a.u., as this score lies at the boundary between no fluorescence and weak fluorescence based on the sort-seq bins (methods). Otherwise, we refer to a daughter sequence as having no promoter activity or being a non-promoter.”
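
      The threshold rule and the resulting Pnew statistic can be written down compactly. The sketch below assumes that Pnew is simply the fraction of a parent's daughter sequences whose score meets the 1.5 a.u. cutoff; the helper names are hypothetical and the example scores are invented for illustration, not taken from Data S2.

```python
# Threshold taken from the response text: scores >= 1.5 a.u. count as promoter activity.
PROMOTER_THRESHOLD = 1.5


def has_promoter_activity(score: float, threshold: float = PROMOTER_THRESHOLD) -> bool:
    """Return True if a fluorescence score (a.u.) meets the promoter threshold."""
    return score >= threshold


def p_new(daughter_scores: list[float], threshold: float = PROMOTER_THRESHOLD) -> float:
    """Fraction of daughter sequences with promoter activity (hypothetical helper)."""
    if not daughter_scores:
        raise ValueError("need at least one daughter score")
    active = sum(has_promoter_activity(s, threshold) for s in daughter_scores)
    return active / len(daughter_scores)


# Made-up scores for illustration only; these are not values from the dataset.
example_scores = [1.02, 1.38, 1.50, 2.10, 1.49]
print(p_new(example_scores))  # 2 of the 5 scores are >= 1.5
```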

      Lines 152-157: In trying to find the parent expression levels, no figure was available showing the distribution of parent expression levels. Furthermore, in looking at Data S2 and filtering for sequences with distance 0 from the parent, I found that the most active sequences did not match up with the sequences described as active in this section (e.g. P19 and P20 have a higher top-strand mean than P22, yet are not listed as active top-strand sequences).

      We really appreciate you taking the time to examine the supplemental data. We previously listed the parents that had only GFP activity but no RFP activity (P22), and only RFP activity but no GFP activity (P6, P12, P13, P18, P21). We then said that P19 and P20 were bidirectional promoters, because they showed both GFP and RFP activity. In hindsight, we realize that our wording was confusing. We thus rewrote the affected paragraph, such that the bidirectional promoters are now in both lists of GFP/RFP active parents. We also now make the distinction between “templates” which comprise our 25 promoter island fragments, and “parents”, where we treat both strands separately (50 parents total). The paragraph in question now reads (lines 173-187):

      “Because some sequences in our library are unmutated parent sequences, we determined that 10/50 of the parent sequences already encode promoter activity before mutagenesis. Specifically, three parents drove expression on the top strand (P19-GFP, P20-GFP, P22-GFP), and seven did on the bottom strand (P6-RFP, P12-RFP, P13-RFP, P18-RFP, P19-RFP, P20-RFP, P21-RFP). Two parents harbor bidirectional promoters (P19 and P20). The remaining 40 parent sequences are non-promoters, with an average fluorescence score of 1.39 a.u. We note that some of these parents have a fluorescence score higher than 1.39 a.u., but less than 1.50 a.u. such as P8-RFP (1.38 a.u.), P16-RFP (1.39 a.u.), P9-GFP (1.49 a.u.), and P1-GFP (1.47 a.u.). Whether these are truly “promoters” or not, is based solely on our threshold value of 1.5 a.u. We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11-GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP). See Fig S4 for fluorescence score distributions for each parent and its daughters, and Data S2 for all daughter sequence fluorescence scores.”

      Please include a supplementary figure showing the different parent expression levels (GFP mean +/- sd). Also, please explain the discrepancy in the 'active sequences' compared to Data S2 or correct my misunderstanding.

      We have added this plot to Figure S4B. The discrepancy arose because we listed the parents that had only GFP activity but no RFP activity (P22), and only RFP activity but no GFP activity (P6, P12, P13, P18, P21). We then said that P19 and P20 were bidirectional promoters, because they showed both GFP and RFP activity. Please see our previous response regarding the ambiguity.

      Line 182: I do not see 'Fuqua and Wagner 2023' in the references (though I am familiar with the preprint).

      We have added Fuqua and Wagner, bioRxiv 2023 to the references.

      Lines 197 - 200: The distribution of hotspot locations should be compared to the distribution of mutations in the library. e.g. It is not notable that 17% of mutations are in -10 motifs if 17% of all mutations are in -10 motifs.

      Thank you for raising this point. To address it, we carried out a computational analysis where we randomly scrambled the nucleotides of each parent sequence while maintaining the coordinates for each mutual information “hotspot.” This scrambling results in significantly less overlap with hotspots and boxes. This analysis is now depicted in Figure 2C and written in lines 272-296.
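
      The logic of this scrambling control can be illustrated with a small sketch. It is a simplified stand-in, not the authors' analysis: motif hits are found by exact string matching rather than by PWM scoring, hotspots are given as half-open `(start, end)` intervals, and all names and the example sequence are hypothetical.

```python
import random


def scramble(seq: str, rng: random.Random) -> str:
    """Shuffle the nucleotides of a sequence, preserving its base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def find_motif_positions(seq: str, motif: str) -> list[int]:
    """Start positions of exact motif matches (a stand-in for PWM hits)."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq[i:i + len(motif)] == motif]


def hotspot_overlap_fraction(seq: str, motif: str, hotspots: list[tuple[int, int]]) -> float:
    """Fraction of motif hits that overlap at least one hotspot interval."""
    starts = find_motif_positions(seq, motif)
    if not starts:
        return 0.0
    width = len(motif)
    overlapping = sum(
        any(start < h_end and h_start < start + width for h_start, h_end in hotspots)
        for start in starts
    )
    return overlapping / len(starts)


# Hypothetical example: hotspot coordinates stay fixed while the sequence is scrambled.
rng = random.Random(0)
parent = "TATAATGGTATAAT"
hotspots = [(4, 7)]
observed = hotspot_overlap_fraction(parent, "TATAAT", hotspots)
null = [hotspot_overlap_fraction(scramble(parent, rng), "TATAAT", hotspots) for _ in range(100)]
print(observed, sum(null) / len(null))
```

      Comparing the observed overlap with the distribution obtained from many scrambled sequences is what indicates whether the overlap exceeds what motif density alone would produce.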

      Lines 253-264: Examples 3B, 3D, and 3F should indicate the spacing between the new and existing motifs. Are these close to the 15-19 bp spacer lengths preferred by sigma70?

      Point well taken. We now annotate the spacing of motifs in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, and S11. We note that in many cases, high-scoring PWM hits for the same motif can overlap (i.e. two -10 motifs or two -35 motifs overlap). Additionally, the proximity of a -35 and a -10 box does not guarantee that the two boxes are interacting. Together, these two facts can result in an ambiguity of the spacer size between two boxes. To avoid any reporting bias, we thus often report spacer sizes as a range (see Figure panels 4F, S8D, S8F-L, S9A, S9H, S10A, and S10E). The smallest spacer we annotate is in Figure 4F with 10 bp, and the largest is in Figure S8D with 26 bp. More “extreme” distances are not annotated, and we leave it to the reader to decide whether an interaction is present.

      Line 255: While fun, I am concerned about the 'Shiko' analogy. My understanding is the prevailing theory is that -35 recognition occurs before -10 recognition (https://doi.org/10.1073/pnas.94.17.9022, 10.1101/sqb.1998.63.141). Given this, the 'Shiko -35' concept in 3H is a bit awkward as it suggests that sigma70 stops at -10 motifs before planting down on the -35. Considering the cited paper is still in the preprint stages (and did not observe these Shiko -35 emergences), I am concerned about how this particular example will be received by the community. Perhaps more care could be done to verify that this example is consistent with generally accepted mechanisms of promoter recognition or a short clarification could be added to clarify the extent of the analogy.

      Thank you for raising this point. We decided to remove the Shiko analogy, because several readers assumed that it relates to the physical binding of RNA polymerase, rather than being an evolutionary mechanism of mutations forming complementary motifs in a stepwise manner.

      Lines 323-326: It would be helpful to describe a more systematic approach to defining emergence events into different categories. A clear definition of each category in the methods or main text would help others consistently refer to these concepts in the future. This could be helped by showing the actual parent vs daughter sequences as a supplementary figure to figures 4B, 4D, & 4G.

      We agree this could have been more clearly communicated. We have addressed this by 1) simplifying the nomenclature of these categories, 2) clearly defining these categories, and 3) showing the actual parent vs daughter sequences in Figure 4, and Supplemental Figures S9, S10, S11, and S12. More specifically:

      (1) Simplifying the nomenclature. We highlight events where gaining new -10 and -35 boxes can modify the promoter activity of parent sequences with promoter activity. This occurs when a new -10 or -35 box appears that partially overlaps with the -10 or -35 box of the actual promoter. Thus, we rename two terms: hetero-gain and homo-gain, shown in Figure 4B:

      (2) We clearly define these categories (lines 430-435):

      “We found that these mutations frequently create new boxes overlapping those we had identified as part of a promoter (Fig S9). This occurs when mutations create a -10 box overlapping a -10 box, a -35 box overlapping a -35 box, a -10 box overlapping a -35 box, or a -35 box overlapping a -10 box. We call the resulting event a “homo-gain” when the new box is of the same type as the one it overlaps, and otherwise a “hetero-gain”. In either case, the creation of the new box does not always destroy the original box.”
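
      The two categories defined above reduce to a simple rule, sketched here for illustration. The function name and the string labels for box types are hypothetical; the rule itself follows the quoted definition.

```python
VALID_BOXES = {"-10", "-35"}


def classify_gain(new_box: str, overlapped_box: str) -> str:
    """Classify a newly created box by the type of promoter box it overlaps."""
    if new_box not in VALID_BOXES or overlapped_box not in VALID_BOXES:
        raise ValueError("box types must be '-10' or '-35'")
    # Same type as the overlapped box: homo-gain; different type: hetero-gain.
    return "homo-gain" if new_box == overlapped_box else "hetero-gain"


# The four overlap cases listed in the definition.
for new_box, old_box in [("-10", "-10"), ("-35", "-35"), ("-10", "-35"), ("-35", "-10")]:
    print(new_box, "over", old_box, "->", classify_gain(new_box, old_box))
```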

      In the original manuscript, there was an additional third category, where gaining a -35 box upstream of the promoter’s -35 box, and gaining a -10 box upstream of the promoter’s -10 box decreased expression. We referred to this as a “tandem motif” and it can be found in Figure S12C,D. However, in response to comment “(4) Ignoring or misrepresenting the literature” from Reviewer #3, we carried out an analysis of the binding of H-NS (see Figure 5 and Figure S12). This analysis revealed that this “tandem motif” phenomenon was actually the result of changing the affinity of H-NS to these regions. Thus, the “tandem motif” is probably spurious.

      DISCUSSION:

      Line 378-379: Since hotspots are essentially areas where promoters appear, wouldn't it be obvious that having more hotspots (i.e. areas where more promoters appear) would equate to a higher probability of new promoters? It would be helpful to clarify why this isn't obvious. This could be resolved by adding more complexity to the statement, such as showing that the level of mutual information found in a hotspot or across all hotspots in a sequence is correlated with Pnew.

      A fair criticism. In response, we have chosen to remove the analysis of this trend from the manuscript entirely. (Additionally, Pnew and mutual information calculations both relied on the fluorescence scores of daughter sequences, so the finding was circular in its logic.)

      Line 394-396: This comparison of findings to Bykov et al should include a bit more justification for the proposed mechanism and how it specifically was observed in this paper. What did they observe and how do these findings relate?

      We gladly followed this suggestion, and added the following two paragraphs to the discussion (lines 622-640).

      “A previous study randomly mutagenized the appY promoter island upstream of a GFP reporter, and isolated variants with increased and decreased GFP expression. The authors found that variants with higher GFP expression acquired mutations that 1) improve a -10 box to better match its consensus, and simultaneously 2) destroy other -10 and -35 boxes (Bykov et al., 2020). The authors concluded that additional -10 and -35 boxes repress expression driven by promoter islands. Our data challenge this conclusion in several ways. 

      First, we find that only ~13% of -10 and -35 boxes in promoter islands actually contribute to promoter activity. Extrapolating this percentage to the appY promoter island, ~87% (100% - 13%) of the motifs would not be contributing to its activity. Assuming the appY promoter island is not an outlier, this would insinuate that during random mutagenesis, these inert motifs might have accumulated mutations that do not change fluorescence. Indeed, Bykov et al. (Bykov et al., 2020) also found that a similar frequency of -10 and -35 boxes were destroyed in variants selected for lower GFP expression, which supports this argument. Second, we find no evidence that creating a -10 or -35 box lowers promoter activity in any of our 50 parent sequences. Third, we also find no evidence that destruction of a -10 or -35 box increases promoter activity without plausible alternative explanations, i.e. overlap of the destroyed box with a H-NS site, destruction of the promoter, or simultaneous creation of another motif as a result of the destruction. In sum, -10 and -35 boxes are not likely to repress promoter activity.”

      METHODS:

      Line 500: Could you provide more details on PMR1 (e.g. size, copy number, RBS strength) or a reference? I could not find this easily.

      Thank you for pointing out this oversight. In response, we have added the following subsection to the methods (lines 740-748):

      “Plasmid MR1 (pMR1)

      The plasmid MR1 (pMR1) is a variant of the plasmid RV2 (pRV2) in which the kan resistance gene has been swapped with the cm resistance gene (Guazzaroni and Silva-Rocha, 2014). Plasmid pMR1 encodes the BBa_J34801 ribosomal binding site (RBS, AAAGAGGAGAAA) 6 bp upstream of the start codon for GFP(LVA). The plasmid also encodes a putative RBS (AAGGGAGG) (Cazemier et al., 1999) 5 bp upstream of the start codon for mCherry on the opposite strand.

      The plasmid additionally contains the low-to-medium copy number origin of replication p15A (Westmann et al., 2018).

      A map of the plasmid is available on the Github repository: https://github.com/tfuqua95/promoter_islands.”

      Line 581: What was the sequencing instrument &/or depth?

      We now report this information as follows (Methods, lines 918-922):

      “Illumina sequencing

      The amplicon pool was sequenced by Eurofins Genomics (Eurofins GmbH, Germany) using a NovaSeq 6000 (Illumina, USA) sequencer, with an S4 flow cell, and a PE150 (Paired-end 150 bp) run. In total, 282’843’000 reads and 84’852’900’000 bases were sequenced. Raw sequencing reads can be found here: https://www.ncbi.nlm.nih.gov/bioproject/1071572.”
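
      As a quick consistency check on the reported sequencing depth, the two totals can be divided. The sketch below assumes that each reported "read" corresponds to one paired-end fragment, so that the 300 bases per read reflect 2 x 150 bp; that interpretation is an assumption and is not stated in the methods text.

```python
# Totals as reported in the methods text (apostrophes are thousands separators).
TOTAL_READS = 282_843_000
TOTAL_BASES = 84_852_900_000

# Integer division and remainder show the totals are an exact multiple of each other.
bases_per_read, remainder = divmod(TOTAL_BASES, TOTAL_READS)
print(bases_per_read, remainder)

# Under the stated assumption, a PE150 run yields two 150 bp mates per fragment.
PAIRED_END_READ_LENGTH = 150
print(bases_per_read == 2 * PAIRED_END_READ_LENGTH)
```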

      SUPPLEMENT:

      Supplementary Figure 2: Why does the GFP control produce a bimodal distribution?

      The GFP+ culture was inoculated directly from a glycerol stock. The bimodal distribution probably results from a subset of the bacteria having lost the GFP-coding insert, because the left-most peak coincides with the negative control.

      Reviewer #2 (Recommendations For The Authors):

      This paper would benefit from a clear definition of what constitutes an active promoter as this is only mentioned as justification for the use of arbitrary values for fluorescence.

      Good point. To clarify, we now include this new paragraph in the introduction (lines 112-119):

      “In this study, we define a promoter as a DNA sequence that drives the expression of a (fluorescent) protein whose expression level, measured by its fluorescence, is greater than a defined threshold. We use a threshold of 1.5 arbitrary units (a.u.) of fluorescence. This definition does not distinguish between transcription and translation. We chose it because protein expression is usually more important than RNA expression whenever natural selection acts on gene expression, because it is the primary phenotype visible to natural selection (Jiang et al., 2023).”

      There needs to be a clear distinction in the use of the word “sequences”, as the text often uses it interchangeably for the 25 parent sequences and for the 50 possible sequence directions in which a promoter can act. It is confusing going from one to the other.

      We agree that this distinction is important. To make it clearer, we now introduce an additional term (lines 119-130). Our experiments start from 25 promoter island fragments (P1-P25), which we now call template sequences. Each template sequence comprises both DNA strands. The parent sequences are the top and bottom strands of each template sequence. Therefore, there are now 50 parent sequences (P1-GFP, P1-RFP, P2-GFP…, P25-RFP). By treating each strand as its own sequence, we no longer have to refer to the strand, avoiding the earlier confusion.
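
      The naming scheme can be made concrete with a short sketch. It generates the 50 parent names from the 25 template names as described above; the `reverse_complement` helper illustrates how a bottom-strand sequence relates to the top strand when both are read 5' to 3'. The function and variable names, and the example sequence, are hypothetical.

```python
# 25 template sequences (double-stranded promoter island fragments).
templates = [f"P{i}" for i in range(1, 26)]

# Each template yields two parents: top strand (GFP reporter) and bottom strand (RFP reporter).
parents = [f"{template}-{reporter}" for template in templates for reporter in ("GFP", "RFP")]

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Bottom-strand sequence (5' to 3') for a given top-strand sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


print(len(parents), parents[:3], parents[-1])
print(reverse_complement("TTGACA"))  # illustrative hexamer only
```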

      The description of the hotspots is often unclear and trying to determine if 3 out of 9 hotspots come from one parent sequence or multiple is not possible. A table denoting this information would be most helpful.

      We agree, and now provide this information in Data S3.

      Finally, the description of the proposed mechanism of promoter activation via mutation of motifs should not be in the results but in the discussion, as it has insufficient evidence and would require further experimental validation.

      We remedied this problem by providing experimental validation of the proposed mechanisms. Specifically, we created the precise mutations that caused a loss or gain of a -10 or a -35 box, and measured the level of gene expression they drive with a plate reader. Because we chose to provide this experimental validation, we opted to leave the mechanisms of promoter activation in the results section.

      The (Fuqua and Wagner 2023) paper is not in the references.

      We have added Fuqua and Wagner, bioRxiv 2023 to the references.

      I enjoyed the paper and wish the authors the best for their future work.

      Thank you for taking the time to review our manuscript!

      Reviewer #3 (Recommendations For The Authors):

      The paper has major flaws. For example:

      The data need to be analysed with correct promoter sequence element sequences (TTGACA for the -35 element).

      The discrepancy lies in the frequency of A’s vs C’s at position #5 of the PWM. Our PWM was built with more A’s than C’s at this position, but also includes C’s in this position. However, we respectfully disagree that using a different -35 box PWM is going to change the outcomes of our study. First, positions 4-6 of the PWM barely have any information content (bits) compared to positions 1-3 (see Fig 1A). This assertion is not just based on our own PWM, but based on ample precedent in the literature. In PMID 14529615, TTG is present in 38% of all -35 boxes, but ACA only 8%. In PMID 29388765, with the -10 instance TATAAT, the -35 instance TTGCAA yields stronger promoters compared to the -35 instance TTGACA (See their Figure 3B). In PMID 29745856 (Figure 2), the most information content lies in positions 1-3, with the A and C at position 5 both nearly equally represented, as in our PWM. In PMID 33958766 (Figure 1) an experimentally-derived -35 box is even reduced to a “partial” -35 box which only includes positions 1 and 2, with consensus: TTnnnn. Additionally, the -35 box PWM that we used significantly and strongly correlates with an experimentally derived -35 box (see Supporting Information from Figure S4 of Belliveau et al., PNAS 2017. Pearson correlation coefficient = 0.89). We now provide DNA sequences for each of the figures to improve accessibility and reproducibility. A reader can now use any PWM or method they wish to interpret the data.

      The data need to be analysed taking into account the role of other promoter elements and sequences for translation.

      Point well taken. 

      Thank you for bringing this oversight to our attention. We have performed two independent analyses to explore the role of TGn in promoter emergence in evolution. First, we computationally searched for -10 boxes with the bases TGn immediately upstream of them in the parent sequences, and found 18 of these “extended -10 boxes” in the parents (lines 143-145):

      “On average, each parent sequence contains ~5.32 -10 boxes and ~7.04 -35 boxes (Fig S1). 18 of these -10 boxes also include the TGn motif upstream of the hexamer.”

      However, only 20% of these boxes were found in parents with promoter activity (lines 182-185):

      “We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11-GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP).”

      Second, we computationally searched through all of the daughter sequences to identify new -10 boxes with TGn immediately upstream. We found 114 -10 boxes with the bases TGn upstream. However, only 5 new -10 boxes (2 with TGn) were associated with increasing fluorescence (lines 338-345):

      “Mutations indeed created many new -10 and -35 boxes in our daughter sequences. On average, 39.5 and 39.4 new -10 and -35 boxes emerged at unique positions within the daughter sequences of each mutagenized parent (Fig 3A,B), with 1’562 and 1’576 new locations for -10 boxes and -35 boxes, respectively. ~22% (684/3’138) of these new boxes are spaced 15-20 bp away from their cognate box, and ~7.3% (114/1’562) of the new -10 boxes have the TGn motif upstream of them. However, only a mere five of the new -10 boxes and four of the new -35 boxes are significantly associated with increasing fluorescence by more than +0.5 a.u. (Fig 3C,D).”
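
      The percentages in this passage can be re-derived from the quoted counts. The sketch below only repeats that arithmetic. Treating the five -10 boxes plus four -35 boxes as the numerator behind the "~0.3%" figure mentioned for the abstract is an inference, not something the text states explicitly.

```python
# Counts quoted in the passage above.
NEW_MINUS10 = 1562
NEW_MINUS35 = 1576
WELL_SPACED = 684        # new boxes 15-20 bp from a cognate box
WITH_TGN = 114           # new -10 boxes with TGn upstream
FLUORESCENCE_GAIN = 5 + 4  # new -10 plus new -35 boxes raising fluorescence by > 0.5 a.u.

total_new = NEW_MINUS10 + NEW_MINUS35


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(total_new)
print(percent(WELL_SPACED, total_new))        # share of appropriately spaced new boxes
print(percent(WITH_TGN, NEW_MINUS10))         # share of new -10 boxes with TGn
print(percent(FLUORESCENCE_GAIN, total_new))  # share associated with a fluorescence gain
```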

      In addition, we now study the role of UP elements. This analysis showed that the UP element plays a negligible role in promoter emergence within our dataset. It is discussed in a new subsection of the results (lines 591-608):

      “The UP-element does not strongly influence promoter activity in our dataset.

      The UP element is an additional AT-rich promoter motif that can lie upstream of a -35 box in a promoter sequence (Estrem et al., 1998; Ross et al., 1993). We asked whether the creation of UP-elements also creates or modulates promoter activity in our dataset. To this end, we first identified a previously characterized position-weight matrix for the UP element (NNAAAWWTWTTTTNNWAAASYM, PWM threshold score = 19.2 bits) (Estrem et al., 1998) (Fig S13A). We then computationally searched for UP-element-specific hotspots within the parent sequences, i.e., locations in which mutations that gain or lose UP-elements lead to significant fluorescence increases (Mann-Whitney U-test, Fig S7 and methods. See Data S8 for the coordinates, fluorescence changes, and significance). The analysis did not identify any UP elements whose mutation significantly changes fluorescence.

      We then repeated the analysis with a less stringent PWM threshold of 4.8 bits (1/4th of the PWM threshold score). This time, we identified 74 “UP-like” elements that are created or destroyed at unique positions within the parents. 23 of these motifs significantly change fluorescence when created or destroyed. However, even with this liberal threshold, none of these UP-like elements increase fluorescence by more than 0.5 a.u. when gained, or decrease fluorescence by more than 0.5 a.u. when lost (Fig S13B). This finding ultimately suggests that the UP element plays a negligible role in promoter emergence within our dataset.”
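
      For readers who want to locate candidate UP elements in the provided sequences, the IUPAC consensus quoted above can be turned into a search pattern. Note that this is a swapped-in, simpler technique: it finds exact matches to the degenerate consensus, whereas the analysis described in the text scores sequences with a PWM and a bit threshold. The helper names and the example sequence are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes needed for the UP-element consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "W": "[AT]", "S": "[CG]", "Y": "[CT]", "M": "[AC]"}

UP_CONSENSUS = "NNAAAWWTWTTTTNNWAAASYM"  # consensus quoted in the text


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile an IUPAC consensus into a regex that also reports overlapping matches."""
    body = "".join(IUPAC[code] for code in consensus)
    return re.compile(f"(?=({body}))")


def find_consensus_matches(seq: str, consensus: str = UP_CONSENSUS) -> list[int]:
    """Start positions of exact matches to the degenerate consensus."""
    return [m.start() for m in consensus_to_regex(consensus).finditer(seq)]


# Constructed example that satisfies the consensus, flanked by arbitrary bases.
example = "CC" + "GGAAAATTATTTTGGAAAACCA" + "GG"
print(len(UP_CONSENSUS), find_consensus_matches(example))

# The relaxed threshold in the text is one quarter of the original PWM threshold.
print(19.2 / 4)
```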

      Collectively, these additional analyses suggest that the presence of TGn plus a -10 box is insufficient to create promoter activity, and that the UP element does not play a significant role in promoter emergence or evolution.

      The full sequences used need to be provided and mutations resulting in new promoters need to be shown.

      To Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12, we have added the sequences that created or destroyed the promoters, and their PWM scores.

      The paper needs to be rewritten to take into account the relevant literature on i) promoter islands (i.e. sections of horizontally acquired AT-rich DNA) ii) generation and loss of promoters by mutation.

      We have rewritten the introduction. The majority of these points are now addressed in the following two new paragraphs (lines 92-112):

      “Recent work shows that mutations can help new promoters to emerge from promoter motifs or from sequences adjacent to such motifs (Bykov et al., 2020; Fuqua and Wagner, 2023; Yona et al., 2018). However, encoding -10 and -35 boxes is insufficient to drive complete transcription of a gene coding sequence. For instance, the E. coli genome contains clusters of -10 and -35 boxes that are bound by RNA polymerase and produce short oligonucleotide fragments, but rarely create complete transcripts. Such clusters are called promoter islands, and are strongly associated with horizontally-transferred DNA (Bykov et al., 2020; Panyukov and Ozoline, 2013; Purtov et al., 2014; Shavkunov et al., 2009). 

      There are two proposed explanations for why promoter islands do not create full transcripts. First, the TF H-NS may repress promoter activity in promoter islands. This is because in a Δhns background, transcript levels from the promoter islands increase (Purtov et al., 2014). However, mutagenizing a specific promoter island (appY) until it transcribes a GFP reporter, reveals that in-vitro H-NS binding does not significantly change when GFP levels increase (Bykov et al., 2020). Thus, it is not clear whether H-NS actually represses the complete transcription of these sequences. The second proposed explanation is that excessive promoter motifs silence transcription. The aforementioned study found that promoter activity increases when mutations improve a -10 box to better match its consensus (TAAAAAT→TATACT), while simultaneously destroying surrounding -10 and -35 boxes (Bykov et al., 2020). However, we note that if these surrounding motifs never contributed to GFP fluorescence to begin with, then mutations could also simply have accumulated in them during random mutagenesis without affecting promoter activity.”

      In closing, we would like to thank all three reviewers again for taking the time to engage with this manuscript.

      Summary of specific changes that we have made to each section of the manuscript 

      • Abstract

      - We updated the abstract to include the finding that more than 1’500 new -10s and -35s are created in our dataset, but only ~0.3% of them actually create de-novo promoter activity.

      - We no longer highlight the conclusion that the majority of promoters emerge and evolve from -10 and -35 boxes.

      • Introduction

      - We have added more background information about the UP-element and the TGn motif.

      - We better describe the promoter islands and the results identified by Bykov et al., 2020.

      • Results: Promoter island sequences are enriched with motifs for -10 and -35 boxes.

      - We clarify how the -10 and -35 PWMs we use were derived.

      - We refer to the 25 promoter island fragments as “Template sequences” (P1-P25). The “parent sequences” now correspond to the top and bottom strands of each template (N=50, P1-GFP, P1-RFP, P2-GFP, …, P25-RFP).

      - We elaborate that ~7% of the -10 boxes in the template sequences have the TGn motif.

      - In the previous version of the manuscript, if there were overlapping -10 boxes or overlapping -35 boxes, we counted these to be a single -10 box or a single -35 box, respectively. In the new version of the manuscript, we now treat each motif as an independent box. Because of this, the number of -10 and -35 boxes per parent has slightly increased.

      • Results: Non-promoters vary widely in their potential to become promoters.

      - We make a clear distinction between promoters and non-promoters, and define the parent sequences.

      - We note that only 20% of parents with an “extended -10 box” have promoter activity.

      • Results: Promoter emergence correlates with minute differences in background promoter levels.

      - We added an analysis where we compare Pnew to the parent fluorescence levels, even if they are below 1.5 a.u. We find that the distribution of Pnew matches a sigmoid function.

      • Results: Promoter emergence does not correlate with simple sequence features

      - We added an analysis comparing k-mer counts to Pnew.

      - We updated the way we count -10 and -35 boxes, and recalculated the correlation with Pnew. The P and R2 values have changed, but Pnew still does not significantly correlate with -10 or -35 box counts.

      • Results: Promoters emerge and evolve only from specific subsets of -10 and -35 boxes

      - We have added an analysis where we computationally scramble the wild-type parent sequences while maintaining the coordinates of the mutual information hotspots. This reveals that the overlap with -10 and -35 motifs is not a coincidence of dense promoter motif encoding.

      - We found a computational error in our analysis and updated the percent overlap between -10 boxes and -35 boxes with mutual information hotspots. The results are similar.

      - 14% of -10 boxes overlap with hotspots with our new way of defining -10 and -35 boxes.

      • Results: New -10 and -35 boxes readily emerge, but rarely lead to de-novo promoter activity

      - We quantify how often a new -10 and -35 box is created at a unique position within our collection of promoter fragments, and how often this results in a -10 and -35 box being appropriately spaced, and how often this actually leads to de-novo promoter activity.

      - We quantify how often a TGn sequence lies upstream of a new -10 box.

      • Results: Promoters can emerge when mutations create motifs but not by destroying them.

      - For each example, we added the DNA sequences of the wild-type region of interest and the mutant region of interest that results in the gain of promoter activity, and their respective PWM scores. 

      - We created constructs to validate each example by testing their fluorescence on a plate reader.

      - We removed the P1-GFP example from the main figure, as it was a false-positive in the dataset. It is now in Fig S8.

      - We removed the Shiko Emergence metaphor because it could be confused with a binding mechanism for RNA polymerase.

      • Results: Gaining new motifs over existing motifs increases and decreases promoter activity.

      - We removed the “Tandem motif” because it is more likely caused by H-NS binding.

      - We renamed the mechanisms to be “hetero-gain” and “homo-gain” for simplicity, and clearly define how we classified each sequence into each category.

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the predicted point mutations.

      • Results – Histone-like nucleoid-structuring protein (H-NS) represses P12-RFP and P22-GFP.

      - This is a new analysis, which explores the role of the TF H-NS in repressing the parent sequences. 

      - We identified putative H-NS motifs in P12-RFP and P22-GFP.

      - We show experimentally that in an H-NS null background, a bidirectional promoter (P20) becomes unidirectional, even though P20 does not contain an obvious H-NS motif.

      - In the original version of the manuscript, we described a phenomenon where gaining a -35 box upstream of a promoter’s -35 box, or a -10 box upstream of a promoter’s -10 box, significantly decreases expression. We called this phenomenon a “tandem motif.” However, in the newest version of the manuscript, we find that these fluorescence decreases are rescued in an H-NS null background, suggesting the finding was actually due to modulation of H-NS binding and not to the -10 and -35 boxes.

      • Results – The UP-element does not strongly influence promoter activity in our dataset.

      - We used a PWM for the UP element to see if gaining or losing UP motifs was significantly correlated with increasing or decreasing expression. Even with a liberal PWM threshold, the analysis did not find any UP elements.

      • Discussion

      - We rewrote the discussion to account for the new analyses and the results on H-NS, the UP-element, and the extended -10.

      - We better explain how our results clash with the results from the Bykov paper.

      - We fit our results into the context of David Grainger’s papers.

      • Methods

      - Added an explanation about pMR1.

      - Added methods describing how we created the point mutation constructs.

      - Added the methods for the plate reader.

      - Added the methods for Illumina sequencing.

      - Added the methods for the sigmoid curve-fitting.

      • Figure 1

      - Panel E compares how Pnew (the probability of a daughter sequence having a fluorescence score greater than 1.5 a.u.) associates with the fluorescence scores of each parent sequence.

      - Panel F was originally in Figure S5. In the originally submitted version of the manuscript, if there were overlapping -10s or overlapping -35s, we counted these to be a single -10 or a single -35, respectively. In the new version of the manuscript, we now treat each motif as an independent box. Because of this, the r2 and p values have changed, but the conclusions have not (Pnew still does not significantly correlate with -10 or -35 box counts).
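
      For concreteness, Pnew as defined for Panel E can be computed as below. This is a minimal sketch: the 1.5 a.u. threshold comes from the text above, while the function names and the use of a Pearson correlation for the r2 value are assumptions, since the response does not state which correlation was used.

```python
def p_new(daughter_scores, threshold=1.5):
    """Fraction of daughter sequences whose fluorescence score is strictly
    greater than the threshold (1.5 a.u. in the text)."""
    if not daughter_scores:
        raise ValueError("need at least one daughter score")
    above = sum(score > threshold for score in daughter_scores)
    return above / len(daughter_scores)


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed here for illustration)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def r_squared(box_counts, p_new_values):
    """r2 between per-parent box counts and per-parent Pnew values."""
    return pearson_r(box_counts, p_new_values) ** 2
```

      Counting each overlapping motif as an independent box changes the `box_counts` input, which is why the r2 and p values can shift while the conclusion stays the same.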

      • Figure 2

      - Panel C now includes a stacked barplot showing the percentage of -10 and -35 boxes that overlap with mutual information hotspots when the parent sequences are randomly scrambled computationally.

      • Figure 3

      - Panels A-C were added to explain how we define a new -10/-35 box and how many such new boxes each parent has. These panels also illustrate how we associate the presence or absence of a motif with significant changes in fluorescence scores of the daughter sequences.

      - We moved the example of P1-GFP to Figure S8 because when we tested the specific mutation which leads to gaining the -10 box, fluorescence did not change.

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from reporter constructs harboring the point mutations predicted by our computational analyses.

      - Cartoons of RNA polymerase have been removed.

      • Figure 4

      - The tandem-motif has been removed from the figure.

      - Cartoons of RNA polymerase have been removed.

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure 5

      - This is a new figure analyzing the role of H-NS in promoter evolution and emergence.

      • Figure S4

      - Panel B now shows the wild-type parent scores and their standard deviations from the sort-seq experiment.

      • Figure S5

      - Panels with -10 and -35 box counts moved to Figure 1.

      - The panel comparing Pnew to hotspot counts was removed.

      - Correlations between different k-mers and Pnew are added to panels C-H.

      • Figure S8

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure S9

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure S10

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure S11

      - Added DNA sequences and PWM scores.

      • Figure S12

      - A new figure with further insights about H-NS.

      • Figure S13

      - A new figure regarding the UP-element analysis.

      • Figure S14

      - Added Panel D to show how we created mutant reporter constructs for validation.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Liu et al. present CROWN-seq, a technique that simultaneously identifies transcription-start nucleotides and quantifies N6,2'-O-dimethyladenosine (m6Am) stoichiometry. This method is derived from ReCappable-seq and GLORI, a chemical deamination approach that differentiates A and N6-methylated A. Using ReCappable-seq and CROWN-seq, the authors found that genes frequently utilize multiple transcription start sites, and isoforms beginning with an Am are almost always N6-methylated. These findings are consistently observed across nine cell lines. Unlike prior reports that associated m6Am with mRNA stability and expression, the authors suggest here that m6Am may increase transcription when combined with specific promoter sequences and initiation mechanisms. Additionally, they report intriguing insights on m6Am in snRNA and snoRNA and its regulation by FTO. Overall, the manuscript presents a strong body of work that will significantly advance m6Am research.
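
      To make the quantification step concrete: in a GLORI-style deamination readout, unmethylated A is converted and read as G, while N6-methylated A remains A, so the stoichiometry at a transcription-start nucleotide can be estimated from read counts. The sketch below is a simplified illustration with a hypothetical function name and hypothetical counts; it is not the CROWN-seq analysis code.

```python
def m6am_stoichiometry(a_reads, g_reads):
    """Estimate m6Am stoichiometry at one transcription-start nucleotide.

    a_reads: reads where the start A was not converted (methylated)
    g_reads: reads where the start A was converted and read as G (unmethylated)
    Returns None when the site has no coverage.
    """
    if a_reads < 0 or g_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = a_reads + g_reads
    if total == 0:
        return None
    return a_reads / total


# Hypothetical counts for a single A-initiated isoform.
example = m6am_stoichiometry(a_reads=85, g_reads=15)
```

      Because the estimate is per start site, a gene with several transcription start sites yields one stoichiometry value per A-initiated isoform.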

      Strengths:

      The technology development part of the work is exceptionally strong, with thoughtful controls and well-supported conclusions.

      We appreciate the reviewer for the very positive assessment of the study. We have addressed the concerns below.

      Weaknesses:

      Given the high stoichiometry of m6Am, further association with upstream and downstream sequences (or promoter sequences) does not appear to yield strong signals. As such, transcription initiation regulation by m6Am, suggested by the current work, warrants further investigation.

      We thank the reviewer for the insightful comments. We have softened the language related to m6Am and transcription regulation. We totally agree with the reviewer that future investigation is required to determine the molecular mechanism behind m6Am and transcription regulation.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript "Decoding m6Am by simultaneous transcription-start mapping and methylation quantification" Liu and co-workers describe the development and application of CROWN-Seq, a new specialized library preparation and sequencing technique designed to detect the presence of cap-adjacent N6,2'-O-dimethyladenosine (m6Am) with single nucleotide resolution. Such a technique was a key need in the field since prior attempts to get accurate positional or quantitative measurements of m6Am positioning yielded starkly different results and failed to generate a consistent set of targets. As noted in the strengths section below the authors have developed a robust assay that moves the field forward.

      Furthermore, their results show that most mRNAs whose transcription start nucleotide (TSN) is an 'A' are in fact m6Am (85%+ for most cell lines). They also show that snRNAs and snoRNAs have a substantially lower prevalence of m6Am TSNs.

      Strengths:

      Critically, the authors spent substantial time and effort to validate and benchmark the new technique with spike-in standards during development, cross-comparison with prior techniques, and validation of the technique's performance using a genetic PCIF1 knockout. Finally, they assayed nine different cell lines to cross-validate their results. The outcome of their work (a reliable and accurate method to catalog cap-adjacent m6Am) is a particularly notable achievement and is a needed advance for the field.

      Weaknesses:

      No major concerns were identified by this reviewer.

      We thank the reviewer for the positive assessment of the method and dataset. We have addressed the concerns below.

      Mid-level Concerns:

      (1) In Lines 625 and 626, the authors state that “our data suggest that mRNAs initate (mis-spelled by authors) with either Gm, Cm, Um, or m6Am.” This reviewer took those words to mean that for A-initiated mRNAs, m6Am was the ‘default’ TSN. This contradicts their later premise that promoter sequences play a role in whether m6Am is deposited.

      We thank the reviewer for the comment. We have changed this sentence into “Instead, our data suggest that mRNAs initiate with either Gm, Cm, Um, or Am, where Am are mostly m6Am modified.” The revised sentence separates the processes of transcription initiation and m6Am deposition, which will not confuse the reader.

      (2) Further, the following paragraph (lines 633-641) uses fairly definitive language that is unsupported by their data. For example in lines 637 and 638 they state “We found that these differences are often due to the specific TSS motif.” Simply, using ‘due to’ implies a causative relationship between the promoter sequences and m6Am has been demonstrated. The authors do not show causation, rather they demonstrate a correlation between the promoter sequences and an m6Am TSN. Finally, despite claiming a causal relationship, the authors do not put forth any conceptual framework or possible mechanism to explain the link between the promoter sequences and transcripts initiating with an m6Am.

      (3) The authors need to soften the language concerning these data and their interpretation to reflect the correlative nature of the data presented to link m6Am and transcription initiation.

      For (2) and (3): we have softened the language in the revised manuscript. Specifically, for lines 633-641 in the original manuscript, we have changed “are often due to” into “are often related to” in the revised manuscript, which claims a correlation rather than a causation.

      Reviewer #3 (Public review):

      Summary:

      m6Am is an abundant mRNA modification present on the TSN. Unlike the structurally similar and abundant internal mRNA modification m6A, m6Am’s function has been controversial. One way to resolve controversies surrounding mRNA modification functions has been to develop new ways to better profile said mRNA modification. Here, Liu et al. developed a new method (based on GLORI-seq for m6A-sequencing), for antibody-independent sequencing of m6Am (CROWN-seq). Using appropriate spike-in controls and knockout cell lines, Liu et al. clearly demonstrated CROWN-seq’s precision and quantitative accuracy for profiling transcriptome-wide m6Am. Subsequently, the authors used CROWN-seq to greatly expand the number of known m6Am sites in various cell lines and also determine m6Am stoichiometry to generally be high for most genes. CROWN-seq identified gene promoter motifs that correlate best with high stoichiometry m6Am sites, thereby identifying new determinants of m6Am stoichiometry. CROWN-seq also helped reveal that m6Am does not regulate mRNA stability or translation (as opposed to past reported functions). Rather, m6Am stoichiometry correlates well with transcription levels. Finally, Liu et al. reaffirmed that FTO mainly demethylates m6Am, not of mRNA but of snRNAs and snoRNAs.

      Strengths:

      This is a well-written manuscript that describes and validates a new m6Am-sequencing method: CROWN-seq as the first m6Am-sequencing method that can both quantify m6Am stoichiometry and profile m6Am at single-base resolution. These advantages facilitated Liu et al. to uncover new potential findings related to m6Am regulation and function. I am confident that CROWN-seq will likely be the gold standard for m6Am-sequencing henceforth.

      Weaknesses:

      Though the authors have uncovered a potentially new function for m6Am, they need to be clear that without identifying a mechanism, their data might only be demonstrating a correlation between the presence of m6Am and transcriptional regulation rather than causality.

      We thank the reviewer for the very positive assessment of the CROWN-seq method. We have softened the language which is related to the correlation between m6Am and transcription regulation.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      In this work, Qiu and colleagues examined the effects of preovulatory (i.e., proestrous or late follicular phase) levels of circulating estradiol on multiple calcium and potassium channel conductances in arcuate nucleus kisspeptin neurons. Although these cells are strongly linked to a role as the "GnRH pulse generator," the goal here was to examine the physiological properties of these cells in a hormonal milieu mimicking late proestrus, the time of the preovulatory GnRH-LH surge. Computational modeling is used to manipulate multiple conductances simultaneously and support a role for certain calcium channels in facilitating a switch in firing mode from tonic to bursting. CRISPR knockdown of the TRPC5 channel reduced overall excitability, but this was only examined in cells from ovariectomized mice without estradiol treatment.

      Comments to address most recent author response:

      The concern regarding the CRISPR experiments being confined to OVX mice is that the results can only suggest that CRISPR-mediated knockdown of TRPC5 can, at best, phenocopy the OVX+E condition. A reciprocal experiment in the opposite direction (for example, that returning TRPC5 to OVX levels in OVX+E mice prevents the changes in firing activity and pattern typical of the OVX+E2 condition) would strengthen the indication that E2-sensitive changes in TRPC5 expression and function are critically important to surge function. Acknowledging this as a limitation of the studies would help to better contextualize the value of the CRISPR experiments to an understanding of surge mechanisms when done only in OVX conditions.

      We have noted in the manuscript that “It would be of interest in future experiments to do the reciprocal experiment to see if overexpressing Trpc5 channels in Kiss1ARH neurons from OVX + E2 females restores the RMP and “rescues” the synchronization phenotype.”

      The nature of the confusion regarding the consideration of OVX+E2 conditions in the computational model primarily arises from the methods description in the supplemental file: "The effect of E2 on ionic currents is modelled as a change in the maximum conductance parameter. For currents IM,IT, ICa and ITRPC5 this change is inferred from the qPCR data assuming that the conductance is directly proportional to the mRNA expression." If these were instead based on the whole-cell recordings as the authors now indicate in their response, then this description needs to be edited and clarified accordingly. Furthermore, the section states, "For ISK, IBK, Ileak, the OVX and OVX+E2 conductances are obtained from current-voltage relationships recorded from Kiss1ARH neurons in the absence/presence of iberiotoxin (BK blocker) and apamin (SK blocker). All other currents were assumed to be unaffected by E2." This section thus does not directly indicate that the recordings in the stated figures were used in the model, and moreover suggests that currents besides ISK, IBK, and Ileak were not different in OVX+E2 conditions.

      The prior evidence stated for correlation of mRNA and channel conductance is not explicitly cited in the manuscript. It is well known that post-translational modifications, physiological modulation of individual channel biophysical properties, and many other factors can influence the end output of a membrane conductance. Therefore, the authors should, at minimum, provide a literature citation supporting the assumption used here.

      We have re-written the paragraph on “Modelling the effects of E2” in the Supplemental Information (now Appendix 1) to clarify that the modeling was based on a combination of electrophysiological recordings and the qPCR data presented in this and previous publications. The statement that “all other currents were assumed to be unaffected by E2” was a misstatement and has been deleted. As per the reviewer’s request, we have listed seven publications that document the correlation between the mRNA expression and channel conductance for the various channels. We thank the reviewer for the suggestion.
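
      The proportionality assumption quoted by the reviewer from the original methods description (maximum conductance scaling directly with mRNA expression) amounts to multiplying each OVX conductance by the E2-induced fold change in expression. A minimal sketch, with hypothetical function names, current labels, and numbers chosen only for illustration:

```python
def scale_conductance(g_ovx, fold_change):
    """Maximum conductance under E2, assuming it is directly proportional
    to mRNA expression: g_E2 = g_OVX * (mRNA_E2 / mRNA_OVX)."""
    if g_ovx < 0 or fold_change < 0:
        raise ValueError("conductance and fold change must be non-negative")
    return g_ovx * fold_change


def apply_e2(g_ovx_by_current, fold_change_by_current):
    """Scale every listed current; a fold change must be supplied for each
    one (a missing entry raises KeyError rather than assuming no effect)."""
    return {
        name: scale_conductance(g, fold_change_by_current[name])
        for name, g in g_ovx_by_current.items()
    }


# Hypothetical values: one current upregulated and one downregulated by E2.
g_e2 = apply_e2({"IT": 1.0, "ITRPC5": 4.0}, {"IT": 2.0, "ITRPC5": 0.25})
```

      Where whole-cell recordings are available for the OVX+E2 condition, the measured conductance would replace the scaled estimate.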

      Reviewer #2 (Public review):

      Summary:

      Kisspeptin neurons of the arcuate nucleus (ARC) are thought to be responsible for the pulsatile GnRH secretory pattern and to mediate feedback regulation of GnRH secretion by estradiol (E2). Evidence in the literature, including the work of the authors, indicates that ARC kisspeptin coordinate their activity through reciprocal synaptic interactions and the release of glutamate and of neuropeptide neurokinin B (NKB), which they co-express. The authors show here that E2 regulates the expression of genes encoding different voltage-dependent calcium channels, calcium-dependent potassium channels and canonical transient receptor potential (TRPC5) channels and of the corresponding ionic currents in ARC kisspeptin neurons. Using computer simulations of the electrical activity of ARC kisspeptin neurons, the authors also provide evidence of what these changes translate into in terms of these cells' firing patterns. The experiments reveal that E2 upregulates various voltage-gated calcium currents as well as 2 subtypes of calcium-dependent potassium currents, while decreasing TRPC5 expression (an ion channel downstream of NKB receptor activation), the slow excitatory synaptic potentials (slow EPSP) elicited in ARC kisspeptin neurons by NKB release and expression of the G protein-associated inward-rectifying potassium channel (GIRK). Based on these results, and on those of computer simulations, the authors propose that E2 promotes a functional transition of ARC kisspeptin neurons from neuropeptide-mediated sustained firing that supports coordinated activity for pulsatile GnRH secretion to a less intense burst-like firing pattern that could favor glutamate release from ARC kisspeptin. The authors suggest that the latter might be important for the generation of the preovulatory surge in females.

      Strengths:

      The authors combined multiple approaches in vitro and in silico to gain insights into the impact of E2 on the electrical activity of ARC kisspeptin neurons. These include patch-clamp electrophysiology combined with selective optogenetic stimulation of ARC kisspeptin neurons, reverse transcriptase quantitative PCR, pharmacology and CRISPR-Cas9-mediated knockdown of the Trpc5 gene. The addition of computer simulations for understanding the impact of E2 on the electrical activity of ARC kisspeptin cells is also a strength.

      The authors add interesting information on the complement of ionic currents in ARC kisspeptin neurons and on their regulation by E2 to what was already known in the literature. Pharmacological and electrophysiological experiments appear of the highest standards and robust statistical analyses are provided throughout. The impact of E2 replacement on calcium and potassium currents is compelling. Likewise, the results of Trpc5 gene knockdown do provide good evidence that the TRPC5 channel plays a key role in mediating the NKB-mediated slow EPSP. Surprisingly, this also revealed an unsuspected role for this channel in regulating the membrane potential and excitability of ARC kisspeptin neurons.

      Weaknesses:

      The manuscript also has weaknesses that obscure some of the conclusions drawn by the authors.

      One is that the authors compare here two conditions, OVX versus OVX replaced with high E2, that may not reflect the physiological conditions under which the proposed transition between neuropeptide-dependent sustained firing and less intense burst firing might take place (i.e. the diestrous [low E2] and proestrous [high E2] stages of the estrous cycle). This is an important caveat to keep in mind when interpreting the authors' findings. Indeed, that E2 alters certain ionic currents when added back to OVX females, does not mean that the magnitude of all of these ionic currents will vary during the estrous cycle.

      We do know that the slow EPSP, which is generated by TRPC5 channels, tracks closely with the steroid state of female mice. Using our E2 treatment paradigm that generates an LH surge in OVX females (left panel in Author response image 1), there is no difference in the amplitude of the slow EPSP in proestrous versus OVX + E2 females (right panel in Author response image 1).

      Author response image 1.

      In addition, although the computational modeling indicates a role of the various E2-modulated conductances in causing a transition in ARC kisspeptin neuron firing pattern, their role is not directly tested in physiological recordings, weakening the link between these changes and the shift in firing patterns.

      In future experiments we will test directly the physiological contribution of the other E2-modulated conductances in causing the transition in the firing pattern of arcuate Kiss1 neurons using CRISPR/SaCas9 technology as we have documented for the TRPC5 channel (e.g., Figures 11 and 12).

      Overall, the manuscript provides interesting information about the effects of E2 on specific ionic currents in ARC kisspeptin neurons and some insights into the functional impact of these changes. However, some of the conclusions of the work, with regard, in particular, to the role of these changes in ion channels and to their implications for the LH surge, are not fully supported by the findings.

      ---------

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, Qiu and colleagues examined the effects of preovulatory (i.e., proestrous or late follicular phase) levels of circulating estradiol on multiple calcium and potassium channel conductances in arcuate nucleus kisspeptin neurons. Although these cells are strongly linked to a role as the "GnRH pulse generator," the goal here was to examine the physiological properties of these cells in a hormonal milieu mimicking late proestrus, the time of the preovulatory GnRH-LH surge. Computational modeling is used to manipulate multiple conductances simultaneously and support a role for certain calcium channels in facilitating a switch in firing mode from tonic to bursting. CRISPR knockdown of the TRPC5 channel reduced overall excitability, but this was only examined in cells from ovariectomized mice without estradiol treatment. The manuscript has been substantially improved from the initial version by the addition of new experiments and clarification of important figures. Importantly, the overlap of data with previous reports from the same group has been corrected.

      Strengths:

      (1) Examination of multiple types of calcium and potassium currents, both through electrophysiology and molecular biology.

      (2) Focus on arcuate kisspeptin neurons during the surge is relatively conceptually novel as the anteroventral periventricular nucleus (AVPV) kisspeptin neurons have received much more attention as the "surge generator" population.

      (3) The modeling studies allow for direct examination of manipulation of single and multiple conductances, whereas the electrophysiology studies necessarily require examination of each current in isolation. Construction of an arcuate kisspeptin neuron model promises to be of value to the reproductive neuroendocrinology field.

      Weaknesses:

      A remaining weakness in this revised version of the manuscript is that the relevance of the CRISPR experiments is still rather tenuous given that the goal is to understand what happens in the estrogen-treatment condition, and these experiments were performed only in OVX mice. Similar concerns reflect that the computational model examining the effect of E2 infers multiple conductances based on qPCR data and an assumption that the conductances are directionally proportional to the level of gene expression, and then tunes these to the current recordings obtained from OVX mice, without a direct confirmation in OVX+E2 conditions that the model parameters accurately reflect the properties of these currents in the presence of estrogen.

      We are still puzzled by the Reviewer’s concerns about performing CRISPR mutagenesis of Trpc5 in the OVX+E2 females. Trpc5 channel expression is significantly reduced with the E2 treatment (Figure 10E), which we know translates into a minimal slow EPSP (Figure 2, Qiu eLife 2016) that is essentially equivalent to the slow EPSP amplitude following Trpc5 mutagenesis in the ovariectomized females (Figure 12). TRPC5 channel conductance is already at “rock bottom.” The modeling informs us that such a low TRPC5 conductance will not support a long-lasting slow EPSP and sustained firing (Figure 13A).

      Also, we respectfully point out that we have published a score of papers over the past 20 years showing that the channel conductance does correlate with the mRNA expression (e.g., Qiu et al., eLife 2018). Secondly, the model does take into consideration the OVX + E2 conditions (Figure 13B,C), which are based on the extensive whole-cell recordings presented in Figures 4, 5, 6, 7, 8 and 9.

      Reviewer #2 (Public Review):

      Summary:

      Kisspeptin neurons of the arcuate nucleus (ARC) are thought to be responsible for the pulsatile GnRH secretory pattern and to mediate feedback regulation of GnRH secretion by estradiol (E2). Evidence in the literature, including the work of the authors, indicates that ARC kisspeptin coordinate their activity through reciprocal synaptic interactions and the release of glutamate and of neuropeptide neurokinin B (NKB), which they co-express. The authors show here that E2 regulates the expression of genes encoding different voltage-dependent calcium channels, calcium-dependent potassium channels and canonical transient receptor potential (TRPC5) channels and of the corresponding ionic currents in ARC kisspeptin neurons. Using computer simulations of the electrical activity of ARC kisspeptin neurons, the authors also provide evidence of what these changes translate into in terms of these cells' firing patterns. The experiments reveal that E2 upregulates various voltage-gated calcium currents as well as 2 subtypes of calcium-dependent potassium currents while decreasing TRPC5 expression (an ion channel downstream of NKB receptor activation), the slow excitatory synaptic potentials (slow EPSP) elicited in ARC kisspeptin neurons by NKB release and expression of the G protein-associated inward-rectifying potassium channel (GIRK). Based on these results, and on those of computer simulations, the authors propose that E2 promotes a functional transition of ARC kisspeptin neurons from neuropeptide-mediated sustained firing that supports coordinated activity for pulsatile GnRH secretion to a less intense burst-like firing pattern that could favor glutamate release from ARC kisspeptin. The authors suggest that the latter might be important for the generation of the preovulatory surge in females.

      Strengths:

      The authors combined multiple approaches in vitro and in silico to gain insights into the impact of E2 on the electrical activity of ARC kisspeptin neurons. These include patch-clamp electrophysiology combined with selective optogenetic stimulation of ARC kisspeptin neurons, reverse transcriptase quantitative PCR, pharmacology and CRISPR-Cas9-mediated knockdown of the Trpc5 gene. The addition of computer simulations for understanding the impact of E2 on the electrical activity of ARC kisspeptin cells is also a strength.

      The authors add interesting information on the complement of ionic currents in ARC kisspeptin neurons and on their regulation by E2 to what was already known in the literature. Pharmacological and electrophysiological experiments appear of the highest standards and robust statistical analyses are provided throughout. The impact of E2 replacement on calcium and potassium currents is compelling. Likewise, the results of Trpc5 gene knockdown do provide good evidence that the TRPC5 channel plays a key role in mediating the NKB-mediated slow EPSP. Surprisingly, this also revealed an unsuspected role for this channel in regulating the membrane potential and excitability of ARC kisspeptin neurons.

      Weaknesses:

      The manuscript also has weaknesses that obscure some of the conclusions drawn by the authors.

      One is that the authors compare here two conditions, OVX versus OVX replaced with high E2, that may not reflect the physiological conditions under which the proposed transition between neuropeptide-dependent sustained firing and less intense burst firing might take place (i.e. the diestrous [low E2] and proestrous [high E2] stages of the estrous cycle). This is an important caveat to keep in mind when interpreting the authors' findings. Indeed, that E2 alters certain ionic currents when added back to OVX females, does not mean that the magnitude of all of these ionic currents will vary during the estrous cycle.

      Unfortunately, mice are a poor reproductive model since female mice do not have a clear follicular (estradiol-driven) phase distinctive from the luteal (progesterone-driven) phase.  Had we utilized a “proestrous” female, we could not with certainty distinguish between the effects of estradiol versus progesterone on the expression of the calcium and potassium channels that were the focus of this study.  Therefore, using our physiological model we can state with confidence that “estradiol elicits distinct firing patterns in arcuate nucleus kisspeptin neurons….”

      Overall, the manuscript provides interesting information about the effects of E2 on specific ionic currents in ARC kisspeptin neurons and some insights into the functional impact of these changes. However, some of the conclusions of the work, with regard, in particular, to the role of these changes in ion channels and their implications for the LH surge, are not fully supported by the findings.

      As we pointed out in the Discussion, the O’Byrne lab has clearly shown the relevance of Kiss1ARH neuronal burst firing and the release of glutamate to its effects on the LH surge:

      “Rather, we postulate that glutamate neurotransmission is more important for excitation of Kiss1AVPV/PeN neurons and facilitating the GnRH (LH) surge with high circulating levels of E2 when peptide neurotransmitters are at a nadir and glutamate levels are high in female Kiss1ARH neurons. Indeed, low frequency (5 Hz) optogenetic stimulation of Kiss1ARH neurons, which only releases glutamate in E2-treated, ovariectomized females (Qiu J. et al., 2016), generates a surge-like increase in LH release during periods of optical stimulation (Lin et al., 2021; Voliotis et al., 2021).  In a subsequent study optical stimulation of Kiss1ARH neuron terminals in the AVPV at 20 Hz, a frequency commonly used for terminal stimulation in vivo, generated a similar surge of LH (Shen et al., 2022).  Additionally, intra-AVPV infusion of glutamate antagonists, AP5+CNQX, completely blocked the LH surge induced by Kiss1ARH terminal photostimulation in the AVPV (Shen et al., 2022).”

      Recommendations for the authors:

      Reviewer #2 (Recommendations for The Authors):

      The reviewer noted the following in the revised manuscript:

      - page 6, the authors may consider adding that presynaptic effects of blocking calcium channels on the slow EPSP cannot be fully ruled out. Indeed, the added experiments do indicate that some of the effects can be explained by impaired regulation of TRPC5 channels by calcium influx through calcium channels; however, the senktide-induced current is not fully blocked by the broad-spectrum calcium channel inhibitor cadmium, suggesting that the effect of blocking these channels on the slow EPSP may involve other mechanisms, such as presynaptic effects.

      Optogenetic stimulation of all Kiss1ARH neurons induces the release of NKB at “physiological” concentrations, which in turn generates a slow EPSP in the recorded Kiss1ARH neuron. Blocking voltage-gated calcium channels can inhibit the NKB release from presynaptic  Kiss1ARH neurons, thereby reducing the amplitude of the slow EPSP. However, in whole-cell recordings of synaptically isolated Kiss1ARH neurons,  senktide directly induces a large inward current (Figure 3F), which is generated by the opening of TRPC5 channels (Qiu et al. J. Neurosci 2021). Voltage-gated calcium channels are coupled to the activation of TRPC5 channels (Blair, Kaczmarek and Clapham, J. Gen Physiol 2009), so by blocking voltage-gated calcium channels, cadmium effectively abrogates the facilitating effects of these channels on TRPC5 channel activation and significantly reduces but does not abolish the inward (excitatory) current (Figures 3F-H). We have clarified in the Results (page 6) that the Kiss1ARH neurons were synaptically isolated as depicted in Figures 3F,G.

      - page 8, bottom, the mean value given for the apamin-sensitive current amplitude in E2 treated females does not match that plotted on the I/V graph in Figure 7F.

      Thank you for pointing out this typographical error, which we have corrected.

    1. Author response:

      Reviewer 1 (Public Review)

      (1) The proposed design is not sufficient to answer the research question. The rationale of the study proposed in the introduction is that auditory stimulation may explain the analgesic effects of RPMS. To answer this question, the authors should have used a factorial design using 4 groups (active RPMS + sound; active RPMS + no sound; sham RPMS + sound; sham RPMS + no sound). Using this design, it would have been possible to determine if the sound, the afferent stimulation, or both are necessary to produce analgesia. Rather, they tested two types of RPMS (iTBS, cTBS) without real rationale, one electrical stimulation and a placebo.

      We will clarify that the study design employed was originally designed to determine whether iTBS or cTBS would be more effective to reduce pain. We included TENS as a positive control, and sham as a negative control. We were indeed surprised by the findings, and present them herein. Future RCTs should be performed to reproduce these findings.
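
      For reference, the four-group factorial design proposed in the reviewer's comment can be enumerated with a short Python sketch. The variable names are illustrative assumptions and are not taken from the manuscript.

```python
from itertools import product

# Two binary factors from the reviewer's proposed design.
stimulation = ["active RPMS", "sham RPMS"]
sound = ["sound", "no sound"]

# Crossing both factors yields the four groups of a 2 x 2 factorial design.
groups = [f"{stim} + {snd}" for stim, snd in product(stimulation, sound)]

for index, group in enumerate(groups, start=1):
    print(f"Group {index}: {group}")
```

      Crossing the factors in this way is what would allow the effect of sound, the effect of afferent stimulation, and their interaction to be estimated separately.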

      (2) There are multiple ways that the current design could have introduced biases. The study was not randomized but pseudo-randomised. What does that mean? Was there allocation concealment? Were the assessor and data analyst blinded to group allocation? Were intention-to-treat analyses performed? Were the participants adequately blinded (was it measured)?

      This study was not designed as an RCT, but rather as an experimental study. The study was pseudo-randomized to ensure that the groups had equal allocation and distribution of sexes.

      The groups were blinded to the other stimulations (they were not informed of the various arms of the study, through different consent forms).

      It was not possible to blind the experimenter, as the iTBS and cTBS protocols are very different (iTBS has multiple bursts separated by brief intervals, whereas cTBS is continuous). The data were masked for analysis, and only unblinded at the final stage. We will update the manuscript to reflect these changes.
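
      One way such a pseudo-randomization could work is shuffled block allocation within each sex, sketched below in Python. This is an assumption for illustration only: the response does not describe the actual procedure, and the helper `allocate_stratified` and the cohort are hypothetical.

```python
import random
from collections import Counter

ARMS = ["iTBS", "cTBS", "TENS", "sham"]  # the four arms named in the responses


def allocate_stratified(participants, arms=ARMS, seed=0):
    """Assign participants to arms in shuffled blocks, separately per sex.

    `participants` is a list of (participant_id, sex) tuples. Hypothetical
    helper; not the procedure used in the study.
    """
    rng = random.Random(seed)
    allocation = {}
    for sex in sorted({s for _, s in participants}):
        stratum = [pid for pid, s in participants if s == sex]
        rng.shuffle(stratum)
        for start in range(0, len(stratum), len(arms)):
            block = arms[:]  # each block contains every arm exactly once
            rng.shuffle(block)
            for pid, arm in zip(stratum[start:start + len(arms)], block):
                allocation[pid] = (arm, sex)
    return allocation


# Made-up cohort: 8 female and 8 male participants.
cohort = [(f"P{i:02d}", "F" if i < 8 else "M") for i in range(16)]
allocation = allocate_stratified(cohort)
counts = Counter(allocation.values())
for (arm, sex), n in sorted(counts.items()):
    print(arm, sex, n)
```

      With a stratum size that is a multiple of the number of arms, every arm receives the same number of participants of each sex, which is the balance property the response refers to.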

      (3) The TENS parameters used were not optimal and are not those commonly used in clinical practice. This could have explained the lack of TENS effects. The lack of TENS effects has not been discussed and it is concerning. If TENS had been effective (as expected), the story about the auditory effects would not have been presented as the primary mechanisms underlying the current results.

      We acknowledge that this is a limitation of the study. A future study should address this. However, for transparency, we will not remove this arm.

      (4) No primary outcome has been identified. It is important to mention that the interpretation of results is based on the presence of only one statistically significant result. Pain intensity and pain unpleasantness are not affected. This was not properly addressed in the Discussion. What does that mean that secondary hyperalgesia is affected but not pain?

      We reiterate that this study was not designed as an RCT, but rather as an experimental study. The primary outcome measures were measures of pain sensitivity (pain intensity NRS, pain unpleasantness NRS, and secondary hyperalgesia). We will clarify this in the revised manuscript.

      We will now include discussion of the effects being solely on secondary hyperalgesia, and not on pain intensity and unpleasantness.

      (5a) The use of secondary hyperalgesia variable is concerning. How is it possible to measure secondary hyperalgesia if there is no lesioned tissue?

      Secondary hyperalgesia refers to hyperalgesia assessed in an area adjacent to or remote of the site of stimulation. In general, it is not required to lesion a tissue to activate the nociceptive system or to induce pain. We have cited other studies that have employed secondary hyperalgesia as a pain outcome measure without inducing a lesion.

      Hyperalgesia reflects increased pain on suprathreshold stimulation. To assess it, one measures the subjective response to a painful (i.e., suprathreshold) stimulus, then applies a conditioning stimulation (e.g., heat), and measures the subjective response to the same original stimulus again. If the response after conditioning is higher than the baseline measure, hyperalgesia has been induced.
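
      The baseline-versus-post-conditioning comparison described above can be written as a minimal Python sketch. The function name and the ratings are hypothetical and serve only to illustrate the decision rule.

```python
def hyperalgesia_induced(baseline_rating, post_conditioning_rating):
    """Return True when the same stimulus is rated as more painful after conditioning."""
    return post_conditioning_rating > baseline_rating


# Hypothetical ratings of the same suprathreshold stimulus on a numerical scale.
baseline = 4.0
post_conditioning = 6.5

induced = hyperalgesia_induced(baseline, post_conditioning)
print("Hyperalgesia induced:", induced)
```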

      (5b) If heat creates secondary hyperalgesia without lesion, what does that mean physiologically?

      Secondary hyperalgesia is normally interpreted as a perceptual correlate of central sensitization.

      (5c) Is it a valid and reliable "pain" variable?

      Yes and yes. A noxious heat stimulus can reliably elicit secondary hyperalgesia (see section 3.2 from Quesada et al. 2021). We also cite several studies that have used secondary hyperalgesia as an outcome measure of central sensitization in pain.

      (6) The follow-up study has been designed to cover the RPMS sound using pink noise. However, the pink noise was also present during the PHP measurement. How can we determine whether the absence of change is due to the pink noise during the RPMS or the presence of pink noise during PHP? I don't think this is possible to discriminate.

      We will add a third study that performs the control analysis with the sound of the rPMS masked, but no pink noise otherwise. The study will be performed in two groups: one with pink noise, and one without pink noise.

      Appraisal

      (7) Despite all these potential issues, authors interpret their data with high confidence and with several overstatements in the Title, Abstract, and Discussion. The results do not support their conclusions. The fact that auditory stimulation may produce an analgesic effect is a hypothesis, but the current study cannot ascertain it.

      We believe that the chief concern with the interpretation lies with the second study. The proposed third experiment will address this concern.

      Reviewer 2 (Public Review):

      (1) My biggest concern in this paper is that the stimulation protocols are not applied after pain was induced in the subjects, but before. This is not bad in itself, but as the paper presents the stimulations as potential "treatments" it generates a severe mismatch between the objective, context (introduction), and impact (discussion) presented for the experiments, and how they are actually designed. This adds to the fact that healthy volunteers are used here to generate a study with low translational capability, that aims to be translational and provide an indication for clinics (maybe this is why the reduction in pain intensity caused by PMS when applied in patients, reported in references [29, 35 and 39], is not observed here).

      We will reframe these as prophylaxis, rather than treatment. This study was an experimental study originally designed to determine which stimulation parameters (cTBS or iTBS) would be better suited to modulate pain. We performed the study in healthy individuals undergoing acute pain, akin to a person undergoing a painful procedure, which could lead to central sensitization and pain persistence (e.g., post-surgical pain). However, before testing this in individuals undergoing actual procedures, it is essential to determine efficacy in people before translation.

      Khan et al [29] is a case study with neuropathic pain, whereas our study uses a nociceptive pain model. Lim et al [35] employed 10 sessions of rPMS stimulation in patients with acute low back pain. Similar to our study, the change in VAS driven by rPMS was no different than the sham stimulation. We notice that there is no reference 39, and will correct this.

      (2) TENS treatment duration is simply too short (90s) to be considered a therapeutic TENS intervention. I get that this duration was chosen to match the one of PMS, but TENS is never applied like this in the clinics, in which the duration varies from 10 minutes to an hour (or more). This specific study comparing different durations recommends 40 minutes for knee osteoarthritis pain relief (PMID: 12691335). Under these conditions, this stimulation is more similar to a sham TENS than to a real TENS treatment: I would suggest interpreting it as such. As the paper is right now, it could give the impression that PMS could produce clinical effects not observed in TENS, but while the PMS application resembles a clinical one, the TENS application does not (due to its extremely short duration). As an example, giving paracetamol at a dose 10 times below its effective dose is a placebo, not a paracetamol treatment.

      We acknowledge that this is a limitation, and will address this in the Discussion of the revised manuscript.

      (3) This study measured pain, not central sensitization. Specifically, the effects refer to the area of secondary hyperalgesia. The IASP definition for central sensitization is "Increased responsiveness of nociceptive neurons in the central nervous system to their normal or subthreshold afferent input." (PMID: 32694387). No neuronal results are reported in this article. Therefore, central sensitization is not measured here, and we do not know if it is reduced by sound. This frontally clashes with the title of the article and with many interpretations of the results. For a deep review on this topic, I recommend PMID: 39278607 and the short article PMID: 30416715.

      It is widely accepted that central sensitization is the neurophysiological basis of secondary hyperalgesia (see PMID: 11313449; PMID: 10581220).

      The reviewer is conflating secondary hyperalgesia due to central sensitization and chronic pain. Whether chronic pain is driven or maintained by central sensitization is not the goal of our study. However, there is ample evidence that nociceptive drive can induce plasticity in the CNS, which alters pain sensitivity, and that these changes facilitate pain.

      (4a) There is no mention of blinding/masking/concealing in this manuscript. Was the therapist blind to whether they applied one protocol, another, or a placebo? Were the evaluators blind, as this can heavily influence their measurements? And the volunteers? Was allocation concealed? Was this blinding measured afterwards? Blinding is, together with randomization, the most important methodological feature for those interventional studies. For example, not introducing blinding and concealing directly makes a study lose 4 out of 10 points in the PEDro scale, failing to fulfill criteria 3, 5, 6, and 7 (https://pedro.org.au/english/resources/pedro-scale/).

      This study was not designed as an RCT, but rather as an experimental study. The study was pseudo-randomized to ensure that the groups had equal allocation and distribution of sexes.

      The groups were blinded to the other stimulations (they were not informed of the various arms of the study, through different consent forms). However, blinding was not measured afterwards (again, this was not meant to be an RCT).

      It was not possible to blind the experimenter, as the iTBS and cTBS protocols are very different (iTBS has multiple bursts separated by brief intervals, whereas cTBS is continuous). The data were masked for analysis, and only unblinded at the final stage. We will update the manuscript to reflect these changes.

      (4b) Continuing with methodological considerations, the dropout percentage is high (18% for the first and 25% for the second study), both above the 15% cutoff for criterion 8 of the PEDro, losing another point.

      In Study 1, only 2 participants withdrew after feeling the heat, 2 were lost to follow-up, and 2 had incomplete data, for a total of 6/123. In Study 2, none of the participants who met the inclusion/exclusion criteria and were ‘allocated’ to the study were excluded (0% dropout/data loss).
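
      The arithmetic behind the Study 1 figure can be checked with a few lines of Python. The counts are those stated in the response, and 123 is taken as the denominator exactly as given there; the variable names are illustrative.

```python
PEDRO_DROPOUT_CUTOFF = 0.15  # 15% cutoff cited by the reviewer for PEDro criterion 8

# Counts stated in the response for Study 1.
withdrew_after_heat = 2
lost_to_follow_up = 2
incomplete_data = 2
total_study_1 = 123

lost = withdrew_after_heat + lost_to_follow_up + incomplete_data
dropout_rate = lost / total_study_1

print(f"{lost}/{total_study_1} = {dropout_rate:.1%}")
print("Below 15% cutoff:", dropout_rate < PEDRO_DROPOUT_CUTOFF)
```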

      We are unsure how to address this point, as we had clear inclusion/exclusion criteria, and these could only be measured after consenting. As this is an experimental study performed on healthy individuals in a university setting, we are not able to collect any study related data prior to consent.

      We openly reported individuals who did not meet the criteria, and thus were excluded. These criteria are a combination of what is required to collect good quality data, and what we are ethically permitted to do. We understand that in an interventional trial, a dropout rate above 15% due to intolerance or adverse events would indeed be concerning.

      (5) Data reporting and statistical treatment can be improved, as only differences are reported and regression to the mean is not accounted for in this study. Moreover, baseline levels for the dependent variables (control session) are not accessible for evaluation and they are not compared statistically, making it impossible to know if the groups were similar at baseline. This will imply failing criterion 3 of the PEDro, for a total of 2/10 points.

      This only concerns Study 1, as Study 2 uses a within-subject design. Study 1 provides the raw data in Figure 4. We will provide the raw data for each of the primary outcome measures in a supplemental table in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Park et al. conducted various analyses attempting to elucidate the biological significance of SARS-CoV-2 mutations. However, the study lacks a clear objective. The specific goals of the analyses in each subsection are unclear, as is how the results from these subsections are interconnected. Compiling results from unrelated analyses into a single paper can be confusing for readers. Clarifying the objective and narrowing down the topics would make the paper's purpose clearer.

      The logic of the study is also unclear. For instance, the authors developed an evaluation score, APESS, for analyzing viral sequences. Although they state that the APESS score correlates with viral infectivity, there is no explanation in the results section about why this is the case.

      The structure of the paper should be reconsidered.

      Thank you for your feedback. We have heeded the input that the study lacks a clear objective and made sure that the overall goal of the study is reflected in the Abstract, Results, and Discussion.

      We have made the specific goals of each subsection clearer in the Results section and elaborated on how the components of our study connect to each other. We have addressed these in more detail in the ‘Recommendations for the authors’ section.

      Thank you for the feedback on APESS, our evaluation model. APESS was created based on properties of SARS-CoV-2 that we discovered in our study. When applying our evaluation model, high APESS scores indicated high infectivity. APESS is calculated from a comprehensive evaluation of SARS-CoV-2 at the nucleotide, amino acid, and protein structure levels.

      The explanations and exact calculations of APESS are detailed in the Materials and Methods section in line 571, but we should have been more detailed in the Results section as well. We have now indicated this in the Results section in line 284.

      Overall, we have made edits to the manuscript that accurately explain our research by amending terms, restructuring arguments, and providing more clarity on the interconnectivity of the research.

      Reviewer #2 (Public review):

      Summary:

      The authors have developed a machine learning tool AIVE to predict the infectivity of SARS-CoV-2 variants and also a scoring metric to measure infectivity. A large number of virus sequences were used with a very detailed analysis that incorporates hydrophobic, hydrophilic, acid, and alkaline characteristics. The protein structures were also considered to measure infectivity and search for core mutations. The study especially focused on the S protein of SARS-CoV-2. The contents of this study would be of interest to many researchers related to this area and the web service would be helpful to easily analyze such data without in-depth bioinformatics expertise.

      Strengths:

      - Analysis of large-scale data.

      - Experimental validation on a partial set of searched mutations.

      - A user-friendly web-based analysis platform that is made public.

      Weaknesses:

      - Complexity of the research.

      Thank you for your kind feedback. Our study explored a wide range of topics including biochemical properties, machine learning, and viral infectivity.

      In presenting our research, we recognize that our comprehensive analysis may have slightly obscured the specific aims and overall objective of the study. We investigated properties in the viral sequences of SARS-CoV-2 and examined big data, clinical data, and expression data to elucidate their effect on viral infectivity. We then used evaluation modeling and in silico and in vitro validation.

      We have clarified the aims of our research and improved upon the flow of the manuscript by adding sentences that outline the goals of our research in the appropriate sub sections of the Results and Discussion sections.

      Reviewer #1 (Recommendations for the authors):

      The abstract should clearly state the backgrounds, objectives, strategies, and findings of this study in an orderly manner.

      Thank you for your feedback. We have restructured the Abstract to better reflect the goals and methods of our study. We start the Abstract by introducing the background of the study ‘An unprecedented amount of SARS-CoV-2 data has been accumulated compared with previous infectious diseases, enabling insights into its evolutionary process and more thorough analyses.’ in line 48. Then we more clearly stated the overall objectives of our research in line 50 as ‘This study investigates SARS-CoV-2 features as it evolves to evaluate its infectivity.’ Then, we clearly defined our specific discoveries in the virus, the purpose of our evaluation model, and how we validated our findings.

      In the Introduction, the message of each paragraph is unclear. Please clearly state the objectives of the study and what was done to achieve these objectives.

      Thank you for the feedback. We have updated the Introduction section to more clearly state the objectives of the study.

      To increase clarity, we have moved ‘Furthermore, hydrophobic properties in the amino acid sequence affect protein folding. Coronavirus hydrophobicity has significant effects on amino acid properties and protein folding.’ to line 127.

      In line 130, we rephrased the first sentence of the paragraph to ‘For these prior approaches to virus analysis and prediction, expertise with the relevant fields is required for a full understanding.’ to better establish the link between the background information and aims of the study. Then in line 134, we added ‘elucidate properties about the virus’ to clarify the aims of the study.

      In line 141, we have improved the clarity of the sentence to better present the scope and objectives of the study.

      The relationship between the sections in the Results is unclear. Clarify why each section is necessary and how they are interconnected.

      We investigated properties in the viral sequences of SARS-CoV-2 that highlighted amino acid substitutions or changes in polarity (Figure 1). In VOCs, we noted trends or absences of amino acid substitutions at specific positions (Figure 2). We examined epidemiological and clinical data to determine the infectivity, severity, and symptomaticity of lineages. Looking at expression data and binding affinity further illuminated the effect of amino acid substitutions (Figure 3). We created APESS, an evaluation model that is comprehensively calculated from the nucleotide, amino acid, and protein structure levels of the virus. Evaluation of lineages revealed that higher APESS scores were associated with higher infectivity (Figure 4). We used in silico and in vitro validation to reinforce our findings, then used machine learning to make predictions on future developments (Figure 5). We created candidate sequences for evaluation and utilized machine learning in predictions (Figure 6).

      We have added explanations to each section in Results that elucidate the objective of each section and how they connect with each other in the wider study.

      In line 157, we have added ‘We examined the amino acid sequences of SARS-CoV-2 to make discoveries about biochemical properties.’ to clearly outline the objective of the subsection.

      In line 207, we have improved the phrasing of the sentence.

      In line 278, we stressed that ‘We developed APESS, an evaluation model to analyze viral sequences based on the nucleotide, amino acid, and protein structure properties.’ to properly define the purpose and background of APESS.

      Please define abbreviations when they first appear.

      We have added the full terms for the stated abbreviations in the relevant sections of the manuscript.

      In line 107, we have added the proper abbreviation for Our World in Data (OWID).

      In lines 143, 175, and 489 we have added the full term for Variants of Concern (VOCs).

      In line 160, we have added the full term for Receptor Binding Motif (RBM).

      Reviewer #2 (Recommendations for the authors):

      (1) pg 9, line 51, full name of RBM should be declared.

      We have added the full name of Receptor Binding Motif (RBM) to the appropriate section in the Abstract.

      (2) How are the Variants of Concern (VOCs) defined?

      Thank you for the comment and we apologize for the confusion. Variants of Concern as defined by the World Health Organization are specified in the Materials and Methods section. We have also added the full name for Variants of Concern (VOCs) when they are first mentioned in the Introduction and Results sections.

      (3) pg 17, line 297. The purpose of using AI/ML to predict amino acid substitutions at specific locations is not clear. The VOCs and related mutation loci were already searched, so the AA substitution prediction step seems a little repetitive. Is it to create customized sequences? Also, if prediction (or probability) was made, some performance evaluation would be helpful.

      Thank you for this feedback. The purpose of utilizing machine learning to make predictions about amino acid substitutions is to assess the possibility of amino acid substitutions occurring at specific locations. These potential amino acid substitutions were evaluated by APESS to have high scores, linking them to high infectivity. As the feedback suggests, amino acid substitutions in VOCs are researched but our prediction sought to ascertain the likelihood of amino acid substitutions that our evaluation model associated with infectivity. In the Results section in line 330, we assessed the probability of amino acid substitutions N460K and Q493R that the study found to be significant. The datasets that we utilized for these predictions are detailed in the Materials and Methods section in line 677.

      The models we trained with machine learning predicted the probability of mutations based on samples in each group and their performance was evaluated by comparing the presence of mutations in the clades they diverged from. We have added the following sentences to line 330: “We used Accuracy, Precision, Recall, and F1 score to evaluate performance. All models showed high performance scores above 0.95 in Precision, Recall, and F1 score. For accuracy, XGBoost scored above 0.89, exhibiting relatively high performance, while LightGBM scored above 0.78.”
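
      For readers less familiar with the four metrics named here, the following Python sketch shows how they are conventionally computed from binary labels. The labels are made up for illustration and are unrelated to the study's data or its trained models.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up example: 1 = mutation present, 0 = mutation absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

      Accuracy counts true negatives while precision, recall, and F1 do not, which is one reason accuracy can differ from the other three scores when classes are imbalanced.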

      (4) pg 17, line 289. The objective of creating candidate lineages is not clear and would be helpful for the readers if its purpose is elaborated on. Since there are enough SARS-CoV-2 sequences, wouldn't it be more realistic and accurate to use those real sequences instead of creating them? Furthermore, the candidate lineages should be defined but they were missing in this section. This part made it a little difficult to follow the overall paper's logic.

      The manuscript should have been clearer on what ‘candidate lineages’ signified; we apologize for the confusion. In line 314, we included the following sentences for clarity: ‘We introduced amino acid substitutions at specific locations in the SARS-CoV-2 backbone for the wildtype and VOCs. The amino acid substitutions were lysine (K), arginine (R), asparagine (N), serine (S), tyrosine (Y), and glycine (G). We then evaluated the infectivity of these candidate lineages with our evaluation model APESS.’

      The purpose of creating candidate lineages in our study was to assess the effect of specific amino acid substitutions on the virus’ infectivity. The amino acid substitutions we evaluated were lysine (K), arginine (R), asparagine (N), serine (S), tyrosine (Y), and glycine (G). We determined that examining the introduction of specific amino acid substitutions to SARS-CoV-2 sequences would highlight the significance they had on infectivity. We have revised the paragraph in line 314 of the Results section to convey what we were doing.
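
      As a rough illustration of how candidate sequences of this kind could be generated, the Python sketch below substitutes each of the six listed residues at one position of a backbone. The helper `make_candidates`, the toy backbone, and the chosen position are hypothetical; the sketch does not reproduce the authors' pipeline or the APESS scoring step.

```python
SUBSTITUTIONS = ["K", "R", "N", "S", "Y", "G"]  # residues listed in the response


def make_candidates(backbone, position):
    """Return (label, sequence) pairs with the residue at a 1-based position replaced."""
    original = backbone[position - 1]
    candidates = []
    for residue in SUBSTITUTIONS:
        if residue == original:
            continue  # skip the no-op substitution
        mutated = backbone[:position - 1] + residue + backbone[position:]
        candidates.append((f"{original}{position}{residue}", mutated))
    return candidates


toy_backbone = "MFVNLQ"  # placeholder string, not a real spike sequence
candidates = make_candidates(toy_backbone, 4)
for label, sequence in candidates:
    print(label, sequence)
```

      Each label follows the usual original-position-replacement notation (as in N460K), so every candidate can be traced back to the single substitution that produced it.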

      (5) This study covers very detailed contents regarding lineages, mutations, and their effect on infectivity. It would be more readable if subsections could be added per group of investigation, especially in the results and discussion section.

      In the Results section, we have emphasized the objective of each subsection and how they connect with one another for the overall goals of our study.

      In line 157, we have added ‘We examined the amino acid sequences of SARS-CoV-2 to make discoveries about biochemical properties.’ to clearly outline the objective of the subsection.

      In line 207, we have improved the phrasing of the sentence.

      In line 278, we stressed that ‘We developed APESS, an evaluation model to analyze viral sequences based on the nucleotide, amino acid, and protein structure properties.’ to properly define the purpose and background of APESS.

      We have made edits to the Discussion section to more clearly indicate subsections.

      In line 389, we have added ‘In our investigation of various viruses’ to clearly indicate the background on other viruses.

      In line 409, we added the sentence ‘We made discoveries on specific amino acid substitutions at positions.’ to indicate the subsection talking about N437R, N460K, and D467 mutations.

      In line 471, we added the sentence ‘We created AIVE to feature our findings and analyses on an online platform.’ and modified the following sentence to better explain AIVE.

      (6) pg 26, line 557. The criteria for the SCPSi scores were set to 0.9 and 0.1 by the proportion of the Omicron and Delta variants. How do other criteria affect the performance of the method?

      Thank you for the question and check point. We used 0.9/0.1 for our initial criteria in our SCPS calculation. To determine how that affected performance, we have used 0.8/0.2 and 0.7/0.3 as the criteria.

      After calculating APESS with different SCPS weights (0.9/0.1, 0.8/0.2, 0.7/0.3), we used a Gaussian Mixture Model (GMM) to compare how the groups were divided based on APESS. All three groups with different SCPS weights were determined to accurately reflect data patterns when they had four components.

      When comparing parameter values, the group that used the original weights of 0.9 and 0.1 for SCPS showed the lowest values for variance and standard error across all four components. This indicates that each component was stable and clearly distinguishable from one another.

      The group where the weights were adjusted to 0.7 and 0.3 for SCPS showed significantly higher variance and a large error for the G2 component. The distribution of each component was more widespread, signifying that stability and reliability were lower.

      The group where the weights were adjusted to 0.8 and 0.2 for SCPS was positioned between the two previous groups for finer data classification and reliability. However, the group notably lacked reliability when it came to the SE values for the G4 component.

      Thus, the original model with 0.9 and 0.1 weight is the most reliable.

      When the Gaussian Density for each group was plotted, the group with 0.9/0.1 SCPS weights showed the highest peak near 2 (G1), with a value of approximately 2. For the group with SCPS 0.8/0.2 weights, the highest peak appeared near 4.2 (G3), showing a high value around 14. For the group with SCPS 0.7/0.3 weights, the highest peak appeared near 3.7 (G3) showing a value around 5. The group with 0.9/0.1 SCPS weights exhibited a more uniform Gaussian distribution compared to the other two.
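
      To make the notion of a superposition of Gaussian densities concrete, the Python sketch below evaluates a four-component mixture on a grid and locates its highest peak. The component weights, means, and standard deviations are hypothetical placeholders, not the fitted values from the response tables.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def mixture_density(x, components):
    """Weighted sum of Gaussian densities; components are (weight, mean, sd) tuples."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


# Hypothetical components G1..G4; the weights sum to 1.
components = [(0.4, 2.0, 0.3), (0.3, 3.0, 0.5), (0.2, 4.0, 0.4), (0.1, 5.0, 0.6)]

# Evaluate the mixture on a grid from 0 to 7 and find the highest peak.
step = 0.01
grid = [i * step for i in range(701)]
peak_x = max(grid, key=lambda x: mixture_density(x, components))
total_area = sum(mixture_density(x, components) for x in grid) * step

print(f"highest peak near x = {peak_x:.2f}")
print(f"approximate area under the curve = {total_area:.3f}")
```

      Narrower components with larger weights produce taller peaks, so comparing peak heights across weightings reflects both how much data each component captures and how tightly it is clustered.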

      Author response image 1.

      Superposition of Gaussian Densities for SCPS weight 0.9/0.1

      Author response table 1.

      Statistical values of the Superposition of Gaussian Densities for SCPS weight 0.9/0.1

      Author response image 2.

      Superposition of Gaussian Densities for SCPS weight 0.8/0.2

      Author response table 2.

      Statistical values of the Superposition of Gaussian Densities for SCPS weight 0.8/0.2

      Author response image 3.

      Superposition of Gaussian Densities for SCPS weight 0.7/0.3

      Author response table 3.

      Statistical values of the Superposition of Gaussian Densities for SCPS weight 0.7/0.3

      (7) Overall, the approach is very detailed and realistic. Just curious if this approach would be also applicable to other viruses such as influenza.

      We appreciate the insightful comments from the reviewer, and this is a direction we hope to take our research in the future. Our study focused on SARS-CoV-2 and the properties we discovered from the interaction of the virus' spike protein with the host's ACE2 receptor. In our investigation of other coronaviruses, such as MERS-CoV and SARS-CoV-1, we found that they possess a different structure and properties, as illustrated in Supplementary Figure 24. We provided explanations about our investigation of other viruses in the Discussion section. In line 389, we have added 'In our investigation of various viruses' to better signpost this section.

    1. Author response:

      In this initial response to the public review, we outline our plan to address the major concerns raised. Below, we provide a general categorization of the suggestions and our corresponding responses.

      Weakness #1: Statistical Concerns - using the number of seizures (rather than the number of animals) may identify small effects that could be insignificant. Effect size should be taken into consideration.

      Reviewer 1:

      “While the data generally supports the authors' conclusions, a weakness of this manuscript lies in their analytical approach where EEG feature-space comparisons used the number of spontaneous or evoked seizures as their replicates as opposed to the number of IHK mice; these large data sets tend to identify relatively small effects of uncertain biological significance as being highly statistically significant.”

      Reviewer 2:

      “In several sections of the paper, the authors argue that two different groups are similar on the basis that no statistical difference was found between the two groups (i.e., p > 0.05); however, the failure to find a statistically significant difference, particularly with relatively small sample sizes, is not rigorous evidence that the two groups are actually similar - they are just "not significantly different.”

      Reviewer 3:

      “(3) The utility of increasing the number of seizures for enhancing statistical power is limited unless the sample size under evaluation is the number of seizures. However, the standard practice is for the sample size to be the number of mice.”

      Reviewer 3:

      “(1) Evaluation of seizure similarity using the SVM modeling and clustering is not sufficiently explained to show if there are meaningful differences between induced and spontaneous seizures. SVM modeling did not include analysis to assess the overfitting of each classifier since mice were modeled individually for classification.”

      We understand the reviewers’ concerns. In this work, we used a linear mixed-effects model to address two levels of variability: between animals and within animals. The interactive linear mixed-effects model shows that most (~90%) of the variability in our data comes from within animals (Residual) rather than from between animals, the random effect that the model accounts for. Since variability between animals is low, the model identifies common changes in seizure propagation across animals, while accounting for the variability in seizures within each animal. Therefore, the results we find reflect changes that happen across animals, not in individual seizures. We will make text edits to enhance understanding of the linear mixed-effects model.
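
      To make the between-animal versus within-animal partition concrete, the sketch below estimates the two variance components with a classical one-way random-effects ANOVA estimator for balanced data. This is a simpler stand-in for a linear mixed-effects model, not the model fitted in the manuscript; the feature values and function names are hypothetical, and the estimator assumes every animal contributes the same number of seizures.

```python
from statistics import mean

def variance_components(groups):
    """Estimate (between-animal, within-animal) variance for balanced data.

    groups: list of equal-length lists, one list of measurements per animal.
    Uses the method-of-moments (ANOVA) estimator for a one-way random-effects model.
    """
    k = len(groups)
    n = len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need >= 2 animals, each with the same number (>= 2) of seizures")
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Mean square between animals and mean square within animals.
    ms_between = n * sum((m - grand) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))
    # The between-animal estimate is truncated at zero, as it can come out negative.
    var_between = max(0.0, (ms_between - ms_within) / n)
    return var_between, ms_within

def within_share(groups):
    """Fraction of total variance attributed to within-animal (residual) variability."""
    var_between, var_within = variance_components(groups)
    return var_within / (var_between + var_within)

# Hypothetical EEG feature values: 3 animals, 4 seizures each (illustration only).
animals = [[1.0, 3.0, 2.0, 4.0], [3.0, 5.0, 4.0, 6.0], [2.0, 4.0, 3.0, 5.0]]
share = within_share(animals)
print(f"within-animal share of variance: {share:.3f}")
```

      A within-animal share close to 1 corresponds to the situation described above, where most variability is residual. A full mixed-effects fit would additionally handle unbalanced designs and fixed effects, which this sketch does not.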

      To address the point raised about similarity, we will explain how the SVM classifier was trained. The purpose of the SVM is not to identify meaningful differences between induced and spontaneous seizures. Rather, it is to classify EEG sections as “seizures” or non-seizures, demonstrating the gross similarity between induced and spontaneous seizures despite minor differences. We will make text clarifications for the SVM model.

      Weakness #2: Clinical and biological significance is unclear.

      Reviewer 1:

      “Furthermore, the clinical relevance of similarly small differences in EEG feature space measurements between seizure-naïve and epileptic mice is also uncertain.”

      Reviewer 2:

      “While the paper may be relevant for the ETSP and contract research organizations (CROs), the paper was not written to attract the interest of biological scientists, even those in this specific area of epilepsy research. It may be of low interest to other neuroscientists… The key issue the authors aim to address is the 30-40% of patients with DRE, but the real problem with DRE patients is not that these people have seizures with no effect of the ASDs; rather, although ASD may reduce seizure burden, these patients continue to have some remaining seizures even after high doses of ASDs, which often leads to adverse effects from the particular ASDs… It remains unclear that the optogenetically induced seizures in this model are better than similarly induced seizures in a naïve animal, and there is no evidence that the model will be useful for finding new ASDs to treat DRE.”

      Reviewer 3:

      “(6) Human epilepsy is extensively heterogeneous in both etiology and individual phenotype, and it may be hard to generalize the approach.”

      Reviewer 2:

      “The authors state that this approach should be used to test for and discover new ASDs for DRE, and also used for various open/closed loop protocols with deep-brain stimulation; however, the paper does not actually discuss rigorously or critically the background literature on other published studies in these areas or how this approach will improve future research for a broader audience than the ETSP and CROs. Thus, it is not clear whether the utility will apply more widely and how extensive a readership will be attracted to this work.”

      We appreciate the reviewer’s concerns. We will revise the manuscript to better emphasize the potential significance of our approach. The on-demand seizure model can be applied to address biologically and clinically relevant questions beyond its utility in drug screening. For example, crossing the Thy1-ChR2 mouse line with genetic epilepsy models, such as Scn1a mutants, could reveal how optogenetic stimulation differentially induces seizures in mutant versus non-mutant mice, providing insights into seizure generation and propagation in Dravet Syndrome. Due to the cellular specificity of optogenetics, we also envision this approach being used to study circuit-specific mechanisms of seizure generation and propagation. Regarding drug-resistant epilepsy (DRE) and anti-seizure drug (ASD) screening, we agree with the reviewer that probing new classes of ASDs for DRE represents the critical goal. However, we believe a full exploration of additional ASD classes and/or modeling DRE lies outside the scope of this manuscript.

      Weakness #3: Definition of Seizure is unclear

      Reviewer 2:

      “Although the figures provide excellent examples of individual electrographic seizures and compare induced seizures in epileptic and naïve animals, it is unclear which criteria were used to identify an actual seizure induced by the optogenetic stimulus, versus a hippocampal paroxysmal discharge (HPD), an "afterdischarge", an "electrophysiological epileptiform event" (EEE, Ref #36, D'Ambrosio et al., 2010 Epilepsy Currents), or a so-called "spike-wave-discharge" (SWD). Were HPDs or these other non-seizure events ever induced using stimulation in animals with IH-KA? A critical issue is that these other electrical events are not actual seizures, and it is unclear whether they were included in the column showing data on "electrographic afterdischarges" in Figure 5 for the studies on ASDs”

      Reviewer 3:

      “(2) The difference between seizures and epileptiform discharges or trains of spikes (which are not seizures) is not made clear.”

      Reviewer 2:

      “The differences between the optogenetically evoked seizures in IH-KA vs naïve mice are interpreted to be due to the "epileptogenesis" that had occurred, but the lesion from the KA-induced injury would be expected to cause differences in the electrically and behaviorally recorded seizures - even if epileptogenesis had not occurred. This is not adequately addressed.”

      Thank you for pointing out the unclear definition of the seizures analyzed. We agree and will revise the text to clarify this issue. In this manuscript, we focused on tonic-clonic seizures. We analyzed animal behavior during evoked events, and a high percentage of induced electrographic events were accompanied by behavioral seizures with a Racine scale of three or above. Regarding epileptogenesis, our model is based on the IHK model, in which spontaneous tonic-clonic seizures occur a few to several days after KA injection. These mice are, by definition, epileptogenic. We will further clarify this methodology in the text.

      Weakness #4: Similarity/Difference with Kindling Not Clear

      Reviewer 2:

      “The authors did not test whether an apparent "kindling" effect, apparently seen in naïve controls, also occurred in animals micro-injected with kainic acid (KA). This effect could cause model instability that might result in variability in response to ASDs. It is not clear whether the number of optogenetically induced seizures in epileptic animals would affect the response to drugs. It is also unclear how much of an improvement the animal model in the present work is over other similar models of TLE, where electrically triggered seizures could simply be applied to one of them.”

      Reviewer 3:

      “(5) It is unlikely that long-term adaptation to CA1-stimulated seizure induction is absent in these mice. A duration of evaluation longer than 16 days is warranted in light of the downward slope at days 13-16 for induced seizures in Figure 4C.”

      We appreciate the reviewer’s comments regarding the “kindling effect” as well as its similarity to the kindling model. We will carefully assess the data and address this in the revised manuscript. In electrical kindling, the activated cellular population is non-specific, including both excitatory and inhibitory neurons. In our model, we specifically activate predominantly excitatory neurons (Thy1-positive neurons), which we observed to participate in convulsant-induced seizures (as demonstrated in Thy1-GCaMP experiments). We consider this specificity an improvement over the kindling model, making our approach more biologically relevant.

      Weakness #5: Time needed to generate model is significant. Unclear if animals were pre-selected

      Reviewer 1:

      “Finally, the multiple surgeries and long timetable to generate these mice may limit the value compared to existing models in drug-testing paradigms.”

      Reviewer 2:

      “The authors offer little mention of other research using animal models of TLE to screen ASDs, of which there are many published studies - many of them with other strengths and/or weaknesses. For example, although Grabenstatter and Dudek (2019, Epilepsia) used a version of the systemic KA model to obtain dose-response data on the effects of carbamazepine on spontaneous seizures, that work required use of KA-treated rats selected to have very high rates of spontaneous seizures, which requires careful and tedious selection of animals. The ETSP has published studies with an intra-amygdala kainic acid (IA-KA) model (West et al., 2022, Exp Neurol), where the authors claim that they can use spontaneous seizures to identify ASDs for DRE; however, their lack of a drug effect of carbamazepine may have been a false negative secondary to low seizure rates. The approach described in this paper may help with confounds caused by low or variable seizure rates. These types of issues should be discussed, along with others.”

      We appreciate the reviewer’s insights. In an existing model investigating spontaneous tonic-clonic seizures (such as the intra-amygdala kainate injection model), the time investment is back-loaded, requiring two to three weeks per condition while counting spontaneous seizures, which may occur only once a day. In contrast, our model requires a front-loaded time investment. Once the animals are set up, we can test multiple drugs within a few weeks, providing significant time savings. Additionally, we did not pre-screen animals in our study. Existing models often pre-select mice with high rates of spontaneous seizures, whereas in our model, seizures can be induced even in animals with few spontaneous seizures. We believe that bypassing the need for pre-screening is a key advantage of our induced seizure model.

      Reviewer 3:

      “(7) No mention or assessment of mouse sex as a biological variable.”

      Thank you for pointing this out. Both female and male animals were included in this study. Epileptic cohort: 7 males, 3 females. Naïve cohort: 3 males, 4 females.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Wilson's Disease (WD) is an inherited rare pathological condition due to a mutation in ATP7B that alters mitochondrial structure and dysfunction. Additionally, WD results in dysregulated copper metabolism in patients. These metabolic abnormalities affect the functions of the liver and can result in cholecystitis. Understanding the immune component and its contribution to WD and cholecystitis has been challenging. In this work, the authors have performed single-cell RNA sequencing of mesenchymal tissue from three WD patients and three liver hemangioma patients.

      Strengths:

      The authors describe the transcriptomic alterations in myeloid and lymphoid compartments.

      Weaknesses:

      In brief, this manuscript lacks a clear focus, and the writing needs vast improvement. Figures lack details (or are misrepresented), the results section only catalogs observations, and the discussion needs to focus on their findings' mechanistic and functional relevance. The major weakness of this manuscript is that the authors do not provide a mechanistic link between the absence of ATP7B and NK cells' impaired/altered functions. While the work is of high clinical relevance, there are various areas that could be improved.

      In this study, we reported for the first time that ATP7B mutation and the resulting metabolic abnormalities in hepatocytes cause functional alteration of immune cells in WD patients. We dissected the transcriptional profiles of liver mesenchymal cells and delineated the functional differences of main immune cells in WD patients through scRNA-seq. The NK cell exhaustion and its clinical significance were further demonstrated.

      The mechanistic study is a priority for us. Given that the ATP7B mutation is hepatocyte-specific, its effect on immune cells is most probably through intercellular communication rather than through the direct action of ATP7B protein. How ATP7B mutation disturbs metabolic homeostasis in hepatocytes, how metabolic pathways regulate the release of signal substances, and how signal substances act on the NK cells need to be explained. This content, together with the present manuscript, is beyond the scope of a single article, so this manuscript focuses on the novel observations.

      We sincerely appreciate the comments. We have improved the manuscript based on your valuable suggestions. The mechanistic study is our subsequent research topic. We are actively pursuing it and have found that ATP7B mutation rewires a certain metabolic pathway in hepatocytes, and that a critical metabolite functions as the mediator causing NK cell exhaustion.

      Reviewer #2 (Public Review):

      Summary:

      Wilson's disease is a rare genetic disorder caused by mutations in the ATP7B gene. Previous studies have documented that ATP7B mutations can disrupt copper metabolism, affecting brain and liver function. In this paper, the authors performed a retrospective clinical study and found that Wilson's disease has a high incidence of cholecystitis. Single-cell RNA-seq analysis revealed changes in the immune microenvironment, including the activation of immune responses and the exhaustion of natural killer cells.

      Strengths:

      A key finding of this study is that the predominant ATP7B gene mutation in the Chinese population is the 2333G>T (p. R778L) mutation. The authors reported associations between Wilson's disease and cholecystitis, as well as the exhaustion of natural killer cells.

      Weaknesses:

      The underlying mechanisms linking ATP7B mutations to cholecystitis and natural killer cell exhaustion remain unclear. Specifically, it is not yet determined whether copper metabolism alterations directly cause cholecystitis and natural killer cell exhaustion, or if these effects are secondary to liver dysfunction.

      In this study, we reported for the first time that ATP7B mutation and the resulting metabolic abnormalities in hepatocytes cause functional alteration of immune cells in WD patients. We dissected the transcriptional profiles of liver mesenchymal cells and delineated the functional differences of main immune cells in WD patients through scRNA-seq, focusing on the NK cell exhaustion and its clinical significance.

      The mechanistic study is a priority for us. Given that the ATP7B mutation is hepatocyte-specific, its effect on immune cells is most probably through intercellular communication, so we prioritize the study of this aspect. How ATP7B mutation disturbs metabolic homeostasis in hepatocytes, how metabolic pathways regulate the release of signal substances, and how signal substances act on the NK cells need to be explained. This content, together with the present manuscript, is beyond the scope of a single article, so this manuscript focuses on the novel observations.

      We sincerely appreciate the comments. The mechanistic study is the topic of our follow-up work. We are actively pursuing this research and have found that ATP7B mutation rewires a certain metabolic pathway in hepatocytes, and that a critical metabolite functions as the mediator causing NK cell exhaustion.

      Reviewer #1 (Recommendations For The Authors):

      Major:

      (1) Abstract. A major portion of this manuscript focuses on non-NK cells. Data that describes NK cell exhaustion is only minimal. Therefore, the authors should modify the abstract.

      Thank you for your valuable suggestion. We have supplemented the description of functional changes in other immune cells, and have modified the abstract (line 31-35).

      (2) Introduction. There are three paragraphs. The first paragraph discusses cholecystitis. However, there are too many repetitions, and the information is unclear. In the second part, the authors discuss NK cells and their exhaustion. The authors do not establish a clear rationale or logic linking NK cells to WD or cholecystitis. In the last paragraph, the authors describe their findings. Their correlation between NK cell exhaustion and the poor healing process of cholecystitis has no direct experimental proof.

      Thank you for your comments. We have deleted the repetitions and rephrased some sentences (line 72-74). Briefly, in the first paragraph, we proposed the significant prognostic value of immune cell dysfunction for cholecystitis. In the second paragraph, we introduced NK cell exhaustion and its potential to predict prognosis of certain diseases. In the third paragraph, we introduced that the liver is a central organ involved in metabolism and immunity, holding a large number of NK cells. Liver pathologies commonly impact the development and outcome of inflammation-associated diseases such as cholecystitis. WD was selected as a research model. In the last paragraph, we introduced our findings from clinical study, scRNA-seq, clinical samples, and bioinformatics analysis, and concluded at the end.

      (3) Results. Overall, the results section lacks clarity and a clear focus. Figure legends need to be significantly detailed. The authors make too many broad statements without any support. The authors also make too many overstatements.

      Thank you for your valuable suggestion. We have corrected the inaccurate statements and refined the figure legends in detail. All the changes are marked in the manuscript, and related responses are described below.

      Figure 1: No information is provided about the functional impairment of ATP7B protein due to the mutation found in the cohort of Chinese patients. What does 'immune abnormalities' (line 127) mean? What is the relevance of showing liver fibrosis and copper accumulation in the eye in Figure 1c and d, respectively? Total cholesterol concentrations are still within the range in the plasma of WD patients, but the authors call it higher. ECAR has not changed in WD patients, but the authors claim it has (line 117).

      (1) All these gene mutations in WD disable the protein function and cause the same outcome. (2) We have deleted the inappropriate statement. (3) In clinical observation, we found that WD not only causes copper accumulation in hepatocytes, but also leads to a variety of conditions, including liver fibrosis, Kayser-Fleischer ring, and lower risk of hyperglycemia. We showed these together with the data on cholecystitis incidence. We think these might suggest the significance of intercellular communication between hepatocytes and other cells in the microenvironment. (4) We have deleted the inappropriate statement (lines 108-110, 112-113).

      Figure 2: Did the authors use the liver mesenchymal tissue or mesenchymal cells? Figure 2 states that they used mesenchymal cells, different from liver mesenchymal tissue. Numbers within Figure 2b UMAP are not visible. Were the initial T and NK cells annotated as indicated in Figure S2 (CD3D, CD3E, CD3G)? If so, that does not include NK cells.

      (1) The liver mesenchymal cells were used for scRNA-seq. (2) It is possible that the image resolution was reduced when the submission system compressed the files during the merging process. We confirm that the image resolution of all figures meets publishing requirements, and that all characters on the figures are visible. You can download the figure files to view details. (3) Incomplete cell markers were shown in Figure S2 due to our oversight. We have updated the markers (CD3D, CD3E, NKG7), references (Ref #53, #55, and #56), and related figures (Figure 2e and Figure S2c).

      Figure 3: The authors should change 'Case' to 'WD patients' both in the text and figures. DEGs in Figure 3C indicate a transcriptomic alteration in the B cell compartment, which the authors do not delineate. Also, the rationale and explanation for the CellChat analyses are minimal. Concluding that a change occurred within the TME with minimal data and explanations is unfair.

      Thank you for your comments. (1) We apologize for the confusion caused by the use of nomenclature and abbreviations in the text and figures. In all scRNA-seq data analysis, presentation, and description, we used specific terms (CASE and CON) to refer to the groups of WD patients and controls, as well as their cell populations. We have now unified the nomenclature throughout the text and defined the terms where they first appear (lines 126-127), avoiding the lowercase form to prevent confusion. (2) We have now compared the expression of key B cell genes between the two groups in the next section “The dysfunction of main immune cells in WD patients” (lines 230-235, Figure 4e, Figure S4e). (3) We have described the results of cellular communication in more detail (lines 188-194). (4) We have modified the conclusion and all the related statements throughout the text (lines 29-31, 82-84, 149, 194-195).

      Figure 4: This section deals with multiple cell types with minimal explanations. This section discusses various cell types, but it lacks focus. In particular, the T cell section should be separated and elaborated more in detail.

      (1) In this section, we intended to compare the function of the main immune cells that account for a considerable proportion, instead of just showing differentially expressed genes that provide minimal information. The evaluation of functional signatures, based on the integration of multiple gene expression values, allows a direct understanding of the final outcome of transcriptional changes. (2) Given that the main functions of T cells did not change significantly and there were more significant changes in innate immunity, the T cell section is relatively short and unsuitable as a separate section.

      Figure 5: What are the distinct subsets of NK cells authors have found in the WD patients and controls? How do these subsets differ between the two groups in numbers and their transcriptomes? The presentation and labeling of Figure 5 and Supplementary Figure 5 need to be vastly improved. The pseudotime presentation in Figure 5b should be presented separately for the patients and the controls. Are the changes in gene expression presented in Figure 5a due to the change in the subset compositions? Figure 5c immuno-staining is not at all visible. A clear explanation should be given for the differences between Figure 5c and Figure 5e, where NKG2A expressions are shown. A better explanation for Figure 5d is required. Did the authors use all the antibodies with the same fluorochrome? If so, what color is that? Can the authors include the individual samples in the bar diagram in Figure 5e? Again, the data in Figure 5 is insufficient to conclude that NK cells are exhausted in WD patients. While the role of changes in the expression of T-BET and EOMES can be related to dysfunction and cellular exhaustion of NK cells, the statement made by the authors needs to be toned down as they do not test with independent experiments.

      (1) The NK cell subsets were clustered by gene expression profile and labeled by their characteristically expressed genes, using an algorithm from the routine analysis procedure. They cannot be distinguished in clinical samples by one or several genes or by other sorting methods. Thus, we were not able to analyze these subsets in clinical samples. (2) We have supplemented the comparison of numbers and transcriptomes of the three NK subtypes between the two groups (lines 268-273). (3) We have checked the figures and confirmed that all characters on the figures are visible. (4) We have presented the plot separately in Figure S5d. (5) We compared the expression levels of the genes presented in Figure 5a between the two groups in the three NK subtypes and supplemented this part (lines 264-268). The results were very consistent across the three subtypes, suggesting that the results in the total NK population were contributed by all three subtypes and were not driven by a single subset. (6) KLRC1 is also known as NKG2A. We are sorry for not explaining this clearly, and we now use only KLRC1 throughout the text to avoid confusion. We have given a clearer and more detailed description of Figure 5c, 5d, and 5e (now labeled as Figure 5b, 5c, and 5d), and have included the fluorochrome in Figure 5d (now labeled as Figure 5c) and the individual values in Figure 5e (now labeled as Figure 5d) (lines 293-299). (7) In this section, we found upregulated expression of inhibitory receptors, downregulated expression of effector molecules, and impaired NK cell-mediated cytotoxicity in NK cells of WD patients from scRNA-seq. We then validated the findings in clinical liver section samples and clinical blood samples by mIHC and flow cytometry, respectively. According to recent articles, exhausted NK cells are characterized by decreased production of effector cytokines (e.g., IFNγ), impaired cytolytic activity, downregulated expression of certain activating receptors, and upregulated expression of inhibitory receptors (e.g., 10.3389/fimmu.2017.00760, 10.1038/s41590-018-0132-0, 10.1038/s41467-019-09212-y, 10.1080/2162402X.2016.1264562). Therefore, we concluded that NK cells are exhausted in WD patients. (8) In the part about transcription factors, we kept the description of objective data and deleted the statement on the contribution of transcription factors to NK exhaustion.

      Figure 6: Data presented in Figure 6 and the conclusion made in this manuscript are predictive. There is no direct testing of ATP7B in NK cells to show the functions of this gene. Extension of this to patient survival is purely speculative. As long as authors state these facts clearly in their text, it can be acceptable. However, they do not extend their conclusions to similar liver diseases.

      ATP7B mutation is hepatocyte-specific, and it does not occur in any immune cells. The function of ATP7B in NK cells was not studied. We found NK exhaustion and poor prognosis of cholecystitis in WD patients. Given that previous studies have demonstrated that NK exhaustion is correlated with poor liver cancer prognosis, we hypothesized that NK exhaustion contributes to the poor prognosis of cholecystitis. Bioinformatics studies confirmed our hypothesis and supported the extension of this result to other inflammatory diseases. Although we have no experimental data, this result was robust in the bioinformatic analysis.

      (4) Discussion: While the authors analyzed multiple cell types, the discussion is primarily focused on NK cells. There is no clear link between copper utilization, NK cell function, and exhaustion that the authors articulate.

      Thank you for your comments. The focus of our study is NK cell exhaustion, which is experimentally proven, so we discussed this aspect. We prioritize the effect of intercellular communication and metabolic alteration on NK cell exhaustion in our follow-up study. Excess copper is released into the circulation in some circumstances in WD patients, but generally they receive long-term de-coppering therapy to maintain intracellular copper at a non-lethal level. Thus, we do not consider copper a critical factor in this study. In the original manuscript, we mentioned cuproptosis and its potential as a novel target. This was likely to cause ambiguity and misunderstanding, so we deleted this part to state our point of view clearly.

      (5) Supplementary Figures: The presentation and labeling of these figures need to be changed.

      Thank you for your suggestions. We have modified the figures and confirmed that all characters on the figures are visible.

      Reviewer #2 (Recommendations For The Authors):

      It is better to test whether ATP7B mutation can directly affect immune functions.

      Thank you for your suggestions. Given that the ATP7B mutation is hepatocyte-specific, its effect on immune cells is most probably through intercellular communication. Thus, we prioritize the effect of intercellular communication on NK cell exhaustion, and we are actively pursuing this research.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer 1

      We would like to express our gratitude to Reviewer 1 for providing a thorough summary of our work and highlighting its strengths. With regard to the weaknesses, we are committed to improving the manuscript by making the necessary changes. First, we will specify the exact p-value in all cases.

      Regarding the discussion section, we acknowledge the feedback regarding its potential confusion. In line with the reviewer's suggestion, we will reduce the literature review and highlight our findings.

      Finally, for the preprint we did not include confounders such as HIV infection and ethnicity, as our study population did not exhibit viral infections and comprised only Hispanic individuals. We will describe the study population more thoroughly and address these characteristics explicitly in both the methods section and the initial part of the results.

      Reviewer 2

      We appreciate and thank reviewer 2 for the commentaries. Although it is true that several papers have described the role of microbiome in COVID-19 severity, we firmly believe that our current work stands out. There is not much information related to this association in Mediterranean countries, especially in the south of Spain. In addition, most of the studies only describe microbiota composition in stool or nasopharyngeal samples separately, without investigating any potential relationships between them as we do.

      (1) We agree with the reviewer that the sample size is limited. We faced the challenge of collecting the samples during the peak of the COVID-19 pandemic, when doctors and nurses were overwhelmed and not always available to recruit patients according to the inclusion criteria. Despite these constraints, we ensured that all included samples met our specified inclusion criteria and were from subjects with confirmed symptomatology.

      In addition, our main goal was to identify whether severity of the disease could be assessed through microbiota composition. Therefore we did not include a healthy group. Despite not having a large N, our results should be reproducible as they are supported by statistical analysis.

      (2) We thank the reviewer for this comment. Since our original sentence may have lacked clarity, we intend to modify it to ensure it conveys the intended meaning more effectively.

      Nonetheless, we remain confident in the significance of our findings. Not only have we found a correlation between microbiota and COVID severity, but we have also described how specific bacteria from each condition are associated with key biochemical parameters of clinical COVID infection.

      (3) We appreciate the feedback provided by the reviewer. In this case, we performed 16S analysis because it is more cost-effective than metagenomic approaches. Furthermore, 16S analysis has undergone refinements that ensure comprehensive coverage and depth, along with standardized analysis protocols. Unlike 16S analysis, metagenomic approaches lack software tools such as QIIME that facilitate standardization of analysis, which reduces the reproducibility of their results.

      (4) We sincerely appreciate this insightful suggestion. Since simply listing associations between both microbiomes and COVID-19 severity may not be enough, we intend to discuss in our discussion how microbiota composition may be linked to the mechanisms underlying COVID-19 pathogenesis.

      (5) We are grateful for the constructive criticism and intend to rewrite our abstract to enhance clarity. Additionally, we will thoroughly review all figures and their descriptions to ensure accuracy and comprehensibility.

      Reviewer 3

      We acknowledge the annotations made by reviewer 3 and are committed to addressing all identified weaknesses to enhance the quality of our work. Our idea is to modify the methods section and figures to make them easier to understand.

      Specifically, in the case of Figure 1, we recognize an error in the description of the Bray-Curtis test. We appreciate the comment and will make the necessary changes. The reviewer also made another observation about the description of Figure 1, which we will modify to improve its accuracy.
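
      For readers unfamiliar with the measure, Bray-Curtis is a dissimilarity index computed between pairs of samples, ranging from 0 (identical composition) to 1 (no shared taxa). The following is a minimal sketch of the standard formula, not the authors' pipeline; the function name and the read counts are hypothetical.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Both vectors must list the same taxa in the same order.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must list the same taxa in the same order")
    # Abundance shared by both samples, taxon by taxon.
    shared = sum(min(a, b) for a, b in zip(sample_a, sample_b))
    total = sum(sample_a) + sum(sample_b)
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - 2.0 * shared / total


# Hypothetical read counts for three taxa in two samples (illustration only).
stool = [10, 0, 5]
nasopharyngeal = [5, 5, 5]
print(f"Bray-Curtis dissimilarity: {bray_curtis(stool, nasopharyngeal):.3f}")
```

      A significance test on such a dissimilarity matrix is a separate step, which may be the distinction the figure description needs to make.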

      For Figure 2, we plan to add a supplementary table showing the abundance of the detected genera. In any case, we will also update the manuscript text to clarify how we obtained this result.

      Regarding the clarification about "1% abundance," we want to emphasize that we are referring to relative abundance, where 1 represents 100%. To avoid confusion, we will explicitly state this in both the methods section and figure descriptions. In addition, it is true that the statistical test employed for the analysis is not mentioned in the figure description, and we recognize that the image may be difficult to interpret. Therefore, we will modify the text and add a supplementary table displaying the abundances and p-values.
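
      The scale convention can be illustrated with a short sketch. It assumes that relative abundance is the fraction of reads assigned to a taxon, so that a cut-off of 1% corresponds to 0.01 on the 0-to-1 scale; the taxon names, counts, and cut-off are hypothetical and are not taken from the study.

```python
def relative_abundance(counts):
    """Convert raw read counts per taxon to fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads in sample")
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical read counts for one sample (illustration only).
counts = {"genus_a": 700, "genus_b": 295, "genus_c": 5}
fractions = relative_abundance(counts)

# On this scale 1.0 means 100%, so a 1% cut-off is 0.01 (assumed here).
threshold = 0.01
retained = {taxon: f for taxon, f in fractions.items() if f >= threshold}

for taxon, f in fractions.items():
    print(f"{taxon}: fraction={f:.3f} ({f * 100:.1f}%)")
print("retained at 1% cut-off:", sorted(retained))
```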

      Furthermore, we agree with the reviewer's suggestion to investigate whether the bacteria identified as potential biomarkers for each condition are specific to their respective severity index or if there is a threshold. Thus, we will reanalyze the data and include a supplementary table with the abundance of each biomarker for each condition. We will also place greater emphasis on these results in our discussion.

      Finally, in response to the reviewer's suggestion, we are going to go through the nasopharyngeal-fecal axis part in the discussion. It is well described that COVID-19 induces a dysbiosis in both microbiomes. Consequently, we understand that the ratio we have described could be an interesting tool for assessing COVID severity development as it considers alterations in both environments. However, we acknowledge that there may be room for improvement in clarifying the significance of this intriguing finding and its implications.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript from Schwintek and coworkers describes a system in which gas flow across a small channel (10^-4-10^-3 m scale) enables the accumulation of reactants and convective flow. The authors go on to show that this can be used to perform PCR as a model of prebiotic replication.

      Strengths:

      The manuscript nicely extends the authors' prior work in thermophoresis and convection to gas flows. The demonstration of nucleic acid replication is an exciting one, and an enzyme-catalyzed proof-of-concept is a great first step towards a novel geochemical scenario for prebiotic replication reactions and other prebiotic chemistry.

      The manuscript nicely combines theory and experiment, which generally agree well with one another, and it convincingly shows that accumulation can be achieved with gas flows and that it can also be utilized in the same system for what one hopes is a precursor to a model prebiotic reaction. This continues efforts from Braun and Mast over the last 10-15 years extending a phenomenon that was appreciated by physicists and perhaps underappreciated in prebiotic chemistry to increasingly chemically relevant systems and, here, a pilot experiment with a simple biochemical system as a prebiotic model.

      I think this is exciting work and will be of broad interest to the prebiotic chemistry community.

      Weaknesses:

      The manuscript states: "The micro scale gas-water evaporation interface consisted of a 1.5 mm wide and 250 µm thick channel that carried an upward pure water flow of 4 nl/s ≈ 10 µm/s perpendicular to an air flow of about 250 ml/min ≈ 10 m/s." This was a bit confusing on first read because Figure 2 appears to show a larger channel - based on the scale bar, it appears to be about 2 mm across on the short axis and 5 mm across on the long axis. From reading the methods, one understands the thickness is associated with the Teflon, but the 1.5 mm dimension is still a bit confusing (and what is the dimension in the long axis?) It is a little hard to tell which portion (perhaps all?) of the image is the channel. This is because discontinuities are present on the left and right sides of the experimental panels (consistent with the image showing material beyond the channel), but not the simulated panels. Based on the authors' description of the apparatus (sapphire/CNC machined Teflon/sapphire) it sounds like the geometry is well-known to them. Clarifying what is going on here (and perhaps supplying the source images for the machined Teflon) would be helpful.

      We understand. We will update the figures to better show dimensions of the experimental chamber. We will also add a more complete Figure in the supplementary information. Part of the complexity of the chamber however stems from the fact that the same chamber design has also been used to create defined temperature gradients which are not necessary and thus the chamber is much more complex than necessary.

      We added the scheme of the whole PTFE chip to Figure 2 in the top left corner, indicating the ROI shown in the fluorescence micrographs. Additionally, the channel walls are now clearly indicated by white dotted lines. The dimensions of the setup are now shown more clearly: the figure gives the total width of the channel, its height up to the gas flux channel, and its depth. We changed the figure caption accordingly, and it now reads: “[…] The PTFE chip cutout in the top left corner shows the ROI used for the micrographs. The color scale is equal for both simulation and experiment and Channel dimensions are 4 x 1.5 x 0.25 mm as indicated. Dotted lines visualize the location of the channel walls. […]”
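
      The quoted equivalence of 4 nl/s ≈ 10 µm/s for the water inflow follows from dividing the volumetric flow by the channel cross-section. A minimal sketch of this conversion, assuming a rectangular cross-section of 1.5 mm x 0.25 mm and a uniform velocity across it; the function name is illustrative.

```python
def mean_velocity_um_per_s(flow_nl_per_s, width_mm, thickness_mm):
    """Mean flow velocity (µm/s) of a volumetric flow through a rectangular channel."""
    flow_um3_per_s = flow_nl_per_s * 1e6  # 1 nl = 1e6 µm³
    area_um2 = (width_mm * 1e3) * (thickness_mm * 1e3)  # mm -> µm on each side
    return flow_um3_per_s / area_um2


# Values quoted in the manuscript: 4 nl/s through a 1.5 mm x 0.25 mm channel.
velocity = mean_velocity_um_per_s(4.0, 1.5, 0.25)
print(f"mean water velocity: {velocity:.1f} µm/s")
```

      The result is close to the 10 µm/s stated in the manuscript. The same conversion for the gas flow would require the cross-section of the gas channel, which is not given in this response.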

      The data shown in Figure 2d nicely shows nonrandom residuals (for experimental values vs. simulated) that are most pronounced at t~12 m and t~40-60m. It seems like this is (1) because some symmetry-breaking occurs that isn't accounted for by the model, and perhaps (2) because of the fact that these data are n=1. I think discussing what's going on with (1) would greatly improve the paper, and performing additional replicates to address (2) would be very informative and enhance the paper. Perhaps the negative and positive residuals would change sign in some, but not all, additional replicates?

      To address this, we will show two more replicates of the experiment and include them in Figure 2.

      We are seeing two effects when we compare fluorescence measurements of the experiments.

      Firstly, degassing of water causes the formation of air-bubbles, which are then transported upwards to the interface, disrupting fluorescence measurements. This, however, mostly occurs in experiments with elevated temperatures for PCR reactions, such as displayed in Figure 4.

      Secondly, due to the high surface tension of water, the interface is quite flexible. As the inflow and evaporation work to balance each other, the shape of the interface adjusts, leading to alterations in the circular flow fields below.

      Thus the conditions, while overall being in steady state, show some fluctuations. The strong dependence on interface shape is also seen in the simulation. However, modeling a dynamic interface shape is not so easy to accomplish, so we had to stick to one geometry setting. Again here, the added movies of two more experiments should clarify this issue.

      We performed three more replicates of the experiment and included the averaged data points together with their respective standard deviation as error bars in Figure 2d. Additionally, the videos of each individual repeat are now added to the supplementary files for the reader to better understand where the strong fluctuations around half an hour come from. The figure caption was adjusted to: “[…] The maximum relative concentration of DNA increased within an hour to ~30 X the initial concentration, with the trend following the simulation. Error bars are the standard deviation from four independent measurements. […]”
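
      Averaging replicates into data points with error bars can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis script: the accumulation values are made up, all replicates are assumed to share the same time points, and the sample standard deviation (n - 1 denominator) is assumed.

```python
from statistics import mean, stdev


def summarize(replicates):
    """Return (mean, sample standard deviation) for each time point.

    `replicates` is a list of equally long series, one per independent run.
    """
    if len(replicates) < 2 or len({len(r) for r in replicates}) != 1:
        raise ValueError("need at least two replicates of equal length")
    # zip(*replicates) groups the values of all runs at the same time point.
    return [(mean(values), stdev(values)) for values in zip(*replicates)]


# Hypothetical relative concentrations at four time points, four runs.
runs = [
    [1.0, 3.0, 14.0, 29.0],
    [1.0, 2.5, 20.0, 31.0],
    [1.0, 3.5, 9.0, 28.0],
    [1.0, 3.0, 17.0, 32.0],
]
for index, (avg, sd) in enumerate(summarize(runs)):
    print(f"time point {index}: mean={avg:.2f}, sd={sd:.2f}")
```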

      The main text was also changed to better explain how the fluctuations impact the measurements: “[…] Water continuously evaporated at the interface, but nucleic acids remained in the aqueous phase accumulating near the interface. They could only escape downward either by diffusion or by the vortex induced by the gas flowing across the interface, pushing the molecules back deeper into the bulk (See the flow lines in Fig2(b) taken from the simulation). As the gas flow continuously removed excess vapor, the evaporation rate remained constant. Thus, except for fluctuations, a stable interface shape should be expected. However, due to the high surface tension of water, the interface is very flexible. As the inflow and evaporation work to balance each other, the shape of the interface adjusts, likely in response to small fluctuations in gas pressure and spatial variations in water surface tension. This is leading to alterations in the circular flow fields below (Supplementary Movie 2).

      As these fluctuations are difficult to simulate, we decided to stick with one interface shape, matching evaporation and inflow speeds. The evaporation rate at the interface was therefore set to be proportional to the vapor concentration gradient and varied spatially along the interface between 5 and 10.5 µm/s (See Suppl. Fig. VI.1(d)). Using the known diffusion coefficient of 95 µm²/s for the 63mer [9], the simulation closely matched the experimental results. In both cases, DNA accumulated in regions with circular flow patterns driven by the gas flux (Fig.2(b), right panel).

      5 minutes after starting the experiment, the maximum DNA accumulation was 3-fold, while after one hour of evaporation, around 30-fold accumulation was observed. Due to molecules residing in very shallow volumes when directly at the interface, the fluorescence signal can vary drastically compared to measurements deeper in the bulk. This can be seen in the fluctuations between independent measurements (See Supplementary Movies 2b,2b,2c), especially around 0.5 h shown in Figure 2(d). The simulated maximum accumulation followed the experimental results and starts saturating after about one hour (Fig.2(d)). […]”

      The authors will most likely be familiar with the work of Victor Ugaz and colleagues, in which they demonstrated Rayleigh-Bénard-driven PCR in convection cells (10.1126/science.298.5594.793, 10.1002/anie.200700306). Not including some discussion of this work is an unfortunate oversight, and addressing it would significantly improve the manuscript and provide some valuable context to readers. Something of particular interest would be their observation that wide circular cells gave chaotic temperature profiles relative to narrow ones and that these improved PCR amplification (10.1002/anie.201004217). I think contextualizing the results shown here in light of this paper would be helpful.

      Thanks for pointing this out and reminding us. We apologize. We agree that the chaotic trajectories within Rayleigh-Bénard convection cells lead to temperature oscillations similar to the salt variations in our gas-flux system. Although the convection-driven PCR in Rayleigh-Bénard cells is not isothermal like our system, it provides a useful point of comparison and context for understanding environments that can support full replication cycles. We will add a section that compares the approaches, outlines the history of convective PCR, and relates both to the new isothermal implementation.

      We added a main text paragraph after the last paragraph in section “Strand Separation Dynamics”: “[…] Rayleigh-Bénard convection cells generate similar patterns to those seen in Fig. 3(c). The oscillations in salt concentration resemble the temperature fluctuations observed in convection-based PCR reactions from earlier studies [32,33], which showed that chaotic temperature variations, compared to periodic ones, enhanced the efficiency of the PCR reaction. […]”

      Again, it appears n=1 is shown for Figure 4a-c - the source of the title claim of the paper - and showing some replicates and perhaps discussing them in the context of prior work would enhance the manuscript.

      We appreciate the reviewer for bringing this to our attention. We will now include the two additional repeats for the data shown in Figure 4c, while the repeats of the PAGE measurements are already displayed in Supplementary Fig. IX.2. Initially, we chose not to show the repeats in Figure 4c due to the dynamic and variable nature of the system. These variations are primarily caused by differences at the water-air interface, attributed to the high surface tension of water. Additionally, the stochastic formation of air bubbles in the inflow—despite our best efforts to avoid them—led to fluctuations in the fluorescence measurements across experiments. These bubbles cause a significant drop in fluorescence in a region of interest (ROI) until the area is refilled with the sample.

      Unlike our RNA-focused experiments, PCR requires high temperatures and degassing a PCR master mix effectively is challenging in this context. While we believe our chamber design is sufficiently gas-tight to prevent air from diffusing in, the high surface-to-volume ratio in microfluidics makes degassing highly effective, particularly at elevated temperatures. We anticipate that switching to RNA experiments at lower temperatures will mitigate this issue, which is also relevant in a prebiotic context.

      The reviewer’s comments are valid and prompt us to fully display these aspects of the system. We will now include these repeats in Figure 4c to give readers a deeper understanding of the experiment's dynamics. Additionally, we will provide videos of all three repeats, allowing readers to better grasp the nature of the fluctuations in SYBR Green fluorescence depicted in Figure 4c.

      The data from the triplicates are now added to Figure 4c, showing how air bubbles, forming through degassing at the high temperatures required for Taq polymerase, disrupt the measurement, as they momentarily dry off the channel and stop the reaction until the channel fills again. Figure caption has been adapted and now reads: “[…] Dotted lines show the data from independent repeats. Air bubbles formed through degassing can momentarily disrupt the reaction. […]”

      We additionally changed the main text to explain the experimental difficulties to the reader: “[…] In other repetitions of the reaction, this increase was sometimes even observed earlier, around the one-hour mark (dotted lines). However, air bubbles nucleated by degassing events rise and temporarily dry out the channel, interrupting the reaction until the liquid refills the channel (Supplementary Movies 4, 4b, 4c & 5). Despite our best efforts, we were unable to fully prevent this, especially given the high temperatures required for Taq polymerase activity. In an identical setting when the gas- and water flux were switched off, no fluorescence increase was found (See Fig. 4(c) red lines). Fluorescence variations are additionally caused by fluctuations in the position of the gas-water interface, as discussed earlier. […]”

      I think some caution is warranted in interpreting the PCR results because a primer-dimer would be of essentially the same length as the product. It appears as though the experiment has worked as described, but it's very difficult to be certain of this given this limitation. Doing the PCR with a significantly longer amplicon would be ideal, or alternately discussing this possible limitation would be helpful to the readers in managing expectations.

      This is a good point and should be discussed more in the manuscript. Our gel electrophoresis is capable of distinguishing between the replicated product and primer dimers. We know this because we optimized the primer and template sequences to minimize primer dimers, which made them distinguishable from the desired 61mer product. That said, none of the experiments performed without an added template strand showed any band in the vicinity of the product band after 4 h of reaction, in contrast to the experiments with template, which is a strong argument against the presence of primer dimers.

      We added a main text section explaining this to the reader: “[…]Suppl. Fig. IX.2 shows all independent repeats of the corresponding experiments. No product was detected in any of these cases, ruling out reaction limitations such as primer dimer formation. Primer dimers would form even in the absence of a template strand and would be identifiable through gel electrophoresis. As Taq polymerase requires a significant overlap between the two dimers to bind, this would result in a shorter product compared to the 61mer used here.  […]”

      Reviewer #2 (Public review):

      Schwintek et al. investigated whether a geological setting of a rock pore with water inflow on one end and gas passing over the opening of the pore on the other end could create a non-equilibrium system that sustains nucleic acid reactions under mild conditions. The evaporation of water as the gas passes over it concentrates the solutes at the boundary of evaporation, while the gas flux induces momentum transfer that creates currents in the water that push the concentrated molecules back into the bulk solution. This leads to the creation of steady-state regions of differential salt and macromolecule concentrations that can be used to manipulate nucleic acids. First, the authors showed that fluorescent bead behavior in this system closely matched their fluid dynamic simulations. With that validation in hand, the authors next showed that fluorescently labeled DNA behaved according to their theory as well. Using these insights, the authors performed a FRET experiment that clearly demonstrated the hybridization of two DNA strands as they passed through the high Mg++ concentration zone, and, conversely, the dissociation of the strands as they passed through the low Mg++ concentration zone. This isothermal hybridization and dissociation of DNA strands allowed the authors to perform an isothermal DNA amplification using a DNA polymerase enzyme. Crucially, the isothermal DNA amplification required the presence of the gas flux and could not be recapitulated using a system that was at equilibrium. These experiments advance our understanding of the geological settings that could support nucleic acid reactions that were key to the origin of life.

      The presented data compellingly supports the conclusions made by the authors. To increase the relevance of the work for the origin of life field, the following experiments are suggested:

      (1) While the central premise of this work is that RNA degradation presents a risk for strand separation strategies relying on elevated temperatures, all of the work is performed using DNA as the nucleic acid model. I understand the convenience of using DNA, especially in the latter replication experiment, but I think that at least the FRET experiments could be performed using RNA instead of DNA.

      We understand the request only partially. The modification introduced by the two dye molecules in the FRET probe, which allow us to probe salt concentrations by melting, is of course much larger than the change of the backbone from RNA to DNA. This was the reason why we used the much more stable DNA construct, which is also manufactured at lower cost and in much higher purity, including the modifications. We think the melting temperature characteristics of RNA and DNA in this range are well enough known that we can use DNA instead of RNA for probing the salt concentration in our flow cycling.

      Only under extreme conditions of pH and salt, especially alkaline conditions, is RNA degradation through transesterification at least several orders of magnitude faster than the spontaneous degradative mechanisms acting upon DNA [Li, Y., & Breaker, R. R. (1999). Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2′-hydroxyl group. Journal of the American Chemical Society, 121(23), 5364-5372.]. The work presented in this article, however, focuses on the hybridization dynamics of nucleic acids. Here, RNA and DNA share similar properties regarding the formation of double strands and their respective melting temperatures. While RNA has been shown to form more stable duplex structures exhibiting higher melting temperatures compared to DNA [Dimitrov, R. A., & Zuker, M. (2004). Prediction of hybridization and melting for double-stranded nucleic acids. Biophysical Journal, 87(1), 215-226.], the general impact of changes in salt, temperature and pH [Mariani, A., Bonfio, C., Johnson, C. M., & Sutherland, J. D. (2018). pH-Driven RNA strand separation under prebiotically plausible conditions. Biochemistry, 57(45), 6382-6386.] on the respective melting temperatures follows the same trend for both nucleic acid types. The diffusive properties of RNA and DNA are also very similar [Baaske, P., Weinert, F. M., Duhr, S., Lemke, K. H., Russell, M. J., & Braun, D. (2007). Extreme accumulation of nucleotides in simulated hydrothermal pore systems. Proceedings of the National Academy of Sciences, 104(22), 9346-9351.].

      Since this work is a proof of principle for the discussed environment being able to host nucleic acid replication, we aimed to avoid second order effects such as degradation by hydrolysis by using DNA as a proxy polymer. This enabled us to focus on the physical effects of the environment on local salt and nucleic acid concentration. The experiments performed with FRET are used to visualize local salt concentration changes and their impact on the melting temperature of dissolved nucleic acids.  While performing these experiments with RNA would without doubt cover a broader application within the field of origin of life, we aimed at a step-by-step / proof of principle approach, especially since the environmental phenomena studied here have not been previously investigated in the OOL context. Incorporating RNA-related complexity into this system should however be addressed in future studies. This will likely require modifications to the experimental boundary conditions, such as adjusting pH, temperature, and salt concentration, to account for the greater duplex stability of RNA. For instance, lowering the pH would reduce the RNA melting temperature [Ianeselli, A., Atienza, M., Kudella, P. W., Gerland, U., Mast, C. B., & Braun, D. (2022). Water cycles in a Hadean CO2 atmosphere drive the evolution of long DNA. Nature Physics, 18(5), 579-585.].

      (2) Additionally, showing that RNA does not degrade under the conditions employed by the authors (I am particularly worried about the high Mg++ zones created by the flux) would further strengthen the already very strong and compelling work.

      Based on literature values for hydrolysis rates of RNA [Li, Y., & Breaker, R. R. (1999). Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2′-hydroxyl group. Journal of the American Chemical Society, 121(23), 5364-5372.], we estimate RNA to have a half-life of multiple months under the deployed conditions in the FRET experiment (high concentration zones contain <1 mM of Mg2+). Additionally, dsRNA is multiple orders of magnitude more stable than ssRNA with regard to degradation through hydrolysis [Zhang, K., Hodge, J., Chatterjee, A., Moon, T. S., & Parker, K. M. (2021). Duplex structure of double-stranded RNA provides stability against hydrolysis relative to single-stranded RNA. Environmental Science & Technology, 55(12), 8045-8053.], improving RNA stability especially in zones of high FRET signal. Furthermore, at the neutral pH deployed in this work, RNA does not readily degrade. In previous work from our lab [Salditt, A., Karr, L., Salibi, E., Le Vay, K., Braun, D., & Mutschler, H. (2023). Ribozyme-mediated RNA synthesis and replication in a model Hadean microenvironment. Nature Communications, 14(1), 1495.], we showed that the lifetime of RNA under conditions reaching 40 mM Mg2+ at the air-water interface at 45°C was sufficient to support ribozymatically mediated ligation reactions in experiments lasting multiple hours.

      With that in mind, gaining insight into the median Mg2+ concentration across multiple averaged nucleic acid trajectories in our system (see Fig. 3c&d) and numerically convoluting this with hydrolysis dynamics from literature would be highly valuable. We anticipate that longer residence times in trajectories distant from the interface will improve RNA stability compared to a system with uniformly high Mg2+ concentrations.

      We added a new Supplementary section for this. We used the trace from Figure 3(c) and calculated the hydrolysis rate for each timestep using literature values for RNA [Li, Y., & Breaker, R. R. (1999). Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2′-hydroxyl group. Journal of the American Chemical Society, 121(23), 5364-5372.]. We conclude that the conditions deployed for the experiment are not harsh on RNA, with hydrolysis rates on the order of 10⁻⁶ min⁻¹. The figure below (also now in the supplementary information) shows the hydrolysis of RNA under the conditions of the experiment in Figure 3. RNA is not expected to hydrolyze under these conditions on the timescales in which a replication reaction would occur. With a half-life of around 83 days, even a prebiotically plausible – very slow – replication reaction would not be constrained by hydrolysis boundary conditions in this scenario.

      We referenced this section of the supplementary information in the main text: “[…] In the experimental conditions used here, RNA would also not readily degrade, even if the strand enters the high salt regimes (See Suppl. Sec. IX). Using literature values for hydrolysis rates under the deployed conditions, we estimate dissolved RNA to have a half life of around 83 days. […]”
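
      The link between the stated rate regime and the half-life can be checked with first-order kinetics, where the half-life is ln 2 divided by the rate constant and the surviving fraction along a trajectory is the exponential of the negative time-integrated rate. The sketch below is a hedged illustration, not the calculation from the supplementary section: the rate constant of 5.8 × 10⁻⁶ min⁻¹ is an assumed value back-calculated from the 83-day half-life, and the function names are hypothetical.

```python
import math

MINUTES_PER_DAY = 60 * 24


def half_life_days(rate_per_min):
    """Half-life (days) of first-order decay with the given rate constant (1/min)."""
    return math.log(2) / rate_per_min / MINUTES_PER_DAY


def surviving_fraction(rates_per_min, dt_min):
    """Fraction of intact RNA after a trajectory with one rate per timestep.

    Each rate (1/min) is held constant over a timestep of dt_min minutes.
    """
    integrated_rate = sum(rate * dt_min for rate in rates_per_min)
    return math.exp(-integrated_rate)


# Assumed rate constant, chosen to be consistent with the ~83-day half-life.
k = 5.8e-6
print(f"half-life: {half_life_days(k):.0f} days")

# Surviving fraction after four hours at this constant rate, 1-minute steps.
print(f"intact after 4 h: {surviving_fraction([k] * 240, 1.0):.4f}")
```

      With a trajectory-dependent rate, the list passed to `surviving_fraction` would hold one value per timestep of the trace instead of a constant.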

      (3) Finally, I am curious whether the authors have considered designing a simulation or experiment that uses the imidazole- or 2′,3′-cyclic phosphate-activated ribonucleotides. For instance, a fully paired RNA duplex and a fluorescently-labeled primer could be incubated in the presence of activated ribonucleotides +/- flux and subsequently analyzed by gel electrophoresis to determine how much primer extension has occurred. The reason for this suggestion is that, due to the slow kinetics of chemical primer extension, the reannealing of the fully complementary strands as they pass through the high Mg++ zone, which is required for primer extension, may outcompete the primer extension reaction. In the case of the DNA polymerase, the enzymatic catalysis likely outcompetes the reannealing, but this may not recapitulate the uncatalyzed chemical reaction.

      This is certainly on our to-do list for future experiments in this setting. Our current focus is on templated ligation rather than templated polymerization and we are working hard to implement RNA-only enzyme-free ligation chain reaction, based on more optimized parameters for the templated ligation from 2’3’-cyclic phosphate activation that was just published [High-Fidelity RNA Copying via 2′,3′-Cyclic Phosphate Ligation, Adriana C. Serrão, Sreekar Wunnava, Avinash V. Dass, Lennard Ufer, Philipp Schwintek, Christof B. Mast, and Dieter Braun, JACS doi.org/10.1021/jacs.3c10813 (2024)]. But we first would try this at an air-water interface which was shown to work with RNA in a temperature gradient [Ribozyme-mediated RNA synthesis and replication in a model Hadean microenvironment, Annalena Salditt, Leonie Karr, Elia Salibi, Kristian Le Vay, Dieter Braun & Hannes Mutschler, Nature Communications doi.org/10.1038/s41467-023-37206-4 (2023)] before making the jump to the isothermal setting we describe here. So we can understand the question, but it was good practice also in the past to first get to know the setting with PCR, then jump to RNA.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Could the authors comment on the likelihood of the geological environments where the water inflow velocity equals the evaporation velocity?

      This is an important point to mention in the manuscript, thank you for pointing that out. To produce a defined experiment, we pushed the water out with a syringe pump, regulated so that the flow rate matched the evaporation. We imagine that a real system will self-regulate the inflow of the water column through a more complex geometry of the gas flow, matching the evaporation with the reflow of water automatically. The interface would either recede or move closer to the gas flux, depending on whether the inflow exceeds or falls short of the evaporation rate. As the interface moves closer, evaporation speeds up, while moving away slows it down. This dynamic process stabilizes the system, with surface tension ultimately fixing the interface in place.

      We have already seen some of this dynamic in the experiments, but so far we could not find a good geometry within our two-dimensional, constant-thickness design to make it work for a longer time. A three-dimensional reservoir of water with lower frictional forces would very likely be able to do this, but it would require a full redesign as a multi-thickness microfluidic chip. The more we think about it, the more we envisage implementing the next version of the experiment with a real porous volcanic rock inside a humidity chamber that simulates a full 6 h prebiotic day. We would then lose the reproducibility of the experiment, but would likely gain a way for recondensation of water as dew on a cold morning to refill the water reservoirs in the rocks. We apologize for digressing towards future experiments.

      We added a paragraph after the second paragraph in Results and Discussion.

      It now reads: […] For a real early Earth environment we envision a system that self-regulates the water column's inflow by automatically balancing evaporation with capillary flows. The interface adjusts its position relative to the gas flux, moving closer if the inflow is less than the evaporation rate, or receding if it exceeds it. When the interface nears the gas flux, evaporation accelerates, while moving it away slows evaporation. This dynamic process stabilizes the system, with surface tension ultimately fixing the interface's position. […]

      (2) Could the authors speculate on using gases other than ambient air to provide the flux and possibly even chemical energy? For example, using carbonyl sulfide or vaporized methyl isocyanide could drive amino acid and nucleotide activation, respectively, at the gas-water interface.

      This is an interesting prospect for future work with this system. We have also thought about introducing ammonia for pH control and possible reactions. We were amazed in the past that using CO2 instead of air had a profound impact on replication and strand separation [Water cycles in a Hadean CO2 atmosphere drive the evolution of long DNA, Alan Ianeselli, Miguel Atienza, Patrick Kudella, Ulrich Gerland, Christof Mast & Dieter Braun, Nature Physics doi.org/10.1038/s41567-022-01516-z (2022)]. Going further in this direction absolutely makes sense, and because the gas acts mostly on the molecules that are length-selectively accumulated at the interface, only the selected molecules will be affected, which adds to the selection pressure of early evolutionary scenarios.

      Of course, in the manuscript, we use ambient air as a proxy for any gas, focusing primarily on the energy introduced through momentum transfer and evaporation. We speculate that soluble gases could establish chemical gradients, such as pH or redox potential, from the bulk solution to the interface, similar to the Mg2+ accumulation shown in Figure 3c. The nature of these gradients would depend on each gas's solubility and diffusivity. We have already observed such effects in thermal gradients [Keil, L. M., Möller, F. M., Kieß, M., Kudella, P. W., & Mast, C. B. (2017). Proton gradients and pH oscillations emerge from heat flow at the microscale. Nature communications, 8(1), 1897.] and finding similar behavior in an isothermal environment would be a significant discovery.

      Added a paragraph in the Conclusion to showcase this: […] Furthermore we expect that other gases, such as CO2, could establish chemical gradients in this environment. Such gradients have been observed in thermal gradients before [23] and finding similar behaviour in an isothermal environment would be a significant discovery. […]

      (3) Line 162: Instead of "risk," I suggest using "rate".

      Thanks for pointing this out! Will be changed.

      Fixed.

      (4) Using FRET of a DNA duplex as an indicator of salt concentration is a decent proxy, but a more direct measurement of salt concentration would provide further merit to the explicit statement that it is the salt concentration that is changing in the system and not another hidden parameter.

      Directly observing salt concentration using microscopy is a difficult task. While there are dyes that change their fluorescence depending on the local Na+ or Mg2+ concentration, they do not operate differentially, i.e. by taking a ratio between two color channels. Only a ratiometric readout avoids artifacts from the dye molecules themselves being accumulated by the non-equilibrium setting. We were able to do this for pH in the past, but did not find comparable optical salt sensors. This is why we ended up with a FRET pair, which has the advantage that we directly probe the strand separation that we are interested in anyway. Using such a dye in future work would, however, without doubt enhance the understanding of not only this system but also our thermal gradient environments.

      (5) Figure 3a: Could the authors add information on "Dried DNA" to the caption? I am assuming this is the DNA that dried off on the sides of the vessel but cannot be sure.

      Thanks to the reviewer for pointing this out. This is correct and we will describe this better in the revised manuscript.

      Added a sentence in the caption to address this: […] Fluctuations in interface position can dry and redissolve DNA repeatedly (see “Dried DNA” in right panel). […]

      (6) Figure 4b and c: How reproducible is this data? Have the authors performed this reaction multiple independent times? If so, this data should be added to the manuscript.

      The gel electrophoresis was performed in triplicate, and the data are shown in full in the supplementary information. The data in c are hard to reproduce, as the interface is not static and ROI measurements are therefore difficult to average over repeats. Including the data from the independent repeats will, however, give the reader insight into some of the experimental difficulties, such as air bubbles that form from degassing as the liquid heats up and travel upwards to the interface, disrupting the ongoing fluorescence measurements.

      This was also pointed out by reviewer 1 and addressed there.

      (7) Line 256: "shielding from harmful UV" statement only applies to RNA oligomers as UV light may actually be beneficial for earlier steps during ribonucleoside synthesis. I suggest rephrasing to "shielding nucleic acid oligomers from UV damage.".

      Will be adjusted as mentioned.

      Fixed.

      (8) The final paragraph in the Results and Discussion section would flow better if placed in the Conclusion section.

      This is a good point and we will merge results and discussion closer together.

      Fixed.

      (9) Line 262, "...of early Life" is slightly overstating the conclusions of the study. I suggest rephrasing to "...of nucleic acids that could have supported early life."

      This is a fair comment. We thank the reviewer for his detailed analysis of the manuscript!

      Changed the phrase to: […] In this work we investigated a prebiotically plausible and abundant geological environment to support the replication of nucleic acids. […]

      (10) In references, some of the journal names are in sentence case while others are in title case (see references 23 and 26 for example).

      Thanks - this will be fixed.

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This study provides compelling evidence that RAR, rather than its obligate dimerization partner RXR, is functionally limiting for chromatin binding. This manuscript provides a paradigm for how to dissect the complicated regulatory networks formed by dimerizing transcription factor families.

      Dahal and colleagues use advanced SMT techniques to revisit the role of RXR in DNA-binding of the type-2 nuclear receptor (T2NR) RAR. The dominant consensus model for regulated DNA binding of T2NRs posits that they compete for a limited pool of RXR to form an obligate T2NR-RXR dimer. Using advanced SMT and proximity-assisted photoactivation technologies, Dahal et al. now test the effect of manipulating the endogenous pool size of RAR and RXR on heterodimerization and DNA-binding in live U2OS cells. Surprisingly, it turns out that RAR, rather than RXR, is functionally limiting for heterodimerization and chromatin binding. By inference, the relative pool size of various T2NRs expressed in a given cell, rather than RXR, is likely to determine chromatin binding and transcriptional output.

      The conclusions of this study are well supported by the experimental results and provide unexpected novel insights into the functioning of the clinically important class of T2NR TFs. Moreover, the presented results show how the use of novel technologies can put long-standing theories on how transcription factors work upside down. This manuscript provides a paradigm for how to further dissect the complicated regulatory networks formed by T2NRs or other dimerizing TFs. I found this to be a complete story that does not require additional experimental work. However, I do have some suggestions for the authors to consider.

      Reviewer #1 (Recommendations For The Authors):

      (1) Does the increased chromatin binding measured when the RAR levels are increased reflect a higher occupancy of a similar set of loci, or are additional loci bound? The authors could discuss this issue in the context of the published literature. Obviously, this could be addressed experimentally by ChIP-seq or a similar analysis, but this would extend beyond the main topic of this manuscript.

      We attempted to explore this experimentally using ChIP-seq with multiple RAR- and RXR-specific antibodies. Unfortunately, our results were inconclusive, as the antibody enrichment relative to the IgG control was insufficient for reliable interpretation. Specifically, our ChIP-seq enrichment levels were only around 1.5-fold, while the accepted standard for meaningful ChIP enrichment is typically at least 2-fold. Due to these technical limitations, we decided to defer these experiments for now.

      However, we agree with the reviewer that understanding whether the increased chromatin binding of RAR reflects higher occupancy at the same set of loci or binding to additional loci is a key question. In similar experiments involving the transcription factor TFEB (Esbin et al., 2024, Genes Dev, doi: 10.1101/gad.351633.124), where an increase in the SMT bound fraction occurred, both scenarios (higher occupancy at known loci and binding to additional loci) were observed by ChIP-seq. Addressing this intriguing possibility in future studies focused on RAR and RXR would therefore be interesting.

      (2) The results presented suggest convincingly that endogenous RXR is normally in excess to its binding partners (in U2OS cells). This point could be strengthened further by reducing RXR levels, e.g., by knocking out 1 allele or the use of shRNAs (although the latter method might be too hard to control). Overexpression of another T2NR might also help determine the buffer capacity of RXR.

      We appreciate the reviewer’s acknowledgment that our results convincingly demonstrate that endogenous RXR is typically in excess relative to its binding partners in U2OS cells. We agree that this conclusion could be further reinforced by experiments such as overexpression of another T2NR to test RXR's buffering capacity. We are actively pursuing follow-up experiments involving overexpression of additional T2NRs to address this question in more detail. These studies are ongoing, and we plan to explore the buffer capacity of RXR more extensively in a future manuscript.

      (3) The ~10% difference in fbound of RAR and RXR (in Figs 1 and 2), while they should be 1:1 dimers, is explained by invoking the expression of RXR isoforms. Can the authors be more specific concerning the nature of these isoforms?

      We have provided detailed information about the different T2NRs expressed in U2OS cells according to the Expression Atlas and the Human Protein Atlas Database in Supplementary Table S1. Table S1 specifically shows that both the RXRα and RXRβ isoforms are expressed in U2OS cells. Additionally, the caption of Table S1 explicitly notes the presence of isoform RXRβ in U2OS cells. In the main text, we reference Table S1 when discussing the 10% difference in fbound between RARα and RXRα, and we have now suggested that the expression of RXRβ likely accounts for the observed discrepancy.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript "Surprising Features of Nuclear Receptor Interaction Networks Revealed by Live Cell Single Molecule Imaging", Dahal et al combine fast single molecule tracking (SMT) with proximity-assisted photoactivation (PAPA) to study the interaction between RARa and RXRa. The prevalent model in the nuclear receptor field suggests that type II nuclear receptors compete for a limiting pool of their partner RXRa. Contrary to this, the authors find that over-expression of RARa but not RXRa increases the fraction of RXRa molecules bound to chromatin, which leads them to conclude that the limiting factor is the abundance of RARa and not RXRa. The authors also perform experiments with a known RARa agonist, all trans retinoic acid (atRA) which has little effect on the bound fraction. Using PAPA, they show that chromatin binding increases upon dimerization of RARa and RXRa.

      Strengths:

      In my view, the biggest strength of this study is the use of endogenously tagged RARa and RXRa cell lines. As the authors point out, most previous studies used either in vitro assays or over-expression. I commend the authors on the generation of single-cell clones of knock-in RARa-Halo and Halo-RXRa. The authors then carefully measure the abundance of each protein using FACS, which is very helpful when comparing across conditions. The manuscript is generally well written and figures are easy to follow. The consistent color-scheme used throughout the manuscript is very helpful.

      Weaknesses:

      (1) Agonist treatment:

      The authors test the effect of all trans retinoic acid (atRA) on the bound fraction of RARa and RXRa and find that "These results are consistent with the classic model in which dimerization and chromatin binding of T2NRs are ligand independent." However, all the agonist treatments are done in media containing FBS. FBS is not chemically defined and has been found to have between 10 and 50 nM atRA (see references in PMID 32359651 for example). The addition of 1 nM or 100 nM atRA is unlikely to result in a strong effect since the medium already contains comparable or higher levels of agonist. To test their hypothesis of ligand-independent dimerization, the authors should deplete the media of atRA by growing the cells in a medium containing charcoal-stripped FBS for at least 24 hours before adding agonist.

      We acknowledge the reviewer's concern regarding the presence of atRA in FBS and agree that it may introduce baseline levels of agonist. However, in our experiments, both the 1 nM and 100 nM atRA treatments resulted in observable changes in RAR expression levels (Figure S3C). Additionally, the luciferase assays demonstrated that 100 nM atRA significantly increased retinoic acid-responsive promoter activity (Figure S1C). Given these clear responses to atRA, we believe the observed lack of effect on the chromatin-bound fraction cannot be attributed to the presence of comparable or higher levels of atRA in the FBS, as the reviewer suggests. Moreover, since our results align with the established literature and do not impact the core findings of our study, we decided not to pursue the suggested experiments with charcoal-stripped FBS in this manuscript.  

      (2) Photobleaching and its effect on bound fraction measurements:

      The authors discard the first 500 to 1000 frames due to the high localization density in the initial frames. This will preferentially discard bound molecules that will bleach in the initial frames of the movie and lead to an over-estimation of the unbound fraction.

      For experiments with over-expression of RAR-Halo and Halo-RXR, the authors state that the cells were pre-bleached and that these frames were used to calculate the mean intensity of the nuclei. When pre-bleaching, bound molecules will preferentially bleach before the diffusing population. This will again lead to an over-representation of the unbound fraction since this is the population that will remain relatively unaffected by the pre-bleaching. Indeed, the bound fraction for over-expressed RARa and RXRa is significantly lower than that for the corresponding knock in lines. To confirm whether this is a biological result, I suggest that the authors either reduce the amount of dye they use so that this pre-bleaching is not necessary or use the direct reactivation strategy they use for their PAPA experiments to eliminate the pre-bleaching step.

      As for the measurement of the nuclear intensity, since the authors have access to multiple HaloTag dyes, they can saturate the HaloTagged proteins with a high concentration of JF646 or JFX650 to measure the mean intensity of the protein while still using the PA-JFX549 for SMT. Together, these will eliminate the need to prebleach or discard any frames.

      The Janelia Fluor dyes used in our experiments are known for their high photostability (Grimm et al., 2021, JACS Au, doi: 10.1021/jacsau.1c00006). During the initial 80 ms imaging to calculate the mean nuclear intensity, the laser power was kept at very low intensity (~3%) for a brief duration (~10 seconds), in contrast to the high intensity (~100%) used during the tracking experiments, which span around 3 minutes. This low-power illumination does not induce significant photobleaching but merely puts the dyes in a temporary dark state. Therefore, this pre-bleaching step closely resembles the direct reactivation strategy employed in our PAPA experiments.

      To further address the reviewer's concern, we performed a frame cut-off analysis for our SMT movies of endogenous RARα-Halo and over-expressed RARα-Halo (Figure S9B). The analysis shows no significant change in the bound fraction of either endogenous or over-expressed RARα-Halo when discarding the initial 1000 frames. Based on these results, we conclude that the pre-bleaching does not lead to an overestimation of the unbound fraction, and that our experimental approach is robust.

      (3) Heterogeneous expression of the SNAP fusion proteins:

      The cell lines expressing SNAP tagged transgenes shown in Fig S6 have very heterogeneous expression of the SNAP proteins. While the bulk measurements done by Western blotting are useful, while doing single-cell experiments (especially with small numbers - ~20 - of cells), it is important to control for expression levels. Since these transgenic stable lines were not FACS sorted, it would be helpful for the reader to know the spread in the distribution of mean intensities of the SNAP proteins for the cells that the SMT data are presented for. This step is crucial while claiming the absence of an effect upon over-expression and can easily be done with a SNAPTag ligand such as SF650 using the procedure outlined for the over-expressed HaloTag proteins.

      We agree with the reviewer that there is heterogeneity in SNAP protein expression across the transgenic lines. In response to the reviewer’s suggestion, we performed the proposed experiment to assess the distribution of mean intensities for two key experimental conditions: Halo-RXRα with overexpressed RARα-SNAP and Halo-RXRα with overexpressed RARαRR-SNAP. These results again confirm that the increase in chromatin-bound fraction of Halo-RXRα is observed only in the presence of RARα capable of heterodimerizing with RXRα, supporting our main conclusion (Figure S9).

      For these experiments, we followed the same labelling procedure described in the methods section for tracking endogenous Halo-tagged proteins alongside transgenic SNAP proteins. As shown in Figure S9, for ~70 cell nuclei, the distribution of mean intensities is similar for both conditions, with the bound fraction of Halo-RXRα significantly increasing in the presence of RARα-SNAP compared to RARαRR-SNAP. This analysis underscores that the observed effects are indeed due to the functional differences between the two RARα variants rather than variability in expression levels.

      (4) Definition of bound molecules:

      The authors state that molecules with a diffusion coefficient less than 0.15 µm²/s are considered bound and those between 1-15 µm²/s are considered unbound. Clarification is needed on how this threshold was determined. In previous publications using saSPT, the authors have used a cutoff of 0.1 µm²/s (for example, PMID 36066004, 36322456). Do the results rely on a specific cutoff? A diffusion coefficient by itself is only a useful measure of normal diffusion. Bound molecules are unlikely to be undergoing Brownian motion, but the state array method implemented here does not seem to account for non-normal diffusive modes. How valid is this assumption here?

      We acknowledge the inconsistency in the diffusion coefficient thresholds for defining the chromatin-bound fraction used across our group’s publications. The choice of threshold or cutoff (0.1 µm²/s vs 0.15 µm²/s) is largely arbitrary and does not significantly impact the results. To validate this, we tested the effect of different cutoffs on fbound (%) for endogenously expressed Halo-tagged RARα and RXRα (Figure S10). As shown in Figure S10, there was no substantial difference in fbound (%) calculated using a 0.1 µm²/s versus 0.15 µm²/s cutoff (e.g., RARα clone c156: 47±1% vs 49±1%; RXRα clone D6: 34±1% vs 35±1%). 

      Since we have consistently applied the 0.15 µm²/s cutoff throughout this manuscript across all experimental conditions, the comparative analysis of fbound (%) remains valid. While we agree that a Brownian diffusion model may not fully capture the motion of bound molecules, our state array model accounts for localization error, which likely incorporates some of the chromatin motion features. Moreover, the distinction between bound (<0.15 µm²/s) and unbound (1-15 µm²/s) populations is sufficiently large that using a normal diffusion model is reasonable for our analysis.
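
      The cutoff comparison described above can be illustrated with a minimal sketch. It assumes the bound fraction is read off a diffusion spectrum as the share of occupancy below a cutoff; the two-peak spectrum, the grid, and the function names (log_grid, toy_spectrum, fraction_bound) are hypothetical stand-ins and are neither the manuscript's data nor the saSPT interface. When little occupancy lies between 0.1 and 0.15 µm²/s, moving the cutoff changes fbound only slightly.

```python
import math

def log_grid(d_min, d_max, n):
    """Return n log-spaced diffusion coefficients (um^2/s) from d_min to d_max."""
    lo, hi = math.log10(d_min), math.log10(d_max)
    step = (hi - lo) / (n - 1)
    return [10 ** (lo + i * step) for i in range(n)]

def toy_spectrum(diff_coefs):
    """Hypothetical two-peak occupancy spectrum (not measured data):
    a slow peak near 0.02 um^2/s and a fast peak near 4 um^2/s."""
    def peak(d, centre, width_decades):
        z = (math.log10(d) - math.log10(centre)) / width_decades
        return math.exp(-0.5 * z * z)
    return [0.45 * peak(d, 0.02, 0.25) + 0.55 * peak(d, 4.0, 0.30)
            for d in diff_coefs]

def fraction_bound(diff_coefs, occupancies, cutoff):
    """Share of total occupancy with a diffusion coefficient below cutoff."""
    total = sum(occupancies)
    bound = sum(w for d, w in zip(diff_coefs, occupancies) if d < cutoff)
    return bound / total

grid = log_grid(0.001, 100.0, 200)
spectrum = toy_spectrum(grid)
f_010 = fraction_bound(grid, spectrum, 0.10)
f_015 = fraction_bound(grid, spectrum, 0.15)
print(f"fbound with 0.10 um^2/s cutoff: {100 * f_010:.1f}%")
print(f"fbound with 0.15 um^2/s cutoff: {100 * f_015:.1f}%")
```

      In this toy case the two cutoffs give nearly the same value because the slow and fast peaks are well separated; a spectrum with substantial occupancy between the two cutoffs would be more sensitive to the choice.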

      (5) Movies:

      Since this is an imaging manuscript, I request the authors to provide representative movies for all the presented conditions. This is an essential component for a reader to evaluate the data and for them to benchmark their own images if they are to try to reproduce these findings.

      We have now included representative movies for all the SMT experimental conditions presented in the manuscript. Please see data availability section of the manuscript.

      (6) Definition of an ROI:

      The authors state that "ROI of random size but with maximum possible area was selected to fit into the interior of the nuclei" while imaging. However, the readout speed of the Andor iXon Ultra 897 depends on the size of the defined ROI. If the ROI was variable for every movie, how do the authors ensure the same sampling rate?

      We used the frame transfer mode on the Andor iXon Ultra 897 camera for our acquisitions, which allows for fast frame rate measurements without altering the exposure time between frames. Additionally, we verified the metadata of all our movies to ensure a consistent frame interval of 7.4 ms across all conditions. This confirms that the sampling rate was maintained uniformly, despite the variability in ROI size. 

      Reviewer #2 (Recommendations For The Authors):

      (1) 'Hoechst' is mis-spelled.

      We have now corrected this typo in the manuscript.

      (2) Cos7 appears in several places throughout the text. I assume this is a typo. If so, please correct it. If not, please explain if some experiments were done in Cos7 cells and kindly provide a justification for that.

      The use of Cos7 cells is intentional and not a typo. Cos7 cells have been previously utilized in studies investigating the interaction between T2NRs (Kliewer et al., 1992, Nature, doi: 10.1038/355446a0). In our study, due to technical issues with antibodies for coIP in U2OS cells, we initially used Cos7 cells for control experiments to verify that Halo-tagging of RARα and RXRα did not disrupt their interaction, by transiently expressing the constructs in Cos7 cells. Following these control experiments, we confirmed the direct interaction of endogenously expressed RAR and RXR in U2OS cells with their respective binding partners using the SMT-PAPA assay. Since these results confirmed that Halo-tagging did not interfere with RAR-RXR interactions, we chose not to repeat the coIP experiments in U2OS cells.

      Reviewer #3 (Public Review):

      Summary:

      This study aims to investigate the stoichiometric effect between core factors and partners forming the heterodimeric transcription factor network in living cells at endogenous expression levels. Using state-of-the-art single-molecule analysis techniques, the authors tracked individual RARα and RXRα molecules labeled by HALO-tag knock-in. They discovered an asymmetric response to the overexpression of counter-partners. Specifically, the fact that an increase in RXRα did not lead to an increase in RARα chromatin binding is incompatible with the previous competitive core model. Furthermore, by using a technique that visualizes only molecules proximal to partners, they directly linked transcription factor heterodimerization to chromatin binding.

      Strengths:

      The carefully designed experiments, from knock-in cell constructions to single-molecule imaging analysis, strengthen the evidence of the stoichiometric perturbation response of endogenous proteins. The novel finding that RXR, previously thought to be a target of competition among partners, is in excess provides new insight into key factors in dimerization network regulation. By combining the cutting-edge single-molecule imaging analysis with the technique for detecting interactions developed by the authors' group, they have directly illustrated the relationship between the physical interactions of dimeric transcription factors and chromatin binding. This has enabled interaction analysis in live cells that was challenging in single-molecule imaging, proving it is a powerful tool for studying endogenous proteins.

      Weaknesses:

      As the authors have mentioned, they have not investigated the effects of other T2NRs or RXR isoforms. These invisible factors leave room for interpretation regarding the origin of chromatin binding of endogenous proteins (Recommendations 4). In the PAPA experiments, overexpressed factors are visualized, but changes in chromatin binding of endogenous proteins due to interactions with the overexpressed proteins have not been investigated. This might be tested by reversing the fluorescent ligands for the Sender and Receiver. Additionally, the PAPA experiments are likely to be strengthened by control experiments (Recommendations 5).

      We agree that this would be an interesting experiment. However, there are three technical challenges that complicate its implementation: First, as demonstrated in our original PAPA paper, dark state formation is less efficient when dyes are conjugated to Halo compared to SNAPf, making the reverse configuration less optimal. Second, SNAPf-tagged proteins have slower labeling kinetics than Halo-tagged proteins, often resulting in under-labeling of SNAPf. Third, our SNAPf transgenes were integrated polyclonally. Since background PAPA scales with the concentration of the sender-labeled protein, variable concentrations of the sender-labeled SNAPf proteins would introduce significant variability, complicating the interpretation of the background PAPA signal. Due to these concerns, we believe that performing reciprocal measurements with reversed fluorescent ligands may not yield reliable results.

      Reviewer #3 (Recommendations For The Authors):

      (1) The term "Surprising features" in the title is ambiguous and may force readers to search for what it specifically refers to. Including a word that evokes specific features might be helpful.

      Our findings contradict previous work, which suggested that chromatin binding of T2NRs is regulated by competition for a limited pool of RXR. In contrast, we found that RAR expression can limit RXR chromatin binding, but not the other way around, which challenges the existing model. This unexpected result is what we refer to as a "surprising feature" in our title, and we believe it accurately reflects the novel insights our study provides. We also think that this is clearly conveyed in our manuscript abstract, supporting the use of "Surprising features" in the title. 

      (2) p.3, line 11 - The threshold of 0.15 μm²s⁻¹ seems to be a crucial value directly linked to the value of fbound. What is the rationale for choosing this specific value? If consistent conclusions can be obtained using threshold values that are similar but different, it would strengthen the robustness of the results.

      Please refer to our response to Reviewer #2’s Public Review point 4. The threshold choice is arbitrary and doesn’t affect the overall conclusions. To test this, we compared fbound (%) values calculated using both 0.1 μm²s⁻¹ and 0.15 μm²s⁻¹ cutoffs. For example, with endogenously expressed Halo-tagged RARα (clone c156), we observed fbound values of 47±1% vs 49±1%, and for RXRα (clone D6), 34±1% vs 35±1%, respectively (Figure S10). Since we have consistently applied the 0.15 μm²s⁻¹ cutoff across all experimental conditions in this manuscript, the comparisons of fbound (%) between different conditions are robust and valid.

      (3) p.4, line 13 - "the fbound of endogenous RARα-Halo (47±1%) was largely unchanged upon expression of SNAP (47±1%)" part of the sentence is not surprising. It would make more sense if it were expressed as "the fbound of endogenous RARα-Halo (47±1%) was largely unchanged upon expression of RXRα-SNAP (49±1%), consistent with the control SNAP (47±1%).".

      We understand how the original phrasing may be confusing to the readers and have restructured the sentence as suggested by the reviewer for clarity.

      (4) p.6, line 26 - The discussion that "most chromatin binding of endogenous RXRα in U2OS cells depends on heterodimerization partners other than RARα" seems to contradict the top right figure in Figure 4. If that's the case, the binding partner for the bound red molecule might be yellow rather than blue. Given a decrease in the number of RARα molecules with an unchanged binding ratio, the total number of binding molecules has decreased. Could it be interpreted that the potential reduction in RXRα chromatin binding, accompanying the decrease in binding RARα, is compensated for by other partners?

      We agree with the reviewer that both the yellow and blue molecules in Figure 4 represent T2NRs that can heterodimerize with RXR. For simplicity, we chose to omit the depiction of RXR dimerization with other T2NRs (represented in yellow) in Figure 4. We have now included a note in the figure caption to clarify this. We plan to follow up on the buffer capacity of RXR with other T2NRs in a separate manuscript and will discuss this aspect in more detail once we have data from those experiments.

      (5) Fig. 3 - I expected that DR localizations always appear more frequently than PAPA localizations by the difference in the number of distal molecules. Why does the linear line for SNAP-RXRα in Fig. 3 B have a slope exceeding 1? Also, although the sublinearity is attributed to binding saturation, is there any possibility that this sublinearity originates from the PAPA system like the saturation of PAPA reactivation? Control samples like Halo-SNAPf-3xNLS might address these concerns.

      The number of DR and PAPA localizations depends on the arbitrarily chosen intensity and duration of green and violet light pulses. For any given protein pair, different experimental settings can result in PAPA localizations being greater than, less than, or equal to the number of DR localizations. Therefore, the informative metric is not the absolute number of DR and PAPA localizations, but rather how the ratio of PAPA to DR localizations changes between different conditions—such as between interacting pairs and non-interacting controls.

      Regarding the sublinearity, we agree that it is essential to consider whether the observed sublinearity might stem from saturation of the PAPA signal. We know of two ways in which this could occur:

      First, PAPA can be saturated as the duration of the green light pulse increases and dark-state complexes are depleted. However, this cannot explain the nonlinearity that we observe, because the duration of the green light pulse is constant, and thus the probability that a given complex is reactivated by PAPA is also constant. Likewise, holding the violet pulse duration constant yields a constant probability that a given molecule is reactivated by DR. PAPA localizations are expected to scale linearly with the number of complexes, while DR localizations are expected to scale linearly with the total number of molecules. Sublinear scaling of PAPA localizations with DR localizations thus implies that the number of complexes scales sublinearly with the total concentration of the protein.

      Second, saturation could occur if PAPA localizations are undercounted compared to DR localizations. While this is a valid concern, we consider it unlikely in this case because 1) our localization density is below the level at which our tracking algorithm typically undercounts localizations, and 2) we observe sublinearity for RXR → RAR PAPA even though the number of PAPA localizations is lower than the DR localizations; undercounting due to excessive localization density would be expected to introduce the opposite bias in this case.
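
      The scaling argument in the first point can be written compactly. This is only a sketch: the symbols are introduced here for illustration and are not taken from the manuscript. Let $T$ be the total number of labeled molecules, $C$ the number of those molecules in complexes, and $p_{\mathrm{DR}}$ and $p_{\mathrm{PAPA}}$ the reactivation probabilities per molecule and per complex, which are fixed when the violet and green pulse durations are held constant. The expected localization counts are then:

```latex
\begin{align}
N_{\mathrm{DR}}   &= p_{\mathrm{DR}}\, T, \\
N_{\mathrm{PAPA}} &= p_{\mathrm{PAPA}}\, C, \\
\frac{N_{\mathrm{PAPA}}}{N_{\mathrm{DR}}}
  &= \frac{p_{\mathrm{PAPA}}}{p_{\mathrm{DR}}} \cdot \frac{C}{T}.
\end{align}
```

      Because the prefactor $p_{\mathrm{PAPA}}/p_{\mathrm{DR}}$ is constant across cells, a plot of $N_{\mathrm{PAPA}}$ against $N_{\mathrm{DR}}$ can bend only if $C/T$ changes with $T$, that is, if the number of complexes grows sublinearly with total protein.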

      (6) Fig. 4 - The differences between A, B, and C on the right side of the model are subtle, making it difficult to discern where to see. Emphasizing the difference in molecule numbers or grouping free molecules at the top might help clarify these distinctions.

      We appreciate the reviewer’s feedback. In response, we have revised Figure 4 by grouping the free molecules on the top right side for panels A, B and C, as suggested.

      (7) While the main results are obtained through single-molecule imaging, no single-molecule fluorescence images or trajectory plots are provided. Even just for representative conditions, these could serve as a guide for readers trying to reproduce the experiments with different custom-built microscope setups. Also, considering data availability, depositing the source data might be necessary, at least for the diffusion spectra.

      We have now included representative movies for all the presented SMT conditions as source data. Please see the Data Availability section of the manuscript.

      (8) Tick lines are not visible on many of the graph axes. 

      We have revised the figures to ensure that the tick lines are now clearly visible on all graph axes.

      (9) Inconsistencies in the formatting are present in the methods, such as "hrs" vs. "hours", spacing between numbers and units, and "MgCl2". "u" should be "μ" and "x" should be "×". 

      We have corrected the formatting errors.

      (10) Table S4, rows 16 and 17 - Are "RAR"s typos for "RXR"s? 

      We have corrected this in the manuscript.

      (11) p.10~12 - Are three "Hoestch"s typos for "Hoechst"s? 

      This is now corrected in the manuscript.

      (12) p.11, line 17 - According to the referenced paper, the abbreviation should be "HILO" in all capital letters, not "HiLO". 

      This is now corrected in the manuscript.

      (13) "%" on p.3, line 18, and "." on p.6, line 27 are missing. 

      The missing “%” and “.” have now been added.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Yao S. and colleagues aims to monitor the potential autosomal regulatory role of the master regulator of X chromosome inactivation, the Xist long non-coding RNA. It has recently become apparent that in the human system, Xist RNA can not only spread in cis on the future inactive X chromosome but also reach some autosomal regions where it recruits transcriptional repression and Polycomb marking. Previous work has also reported that Xist RNA can show a diffused signal in some biological contexts in FISH experiments.

      In this study, the authors investigate whether Xist represses autosomal loci in differentiating female mouse embryonic stem cells (ESCs) and somatic mouse embryonic fibroblasts (MEFs). They perform a time course of ESC differentiation followed by Capture Hybridization of Associated RNA Targets (CHART) on both female and male ESCs, as well as pulldowns with sense oligos for Xist. The authors also examine transcriptional activity through RNA-seq and integrate this data with prior ChIP-seq experiments. Additional experiments were conducted in MEFs and Xist-ΔB repeat mutants, the latter of which fail to recruit Polycomb repressors.

      Based on this experimental design, the authors make several bold claims:

      (1) Xist binds to about a hundred specific autosomal regions.

      (2) This binding is specific to promoter regions rather than broad spreading.

      (3) Xist autosomal signal is inversely correlated with PRC1/2 marks but positively correlated with transcription.

      (4) Xist targeting results in the attenuation of transcription at autosomal regions.

      (5) The B-repeat region is important for autosomal Xist binding and gene repression.

      (6) Xist binding to autosomal regions also occurs in somatic cells but does not lead to gene repression.

      Together, these claims suggest that Xist might play a role in modulating the expression of autosomal genes in specific developmental and cellular contexts in mice.

      Strengths:

      This paper deals with an interesting hypothesis that Xist ncRNA can also function at autosomal loci.

      Weaknesses:

      The claims reported in this paper are largely unsubstantiated by the data, with multiple misinterpretations, lacking controls, and inadequate statistics. Fundamental flaws in the experimental design/analysis preclude the validity of the findings. Major concerns are listed below:

      (1) The entire paper is based on the CHART observation that Xist is specifically targeted to autosomal promoters. Overall, the data analysis is flawed and does not support such conclusions. Importantly, the sense WT and the 0h controls are not used, nor are the biological replicates.

      We respectfully disagree with Rev1 but nevertheless thank the reviewer for making some suggestions that helped to strengthen our manuscript.  We have provided new experiments and analyses in the revised manuscript. Please see responses below.

      Rev1 seems to have missed or misunderstood some key experiments. In fact, the sense WT and 0h controls were shown. Furthermore, we included at least two biological replicates for each experiment.

      We used both male ES cells (which do not express Xist) and sense probes as key negative controls, as outlined in Figure S1. Crucially, we only analyzed peaks that were reproducible between biological replicates. The Xist CHART peaks in differentiating female ES cells were significantly enriched above the “background” defined by the sense probe and male controls. Specifically, in comparison to undifferentiated female ES cells (day 0) where both X chromosomes are active and Xist is not induced, Xist CHART robustly pulled down the X chromosome during cell differentiation (day 4, day 7, and day 14). In contrast, male ES cells showed no significant pull-down of the X chromosome, and the sense group also exhibited markedly reduced binding (new Figure S1B). Furthermore, Principal Component Analysis (PCA) of CHART-seq reads (day 4 as an example), including Xist, sense, and input samples from WT and ΔRepB female cells, confirmed that the sense probe CHART was clearly distinguishable from Xist CHART signals. Please see revised Figure S1C. Together, these findings underscore the specificity and robustness of our CHART results.

      Data is typically visualized without quantification, and when quantified, control loci/gene sets are erroneously selected. Firstly, CHART validation on the X in FigS1 is misleading and not based on any quantifications (e.g., see the scale on Kdm6a (0-190) compared to Cdkl5 (0-40)). If scaled appropriately, there is Xist signal on the escapee. 

      Rev1 may have misread the presented data. In the example raised by Rev1, Fig. S1 is inherently quantitative: e.g., a ratio is a number in Fig. S1A (now Fig. S1B) and all gene tracks in Fig. 1B-E are shown with scales. We showed X-linked genes in Fig. S1 (now Fig. S2) as a control to demonstrate that the CHART worked and that Xist accumulated over time from day 0 to day 14. Our new Figure 1B demonstrates the Xist accumulation in graph format. 

      Our paper focuses on Xist autosomal binding sites. Thus, the X-linked examples were placed in the supplement. Escapee genes do in fact accumulate Xist at their promoter regions and this finding is consistent with data published by Simon et al. (2013, Nature). It was therefore not desirable in this paper to reanalyze X-linked genes, including escapees. Nevertheless, to address the reviewer’s concerns, we present new data in new Figure S3A. Here we analyzed the density of Xist binding across X-linked genes, including both active and inactive genes, as well as escapee genes. From this quantitative analysis, it should be clear that escapees do bind Xist. However, from the metagene plots in Figure S3B, we confirm the previous conclusion that escapees bind Xist at high levels just upstream of the promoter and that there is a depletion of Xist in the escapee gene body, consistent with a barrier preventing Xist from moving into the active gene. 

      All X-linked loci should have been quantified and classified based on escape status; sense control should also be quantified, and biological replicates should be shown separately. 

      Please see above response.

      Additionally, in the revised manuscript, we have examined the Irreproducible Discovery Rate (IDR) to validate the reproducibility of peaks between the two replicates, and we included a representative example from female WT ES cells at day 4 (revised Figure S4A). The results showed a strong correlation between the replicates, with an IDR threshold of 0.05 (red point > 0.05). As described in the Methods section, to ensure reliable and robust peak identification, we performed peak calling (MACS2) separately on each replicate, and then used bedtools intersect to identify peaks that overlapped between the two replicates. This stringent process, including strict q-value settings in MACS2, ensures the reliability and reproducibility of the peaks presented in this study.
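
      The replicate-overlap step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name and the toy peak coordinates are hypothetical, and the logic only approximates what a command such as `bedtools intersect -u -a rep1.bed -b rep2.bed` reports, namely each peak from the first replicate that overlaps at least one peak in the second. The exact bedtools options used in the study are not stated here.

```python
from collections import defaultdict


def reproducible_peaks(rep1, rep2):
    """Return peaks from rep1 that overlap at least one peak in rep2.

    Each peak is a (chrom, start, end) tuple using 0-based, half-open
    coordinates, as in BED files.
    """
    # Index the second replicate by chromosome for quicker lookup.
    by_chrom = defaultdict(list)
    for chrom, start, end in rep2:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in rep1:
        # Two half-open intervals overlap when each one starts
        # before the other one ends.
        if any(start < e2 and s2 < end for s2, e2 in by_chrom[chrom]):
            kept.append((chrom, start, end))
    return kept


# Hypothetical toy peaks, for illustration only.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
rep2 = [("chr1", 150, 250), ("chr2", 80, 120)]
shared = reproducible_peaks(rep1, rep2)
```

      In this toy input only the first chr1 peak is retained. The chr2 peaks merely touch at position 80 and, under half-open coordinates, do not overlap.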

      Secondly, and most importantly, Figure 1 does not convincingly show specific Xist autosomal binding. Panel A quantification is on extremely variable y-scales and actually shows that Xist is recruited globally to nearly all autosomal genes, likely indicating an unspecific signal. Again, the sense and 0h controls should have been quantified along with biological replicates. 

      Figure 1 shows heatmaps and corresponding metagenes for d0, d4, d7, and d14 female ES cells. Two biological replicates are analyzed. In our revised manuscript, we have used Pearson and Spearman correlation coefficients to measure the strength and direction of the relationship between the two biological replicates and have shown that the replicates are highly reproducible (new Figure S1A). On d0, the Xist coverage on autosomes and the X chromosome is low, but there is a clear increase on d4, d7, and d14, particularly at the TSS of autosomal genes, as shown by the metagene plots in Figure 1A-B and the CHART density maps in new Figure 1E-F. We also show relative depletion of Xist signals in the male and sense negative controls.
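
      For readers less familiar with the two coefficients, the following sketch shows how they differ when applied to per-bin coverage from two replicates. The coverage values are invented for illustration and the helper names are hypothetical. Pearson measures linear agreement, while Spearman applies the same formula to ranks and therefore measures monotonic agreement.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-bin coverage for two replicates.
rep1_cov = [1.0, 2.0, 4.0, 8.0, 16.0]
rep2_cov = [1.5, 2.5, 3.0, 9.0, 30.0]
r_pearson = pearson(rep1_cov, rep2_cov)
r_spearman = spearman(rep1_cov, rep2_cov)
```

      In this toy case the two replicates rank every bin identically, so the Spearman coefficient is 1, while the Pearson coefficient is slightly lower because the relationship is not exactly linear.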

      Upon inspecting genome browser tracks of all regions reported in the manuscript (Rbm14, Srp9, Brf1, Cand2, Thra, Kmt2c, Kmt2e, Stau2, and Bcl7b), the signal is unspecific on all sites with the possible exception of Kmt2e. On all other loci, there is either a strong signal in the 0h ESC controls or more signal in some of the sense controls. This implies that peak calling is picking up false positive regions. How many peaks would have been picked up if the sense or the 0h controls were used for peak calling? It is likely that there would be a lot since there are also possible "peaks" (e.g., Fzd9) in control tracks. 

      The analysis cannot be performed by visual inspection. A statistical analysis must be performed to call signal above noise. This is why we performed peak-calling on two biological replicates and identified overlapping peaks using bedtools intersect to improve reliability. Significant peaks are noted as black bars under each track. As mentioned above, for our analysis, we focused on the top 100 peaks based on peak scores to ensure robustness. Xist has significantly higher signal compared to the sense probe in the Xist-autosomal peak regions (revised Figure 1E-F). Additionally, we conducted peak calling on undifferentiated ES cells (d0) and detected a significantly higher number of peaks (~600) compared to the differentiated states (d4 or d7) (~100).

      Single-cell sequencing studies have shown that about 2% of undifferentiated mESCs express detectable Xist (Pacini et al., Nat Commun, 2021). The Xist peaks in “day 0” cells may be due to the differentiating population.

      Further inspection of the data was not possible as the authors did not provide access to the raw fastq files. In results from previously published experiments (Engreitz et al., 2013), the reported regions were not bound by Xist.

      On the contrary, we deposited the raw data files to GEO prior to the submission of the paper and included the reviewer link to access them. As of August 24, 2024, GEO publicly released these files, allowing for full inspection of the data. 

      Regarding the Engreitz publication, it is not recommended to compare our current study to their analysis for the crucial reason that the Engreitz study was not conducted under physiological conditions. The authors overexpressed the Xist gene in male ES cells. Because Xist RNA can silence genes in male cells as well, this ectopic overexpression normally leads to cell death — thus forcing examination of effects in a narrow time window before Xist can fully spread and act across the genome. Comparing our experiments (endogenous Xist expression in female ES cells) to the ectopic overexpression in male ES cells of Engreitz et al. should therefore not be undertaken.

      Thirdly, contrary to the authors' claim, deleting the B repeat does not lead to a loss of autosomal signal. Indeed, comparing Fig1A and Fig2B side by side clearly shows no difference in the autosomal signal, likely because the autosomal signal is CHART background. Properly quantifying the signal with separate replicates as well as the sense and 0h controls is vital. Overall current data together with published results indicate that CHART peak calling on autosomes is due to technical noise or artefacts.

      In our revised manuscript, we have included the quantitative results as mentioned above in the main and supplementary figure (new Figure 1E-F, Figure 2E-F, and S3A). The data clearly show an enrichment in the Xist CHART samples in differentiating female ES cells.

      We believe the reviewer may be comparing the original Figure 1A and Figure 2A (not Figure 2B). As mentioned above, the analysis cannot be performed by visual inspection. Please see new Figure 2E and 2F. From these data, it should be clear that deleting RepB causes a decrease in Xist targeting to autosomal loci.

      (2) The RNA-seq analysis is also flawed and precludes strong statements. Firstly, the analysis frequently lacks statistical analysis (Fig3B, FigS2B-C) and is often based on visualizations (Fig 3D-G) without quantifications. Day 4 B-repeat deletion does not lead to a significant change in the expression of genes close to Xist signal (Fig3H, d14 does not fully show). 

      Please see new revised Figure 3B and Figures S2B-C (now revised as Figures S6A and S6B). 

      Secondly, for all transcriptional analysis, it is important to show autosomal non-target genes, which is not always done. 

      In the revised manuscript, we included non-target genes for each analysis (new Figure 4E-F, 5D and 5F, 7C and 7E, S7F, S8).

      Indeed, both males and B repeat deletion will lead to transcriptional changes on autosomes as a secondary effect from different X inactivation status. The control set, if used, is inappropriate as it compares one randomly selected set of ~100 genes. This introduces sampling error and compares different classes of genes. Since Xist signal targets more active genes, it is important to always compare autosomal target genes to all other autosomal genes with similar basal expression patterns.

      Please see new Figure S8. We included 100 randomly selected non-target sites on autosomes for this comparative analysis. For consistency, we applied the same flanking regions (10 kb) in the analysis of both target and non-target genes. We believe that this selection method for nontargets is appropriate for two reasons: first, it allows us to control for Xist binding and non-binding; second, it ensures a similar number of genes in both groups, providing a robust foundation for statistical analysis. 
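
      One way such a target versus non-target split could be set up is sketched below. This is an assumption-laden illustration rather than the procedure used in the study: the gene records, the rule that a gene is a target when its TSS lies within the flank of a peak, and all function names are hypothetical. Only the 10 kb flank and the idea of randomly sampling non-targets come from the response above.

```python
import random

FLANK = 10_000  # 10 kb flank, as stated in the response


def near_peak(gene, peaks, flank=FLANK):
    """True if the gene's TSS lies within `flank` bp of any peak.

    gene  : (name, chrom, tss)
    peaks : list of (chrom, start, end), 0-based half-open
    """
    _, chrom, tss = gene
    return any(
        c == chrom and start - flank <= tss < end + flank
        for c, start, end in peaks
    )


def split_targets(genes, peaks, n_controls, seed=0):
    """Return (targets, randomly sampled non-target controls)."""
    targets = [g for g in genes if near_peak(g, peaks)]
    non_targets = [g for g in genes if not near_peak(g, peaks)]
    # A fixed seed makes the random control set reproducible.
    rng = random.Random(seed)
    controls = rng.sample(non_targets, min(n_controls, len(non_targets)))
    return targets, controls


# Hypothetical toy annotation and peak list.
genes = [("A", "chr1", 1_000), ("B", "chr1", 50_000),
         ("C", "chr2", 1_000), ("D", "chr1", 12_000)]
peaks = [("chr1", 2_000, 3_000)]
targets, controls = split_targets(genes, peaks, n_controls=1)
```

      A sketch like this draws a single random control set. Drawing repeated samples, or matching controls to targets on basal expression as the reviewer suggests, would be natural extensions but are not shown here.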

      (3) The ChIP-seq analysis also has some problems. The authors claim that there is no positive correlation between genes close to Xist autosomal binding (10kb) compared to those 50kb away (Fig 3C, S2D); however, this analysis is based entirely on metagene visualization. Signal within the Xist binding sites should be quantified (not genes close by) and compared to other types of genomic loci and promoters. Focusing on the 50kb group only as controls is misleading.

      We believe the reviewer may have misunderstood our conclusions. As stated in the paper, we observed lower coverage of the histone marks H3K27me3 and H2AK119ub, associated with PRC2 and PRC1, respectively. Our conclusions regarding PRC1/2 support the RNA-seq results, indicating that Xist tends to bind to actively expressed genes. In other words, these genes exhibit lower levels of PRC-mediated silencing signals. This observation underscores the relationship between Xist binding and gene activity, highlighting that Xist preferentially associates with regions that are less subject to silencing by polycomb repressive complexes.

      Secondly, the authors only look at PRC mark signal upon differentiation; what about the 0h timepoint, i.e., is there pre-marking? 

      Day 0 is not an appropriate timepoint for this analysis because Xist is not yet induced. There is also a small fraction of cells (<5%) that spontaneously differentiate and start to undergo XCI. For these reasons, the day 0 timepoint is considered somewhat heterogeneous and it would be difficult to draw conclusions regarding Xist peaks in these samples.

      Most worryingly, the data analysis is not consistent between figures (see Fig3C vs 5H-I). In Fig5, the group of Xist targets was chosen as those within 100kb of Xist binding, which would encompass all the control regions from Fig3C. In this analysis, the authors report that there is Xist-dependent H3K27me3 deposition, and in fact, here the Xist autosomal targets have more of it than the controls. Overall, all of this analysis is misleading, and clear conclusions cannot be made.

      We believe that the reviewer may have also misunderstood the analysis in Figure 5. Figure 5 shows the effect of the Xist inhibitor, X1, on H3K27me3 and gene expression. X1 reduces PRC2 targeting and gene silencing, consistent with X1’s effect on RepA as published in Aguilar et al. 2022.

      All in all, because the fundamental observation is not robust (see point 1), all subsequent analyses are also affected. There are also multiple other inconsistencies within the analysis; however, they have not been included here for brevity.

      We again respectfully disagree with Rev1 but thank the reviewer for making suggestions that helped to strengthen our manuscript.  We believe that the revised manuscript with new analyses is improved in part because of the reviewer’s critical comments.

      Reviewer #2 (Public review):

      Summary:

      To follow up on recent reports of Xist-autosome interaction the authors examine female (and male transgenic) mESCs and MEFs by CHARTseq. Upon finding that only 10% of reads map to X, they sought to identify reproducible alternative sites of Xist-binding, and identify ~100 autosomal Xist-binding sites and show a transient impact on expression.

      Strengths:

      The authors address a topical and interesting question with a series of models including developmental timepoints and utilize unbiased approaches (CHARTseq, RNAseq). For the CHARTseq they have controls of both sense probes and male cells; and indeed do detect considerable background with their controls. The use of deletions emphasizes that intact functional Xist is involved. The use of 'metagene' plots provides a visual summation of genic impact.

      Reviewer 2 has made some excellent suggestions. We have revised the manuscript accordingly and are grateful to the reviewer for the recommendations.

      Weaknesses:

      Overall, the result presentation has many 'sample' gene presentations (in contrast to the stronger 'metagene' summation of all genes). The manuscript often relies on discussion of prior X chromosomal studies, while the data generated would allow assessment of the X within this study to confirm concordance with prior results using the current methodology/cell lines. 

      Many of the 'follow-up' analyses are in fact reprocessing and comparison of published datasets. The figure legends are limited, and sample size and/or source of control is not always clear. While similar numbers of autosomal Xist-binding sites were often observed, the presented data did not clarify how many were consistent across time-points/cell types. While there were multiple time points/lines assessed, only 2 replicates were generally done.

      We apologize for the deficiencies in the legend.  The revised manuscript has corrected them.

      We generated many new datasets with deep sequencing, with at least two biological replicates for each. Such experiments are extremely expensive by nature. Thus, two biological replicates are typically considered acceptable.

      Additionally, we performed reanalysis of published datasets to test whether, in the hands of other investigators, cell lines expressing Xist also supported autosomal targeting. Figure 4 is a case in point. Here we examined Tg1 and Tg2, which respond to doxycycline to overexpress Xist from an ectopic site. Transcriptomic analysis showed significant downregulation of autosomal Xist targets, as exemplified by Rbm14 and Bcl7b (new Figure 4C, S9B). In contrast, non-targets of Xist such as Stau1 did not demonstrate significant changes in gene expression (new Figure 4E and 4G). Looking across all autosomal target genes, we observed a significant decrease in mean expression in the Xist overexpressing cell lines (new Figure 4D). The fact that the autosomal changes were also observed in datasets generated by other investigators greatly strengthens our conclusions.

      Aim achievement:

      The authors do identify autosomal sites with enrichment of chromatin marks and evidence of silencing. More details regarding sample size and controls (both treatment, and most importantly choice of 'non-targets' - discussed in comments to authors) are required to determine if the results support the conclusions.

      Specific scenarios for which I am concerned about the strength of evidence underlying the conclusion:

      I found the conclusion "Thus, RepB is required not only for Xist to localize to the X-chromosome but also for its localization to the ~100 autosomal genes" (p5) in contrast to the statement 2 lines prior: "A similar number of Xist peaks across autosomes in ΔRepB cells was observed and the autosomal targets remained similar". Some quantitative statistics would assist in determining impact, both on autosomes and also X; perhaps similar to the quintile analysis done for expression.

      We have added the Xist coverage panels for days 4 and 7 in the identified Xist-autosomal peak regions (new Figure 1E-F, Figure 2E-F), as mentioned above. The results clearly demonstrate that the deletion of RepB decreases Xist binding to autosomes. We also showed that ΔRepB increased X-linked gene expression in our revised Figure 3D.

      It is stated that there is a significant suppression of X-linked genes with the autosomal transgenes; however, only an example is shown in Figure 4B. To support this statement, a full X chromosomal geneset should be shown in panels F and G, which should also list the number of replicates. 

      Please see new Figure 4B.

      As these are hybrid cells, perhaps allelic suppression could be monitored? Is Med14 usually subject to X inactivation in the Ctrl cells, and is the expression reduced from both X chromosomes or preferentially the active (or inactive) X chromosome?

      If Rev2 is referring to Figure 4, the dataset used in Figure 4 comes from another research group and was previously published (Loda, A. et al. Nat Commun, 2017).

      If Rev2 is referring to our ES cells, they are N2 cell lines.  The X chromosomes are fully hybridized (Cas/Mus), but the autosomes are not fully hybridized (Ogawa et al., Science, 2008). Med14 is subject to XCI and is expressed from the Xa, silenced on the Xi. 

      The expression change for autosomes after transgene induction is barely significant; and it was not clear what was used as the Ctrl? This is a critical comparator as doxycycline alone can change expression patterns.

      We agree that there was a modest change in expression after transgene induction, but it is a significant change. Again, the dataset is from a published study where the authors generated doxycycline-responsive Xist transgenes (see above). The control in this case is Dox-treated wildtype cells. We now clarify these points.

      In the discussion there is the statement. "Genetic analysis coupled to transcriptomic analysis showed that Xist down-regulates the target autosomal genes without silencing them. This effect leads to clear sex difference - where female cells express the ~100 or so autosomal genes at a lower level than male cells (Figure 7H)." This sweeping statement fails to include that in MEFs there is no significant expression difference, in transgenics only borderline significance, and at d14 no significant expression difference. The down-regulation overall seems to be transient during development while targeting is ongoing?

      Indeed, the Xist effects on autosomes seem to occur during cell differentiation in ES cells. While there is no apparent effect in MEFs, we cannot exclude effects on other somatic cells. Regardless of whether the effects are in early development or throughout life, the sex differences may have life-long effects in mammals. The study conducted in human cells by the Plath lab also concluded that the differences primarily affect stem cells.

      Finally, I would have liked to see discussion of the consistency of the identified genes to support the conclusion that the autosomal sites are not merely the results of Xist diffusion.

      We address this in the third paragraph of the Discussion. Our main argument is that if autosomal binding were caused by diffusion, then RepB deletion or X1 treatment would have led to increased binding at autosomal sites, as Xist would bind less to the X chromosome. However, as demonstrated in our study, both treatments resulted in reduced Xist binding on both the X chromosome and autosomes. This finding suggests that the binding is specific and reliant on Xist's RepA and RepB domains, rather than being a passive diffusion process.

      To examine overlap between the conditions (days of differentiation and WT/RepB cells), we generated Venn Diagrams as now shown in Figure S4E.

      The impact of Xist on autosomes is important for consideration of impact of changes in Xist expression with disease (notably cancers). Knowing the targets (if consistent) would enable assessment of such impact.

      We thank Rev2 for the very helpful review and for the forward-looking experiments. Indeed, the physiological changes brought on by autosomal targeting will be of future interest.

      Reviewer #3 (Public review):

      Summary:

      Yao et al use CHART to identify chromatin associated with Xist in female mouse ESCs, and, as control, male ESCs at various timepoints of differentiation. Besides binding of Xist to X chromosome regions they found significant binding to autosomes, concentrating mostly on promoter regions of around 100 autosomal genes, as elucidated by MACS. The authors went on to show that the RepB repeat is mostly responsible for these autosomal interactions using a female ESC line in which RepB is deleted. Evidence is provided that Xist interacts with active autosomal genes containing lower coverage of repressive marks H3K27me3 and H2AK119ub and that RepB dependent Xist binding leads to dampening of expression, but not silencing of autosomal genes. These results were confirmed by overexpression studies using transgenic ESCs with doxycycline-inducible Xist as well as via a small molecule inhibitor of Xist (X1), inducing/inhibiting the dampening of autosomal genes, respectively. Finally, using MEFs and Xist mutants RepB or RepE the authors provide evidence that Xist is bound to autosomal genes in cells after the XCI process but appears not to affect gene expression. The data presented appear generally clear and consistent and indicate some differences between human and mouse autosomal regulation by Xist. Thus, these results are timely and should be published.

      We thank Rev3 for the positive remarks and great suggestions.  We have amended the manuscript per below. 

      Strengths:

      Regulation of autosomal gene expression by Xist is a "big deal" as misregulation of this lncRNA causes developmental defects and human disease. Moreover, this finding may explain sex-specific developmental differences between the sexes. The results in this manuscript identify specific mouse autosomal genes bound by Xist and decipher critical Xist regions that mediate this binding and gene dampening. The methods used in this study are appropriate, and the overall data presented appear convincing and are consistent, indicating some differences between human and mouse autosomal regulation by Xist.

      Weaknesses:

      (1) The figure legends and/or descriptions of data are often very short lacking detail, and this unnecessarily impedes the reading of the manuscript, in particular the figures would benefit not only from more detailed descriptions/explanations of what has been done but also what is shown. 

      We have included more detailed descriptions in the figure legends and throughout the manuscript.

      This will facilitate the reading and overall comprehension by the reader. One out of many examples: In Fig S1B in the CHART data at d4 and d7 there is not only signal in female WT Xist antisense but also in female sense control. For a reader that is not an expert in XCI it would be helpful to point out in the legend that this signal corresponds to the lncRNA Tsix (I suppose), that is transcribed on the other strand.

      We thank the reviewer for this excellent point.  We have amended the Results section accordingly.

      (2) Different scales are used in the lower panels of Figures 1A and 2A, which makes it difficult to directly compare signals between the different differentiation stages.

      We have included a figure combining all timepoints (d0, d4, d7, and d14 WT female Xist CHART signals) on the X chromosome and autosomes to support our thesis. Please see new Figure 1B.

      (3) In this study some of the findings on mouse cells contrast previously published results in human ESCs: 1) Xist binding occurs preferentially to promoters in mice, not in human. 2) Binding of Xist is mostly detected in polycomb-depleted regions in mice but there is a positive correlation between Xist RNA and PRC2 marks in human ESCs. These differences are surprising but may be very interesting and relevant. While I am aware that this might be a difficult task, it would be helpful to experimentally address this issue in order to distinguish whether species specific and/or methodological differences between the studies are responsible for these differences.

      Indeed, our findings in mouse cells contrast with those observed in humans. As discussed in the manuscript, this discrepancy may be attributed to factors such as cell type, differentiation methods, and the Xist pull-down technique employed (our CHART method utilizes a 20 nt oligo library, whereas RAP uses long oligos). We agree that future work should investigate the underlying causes of these differences between mouse and human systems.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      For Figure 2: labelling ∆B on the panel A timeline (e.g. d0-∆B) would make the results clearer for the audience. Panel B makes most sense beside panel E of Figure 1, so combine here and skip in Figure 1?

      We have modified Figure 2A and thank Rev2 for this suggestion. As for the embedded tables: since we performed peak calling for WT and ∆B separately, we believe that showing both the peak numbers and their corresponding peak patterns provides a clearer representation of the data.

      I agree that at day 7 there appears to be a difference in X; but by day 14 this looks much more minimal - is it just time-shifted rather than altered? Perhaps this could be discussed. Autosomal binding sites show no change in number.

      Day 7 exhibits the strongest Xist binding on the X chromosome, consistent with the de novo establishment phase of XCI, when Xist is expressed at the highest levels (300 copies/cell during de novo XCI versus ~100 copies/cell during maintenance [Sunwoo et al., 2015, as cited]). Per our RNA-seq analysis here, we also observed the highest Xist expression on day 7 and reduced levels on day 14 (Fig. S5A). This expression difference explains the reduced Xist CHART levels on day 14 compared to day 7.

      While the X has previously been examined, it would seem beneficial to conduct the same expression analyses (Figure 3) for the X (perhaps supplemental), as the authors have the data 'in hand'. I feel comparison to X in the main figure for panels A and B would fit, while a similar analysis for the X for panel C could be supplemental, presumably supporting the published data to which this data is currently compared. 

      This is a good suggestion. Please find the new data in Figures 2E-F and 3D, which demonstrate that the RepB deletion inhibits Xist binding on the X chromosome, resulting in increased X-linked gene expression, as previously mentioned. Since Xist binds across the X chromosome, we did not perform peak calling as we did for the autosomes. Therefore, applying a similar analysis as in Figures 3A-B may not be appropriate in this case.

      Such a direct comparison to X-data from the same study would be important. For panel H: How many replicates (2)? This should be in the legend. What is the change in median expression? Again, a supplemental figure showing impact on X-linked targets would be useful. Do male and female ESCs show an expression difference prior to differentiation (ie d0)? The data underlying this Figure should be in one of the supplementary tables, showing the full statistical tests and average change. The supplementary tables 8-12 list the WT target genes, not expression differences with the deletion. Again, given that the difference appears transient, might the ∆B cells be altered in rate of differentiation?

      Panel H (revised Figure 3G) includes two replicates, and this has been added to the legend. We have provided a figure demonstrating that the RepB deletion increases the expression levels of X-linked genes on days 4, 7, and 14 (revised Figure 3D). Male and female ESCs show differences in the expression of X-linked genes, as both X chromosomes are active in females at this stage prior to differentiation (revised Figure S5C).

      A supplementary table with statistical tests and average change information has been included in our revised version (Table S11).

      On the other hand, these Xist-autosomal target genes displayed no significant differences between WT male, female, or ∆B female cells on day 0 — prior to onset of XCI and Xist expression. Please see new Figure 3H. 

      As for whether ∆B cells are altered in their rate of differentiation, the analysis by Colognori et al. 2019 indicates that ∆B cells differentiate similarly to WT cells. (In Figure 6 of Colognori et al. 2019, autosomal genes were expressed similarly in WT and ∆B cells, whereas XCI was affected only in ∆B cells.)

      We have also modified the legends for our supplementary tables.

      Why were the transgene lines examined upon neuronal differentiation rather than the same approach as in Figures 1-3? I would have thought neuronal differentiation might be more similar to d14, where limited changes remain? Could the authors clarify and discuss?

      We apologize for the confusion. The Tg lines in Figure 4 came from a previously published study. We reanalyzed published datasets because we wanted to test whether cell lines expressing Xist also supported autosomal targeting in the hands of other investigators. Here we examined Tg1 and Tg2, which respond to doxycycline by overexpressing Xist from an ectopic site. Transcriptomic analysis showed significant downregulation of autosomal Xist targets, as exemplified by Bcl7b and Rbm14 (Figure 4C and S9B). In contrast, non-targets of Xist such as Stau1 did not demonstrate significant changes in gene expression (Figure 4E and 4F). Looking across all autosomal target genes, we observed a significant decrease in mean expression in the Xist-overexpressing cell lines (Figure 4D). The fact that the autosomal changes were also observed in datasets generated by other investigators greatly strengthens our conclusions. We have clarified this in the Results section.

      Figure 5 - the legend should specify the number of replicates and clarify the blue/green (intuitive, but not specified). Are the 'target' / 'non-target' genes from d4 Chart (but the RNA from d5)? How are 'non-targets' defined - do they match the 'targets' in certain criteria (expression level, chromatin features, GC content)? Do they change per differentiation protocol?

      We have modified the legends to clarify that the 'target' and 'non-target' genes are derived from the day 4 CHART-seq data, while the RNA data is from day 5, as that study sequenced day 5 and not day 4. Non-targets were randomly chosen based on (i) the absence of Xist binding and (ii) similar expression levels. Please see revised Figure S8.
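
      As an illustration only, the selection of expression-matched non-targets described above could be sketched as follows. This is not the code used in the study: the function name, the input format, the `tolerance` cutoff, and the one-to-one sampling without replacement are all assumptions made for the example.

```python
import random


def pick_matched_non_targets(expression, targets, tolerance=0.2, seed=0):
    """For each target gene, randomly pick one unbound gene with similar expression.

    expression: dict mapping gene name -> expression level (e.g. log2 TPM); hypothetical input
    targets:    set of gene names with Xist binding; every other gene is treated as unbound
    tolerance:  maximum absolute expression difference allowed (assumed cutoff)
    """
    rng = random.Random(seed)  # fixed seed so the random choice is reproducible
    available = sorted(g for g in expression if g not in targets)
    chosen = []
    for target in sorted(targets):
        # Criterion (i): candidate is unbound. Criterion (ii): similar expression level.
        candidates = [
            g for g in available
            if abs(expression[g] - expression[target]) <= tolerance
        ]
        if not candidates:
            continue  # no unbound gene is similar enough; skip this target
        pick = rng.choice(candidates)
        chosen.append(pick)
        available.remove(pick)  # sample without replacement
    return chosen
```

      With a toy table such as `{"A": 5.0, "B": 5.1, "C": 9.0, "D": 1.0, "E": 8.95}` and targets `{"A", "C"}`, the only admissible matches are `B` for `A` and `E` for `C`.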

      It would be helpful to compare Xist expression levels across the various models, and the MEF model could be better described - are they polyploid as often happens?

      We have included the Xist expression levels of ES cells and MEF cells in the revised version (revised Figure S5A, 6D). The transformed MEFs are indeed tetraploid, as is typical.

      For 6A to be informative, one needs to know % mapping to X in ES timeline, which is in supplemental, so perhaps 6A should also be supplemental?

      We have moved 6A to the supplemental figure.

      It is odd that ∆B seems to have had more impact in MEFs, and I would like more discussion - but I also think I am missing something: "We observed that Xist signals were more substantially reduced on both the Xi and autosomal regions in ΔRepE MEFs compared to ΔRepB cells", yet in lower panel 6 G it looks like ∆B is LOWER than ∆E? Am I misinterpreting?

      We apologize for the confusing writing.  The revised text now reads:  “To investigate, we utilized a deletion of Xist’s Repeat E (∆RepE), which was previously demonstrated to severely abrogate localization of Xist to the Xi 41,42. We reasoned that the severe loss of Xist binding might unmask a transcriptomic difference. As expected, we observed that Xist signals were somewhat more reduced on the Xi in ΔRepE MEFs compared to ΔRepB cells (Figure 6E-6F). Despite this reduction, peak coverages in autosomal target genes did not increase in ΔRepE MEFs (Figure 6E-6F). However, there was an overall decrease in the number of significant autosomal peaks in ∆RepE MEFs relative to WT cells (Figure 6A). Regardless, we observed no significant transcriptomic differences in ∆RepE MEFs relative to WT MEFs (Figure 7A-7E). Additionally, further examination of RNA sequencing data from male and female MEF cells in two published studies 43,44 corroborated that the expression levels of these autosomal Xist targets did not exhibit significant changes (Figure 7F and 7G). Altogether, the analysis in MEFs demonstrates that Xist continues to bind autosomal genes in post-XCI somatic cells. However, autosomal binding of Xist in post-XCI cells does not overtly impact expression of the associated autosomal genes. Nonetheless, we cannot exclude more subtle changes that do not meet the significance cut-off.”

      Overall, I would like to see how consistent these autosomal peaks are - I shudder to suggest Venn diagrams, but something to show whether there are day/lineage specific peaks and/or ∆repeat B/E resistant peaks. 

      We now present Venn diagrams comparing MEF, ES_d4, and ES_d7, showing approximately 50% overlap between MEF and ES cells (revised Figure S10B). This may be expected, as each timepoint is a different developmental stage of XCI, with expected gene expression differences.

      Very minor comments:

      It would be easier if the supplemental tables were tabs in 1 file!

      We will defer to the editor on how best to format the supplemental tables.

      Similar to the text, could gene names be included in the supplemental?

      We have provided gene names in the supplemental files.

      Figure 3 legend: should 'representing' be representative?

      We have modified it.

      "Xist patterns identified in human cells" p 5; it is challenging to follow human versus mouse, so specify or ensure correct use of XIST/Xist.

      Indeed, we edited the manuscript accordingly.

      Gene names should be italicized.

      We have italicized gene names in our manuscript.

      Ref. 38 lacks details (...).

      We have updated the reference.

      Peak-like characters - perhaps characteristics? P8

      We have modified this.

      Reviewer #3 (Recommendations for the authors):

      On page 6, the 6th sentence in the first paragraph needs correction. "Consistent with Xist's behavior on the X chromosome."

      We have modified the sentence. Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study by Longhurst et al. investigates the mechanisms of chemoresistance and chemosensitivity towards three compounds that inhibit cell cycle progression: camptothecin, colchicine, and palbociclib. Genome-wide genetic screens were conducted using the HAP1 Cas9 cell line, revealing compound-specific and shared pathways of resistance and sensitivity. The researchers then focused on novel mechanisms that confer resistance to palbociclib, identifying PRC2.1. Genetic and pharmacological disruption of PRC2.1 function, but not related PRC2.2, leads to resistance to palbociclib. The researchers then show that disruption of PRC2.1 function (for example, by MTF2 deletion), results in locus-specific changes in H3K27 methylation and increases in D-type cyclin expression. It is suggested that increased expression of D-type cyclins results in palbociclib resistance.

      Strengths:

      The results of this study are interesting and contribute insights into the molecular mechanisms of CDK4/6 inhibitors. Importantly, while CDK4/6 inhibitors are effective in the clinic, tumour recurrence is very high due to acquired resistance.

      Weaknesses:

      A key resistance mechanism is Rb loss, so it is important to understand if resistance conferred by PRC2.1 loss is mediated by Rb, and whether restoration of PRC2.1 function in Rb-deplete cells results in renewed palbociclib sensitivity. It is also important to understand the clinical implications of the results presented. The inclusion of these data would significantly improve the paper. However, besides some presentation issues and typos as described below, it is my opinion that the results are robust and of broad interest.

      Major questions:

      (1) Is the resistance to CDK4/6 inhibition conferred by mutation of MTF2 mediated by Rb?

      (2) Are mutations in PRC2.1 found in genetic analyses of tumour samples in patients with acquired resistance?

      We thank the reviewer for their editing and experimental suggestions, and have integrated their responses into our re-submitted manuscript.

      We also agree that understanding the role of RB1 in the proposed palbociclib resistance mechanism is of particular interest. However, as there are three RB proteins expressed in human cells, this is a technically difficult question to probe genetically. Despite this technical challenge, we have provided multiple lines of evidence in our resubmitted manuscript that the resistance to palbociclib observed in our PRC2.1-deficient cells is mediated through the canonical CDK4/6-RB1 pathway. First, disruption of RB1 in HAP1 cells results in palbociclib resistance to a level comparable to that of PRC2.1 disruption (Fig. 4E). Second, inactivation of SUZ12 or MTF2 increases the number of cells entering S-phase in palbociclib treatment (Fig. 4G) with no increase in basal rates of apoptosis (Fig. S2D), suggesting that any proliferation advantage observed in PRC2.1-defective cells is due to resistance to palbociclib-induced cell cycle arrest. Third, we show that overexpression of CCND1 and CCND2 is sufficient to drive resistance to palbociclib in wild-type HAP1 cells (Fig. S5F). And finally, the increased levels of CCND1 and CCND2 observed in cells lacking PRC2.1 activity result in higher CDK4/6 activity, as measured by RB1 phosphorylation, despite palbociclib blockade (Fig. 6F). All these lines of evidence strongly suggest that MTF2-containing PRC2.1 regulates G1 progression through the canonical CDK4/6-RB1 pathway by repressing CCND1 and CCND2 expression.

      Whether or not MTF2 deletion leads to palbociclib resistance in clinical samples is also a question of particular interest. Currently, we are unaware of any reports that specifically mention MTF2 deletion as leading to palbociclib resistance, and we were unable to find another example in our own cancer database review. However, we have included references to other examples of MTF2 mutation resulting in chemotherapeutic resistance in our discussion. Additionally, although MTF2 is rarely observed to be mutated in cancers (Ngubo et al. 2023), it is highly differentially expressed, and investigating decreased MTF2 transcription in palbociclib-resistant tumors, though challenging, might prove fruitful. However, as the mechanisms of palbociclib resistance are an area of active investigation, we speculate that future studies might uncover additional examples of MTF2 mediating resistance to this clinically important chemotherapeutic.

      Reviewer #2 (Public Review):

      Summary:

      Longhurst et al. assessed cell cycle regulators using a chemogenetic CRISPR-Cas9 screen in haploid human cell line HAP1. Besides known cell cycle regulators they identified the PRC2.1 subcomplex to be specifically involved in G1 progression, given that the absence of members of the complex makes the cells resistant to Palbociclib. They further showed that in HAP1 cells the PRC2.1, but not the PRC2.2 complex is important to repress the cyclins CCND1 and CCND2. This can explain the enhanced resistance to Palbociclib, a CDK4/6 inhibitor, after PRC2.1 deletion.

      Strengths:

      The initial CRISPR screen is very interesting because it uses three distinct chemicals that disturb the cell cycle at various stages. This screen mostly identified known cell cycle regulators, which demonstrates the validity of the approach. The results can be used as a resource for future research.

      The most interesting outcome of the experiment is the finding that knockouts of the PRC2.1 complex make the cell resistant to Palbociclib. In a further experiment, the authors focused on MTF2 and JARID2 as the main components of PRC2.1 and PRC2.2, respectively. Via extensive analyses, including genome-wide experiments, they confirmed that MTF2 is particularly important to repress the cyclins CCND1 and CCND2. The absence of MTF2 therefore leads to increased expression of these genes, sufficient to make the cell resistant to palbociclib. This result will likely be of wide interest to the community.

      Weaknesses:

      The main weakness of the manuscript is that the experiments were performed in only one cell line. To draw more general conclusions, it would be essential to confirm some of the results in other cell lines.

      In addition, some of the findings, such as the results from the CRISPR screen as well as the stronger impact of the MTF2 KO on H3K27me3 and gene expression (compared to JARID2 KO), are not unexpected, given that similar results were already obtained before by other labs.

      We thank the reviewer for their suggestions, and we believe that we have addressed their main concern about the generality of the MTF2 regulation of D-type cyclin expression in our resubmitted manuscript. We have now shown through shRNA knockdown that MTF2 represses CCND1 in two additional cell lines, the breast cancer line MDA-MB-231 and the immortalized monkey COS7 cell line (Fig. 6E). However, it is important to note that MTF2 did not control CCND1 expression in every cell line tested (Fig. 6D), underscoring the context-dependent nature of this regulation. Future studies will illuminate the cell or tumor types in which this regulation is observed.

      Additionally, while MTF2 has previously been shown to exert a greater effect on H3K27me3 levels in some circumstances (Loh et al. 2021, Rothberg et al. 2018), a number of notable reports in ES cell lines have concluded that PRC2 localization and H3K27me3 at the majority of genomic sites are dependent on both PRC2.1 and PRC2.2 activity (Healy et al. 2019, Højfeldt et al. 2019, Perino et al. 2020, Oksuz et al. 2018). Therefore, we think it is important to highlight the greater dependence on MTF2 for promoter proximal H3K27me3 levels in our transformed cell line context.  

      Reviewer #3 (Public Review):

      This study begins with a chemogenetic screen to discover previously unrecognized regulators of the cell cycle. Using a CRISPR-Cas9 library in HAP1 cells and an assay that scores cell fitness, the authors identify genes that sensitize or desensitize cells to the presence of palbociclib, colchicine, and camptothecin. These three drugs inhibit proliferation through different mechanisms, and with each treatment, expected and unexpected pathways were found to affect drug sensitivity. The authors focus the rest of the experiments and analysis on the polycomb complex PRC2, as the deletion of several of its subunits in the screen conferred palbociclib resistance. The authors find that PRC2, specifically a complex dependent on the MTF2 subunit, methylates histone 3 lysine 27 (H3K27) in promoters of genes associated with various processes including cell-cycle control. Further experiments demonstrate that Cyclin D expression increases upon loss of PRC2 subunits, providing a potential mechanism for palbociclib resistance.

      The strengths of the paper are the design and execution of the chemogenetic screen, which provides a wealth of potentially useful information. The data convincingly demonstrate in the HAP1 cell line that the MTF2-PRC2 complex sustains the effects of palbociclib (Figure 4), methylates H3K27 in CpG-rich promoters (Figure 5), and represses Cyclin D expression (Figure 6). These results could be of great interest to those studying cell-cycle control, resistance mechanisms to therapeutic cell-cycle inhibitors, and chromatin regulation and gene expression.

      There are several weaknesses that limit the overall quality and potential impact of the study. First, none of the results from the colchicine and camptothecin screens (Figures 1 and 2) are experimentally validated, which lessens the rigor of those data and conclusions. Second, all experiments validating and further exploring results from the palbociclib screen are restricted to the Hap1 cell line, so the reproducibility and generality of the results are not established. While it is reasonable to perform the initial screen to generate hypotheses in the Hap1 line, other cancer and non-transformed lines should be used to test further the validity of conclusions from data in Figures 4-6. Third, conclusions drawn from data in Figures 3D and 4D are not fully supported by the experimental design or results. Finally, there have been other similar chemogenetic screens performed with palbociclib, most notably the study described by Chaikovsky et al. (PMID: 33854239). Results here should be compared and contrasted to other similar studies.

      We thank the reviewer for their suggestions regarding our manuscript. While the genes recovered as mediating cellular responses to camptothecin and colchicine were never confirmed following our chemogenetic screens, we felt our primary findings were in the area of palbociclib resistance and decided to focus our follow-up investigations on those genes. We included the results of the camptothecin and colchicine chemogenetic screens as confirmation of the specificity of PRC2 mutation resulting in resistance to palbociclib (Fig. 4C) and for others in the community to use as a resource for future investigations. We have also clarified our results for Figure 3D and 4D in our revised manuscript, as well as included additional plots of these results (Fig. S1D-S1F). And, with our resubmitted manuscript, we believe we have addressed their concern about the generality of our results by demonstrating our primary finding that MTF2 regulates D-type cyclins in additional cell lines other than HAP1. We feel these results indicate that, while the regulation is not “general”, there are additional cellular contexts in which our main result holds true. In line with this, and to address how our chemogenetic screens fit into the landscape of previous studies, including Chaikovsky et al., we have included the following lines in our discussion:  “Additionally, other chemogenetic screens utilizing palbociclib and have not identified that inactivation of PRC2 components as either enhancing or reducing palbociclib-induced proliferation defects, suggesting that PRC2 mutation is neutral in the cell lines studied. These observations not only underscore the context-dependent ramifications of mutation of these PRC2 complex members, but also may help inform the context in which CDK4/6 inhibitors are most efficacious.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) "We found that only thirteen and twenty genes resulted in sensitivity or resistance, respectively, in every conditions tested and were deemed non-specific and excluded from any further analysis (see Table S2)." It's unclear to me why these genes were deemed 'nonspecific'. Are these genes functionally important for the general exclusion of xenobiotic molecules?

      By this, we simply meant that these effects were not specific to one condition. Such genes could affect drug half-life or a general stress response, but are less likely to have functions directly tied to the pathway targeted by a drug than are genes whose loss affects only one condition.  

      (2) "Given that increased CCND1 levels is sufficient to drive increased CDK4/6 kinase activity, upregulation of these D-type cyclins is likely to be a significant contributor to the palbociclib resistance in MTF2∆ cells." It's unclear to me what is the basis for this statement. This is only true if there is free CDK4/6. If CDK4/6 is already fully occupied by D-type cyclins, then increased CCND1 levels would not be expected to have an effect. 

      While we anticipated that increased levels of CCND1 would result in more CDK4/6-D-type cyclin association, we now demonstrate in the new Figure S5F that there is more CCND1 in complex with CDK6 in both SUZ12∆ and MTF2∆ cell lines. Furthermore, we are able to show in Figure S5G that overexpression of D-type cyclins results in resistance to palbociclib-induced proliferation defects in HAP1 cells.

      (3) The description of the results is very confusing in places, especially regarding "resistance" versus "sensitivity" genes. For example: "CCNE1, CDK6, CDK2, CCND2 and CCND1, all of which are integral to promoting the G1/S phase transition, ranked as the 2nd, 24th, 27th, 29th and 46th most important genes for palbociclib resistance, respectively (Figures 1F and 1G). CCND1 and CCND2 bind either CDK4 or CDK6, the molecular targets of palbociclib, whereas CDK2 and CCNE1 form a related CDK kinase that promotes the G1/S transition.

      Similarly, cells with sgRNAs targeting RB1, whose phosphorylation by CDK4/6 is a critical step in G1 progression, displayed substantial resistance to palbociclib." My reading of this paragraph suggests that disruption of the CDK6 locus is associated with palbociclib resistance - surely this is a typo and instead should have been sensitivity? Please explain.

      We thank the reviewer for pointing this out and have corrected this typo.

      (4) "Sensitivity to palbociclib was enhanced in cells expressing sgRNAs targeting H4 acetylation, positive regulators of Pol II transcription, and regulators of the DNA Damage Response pathway (Figures 3A and 3B), although this sensitivity was much weaker than that seen with DNA damaging agents. This observation is consistent with long-term treatment with palbociclib inducing DNA damage, as has been suggested by a number of recent publications 65,66." This is also consistent with recent work on Cdk7 inhibitors (Wilson et al. Mol Cell 2023), as Cdk7 inhibition is expected to affect both CDK1/2/4/6 activities and Pol II transcription.

      We thank the reviewer for bringing this observation to our attention and we have added this citation to this passage in our manuscript.

      (5) Figure 3D - would it not make sense to plot the data such that palbo concentration is on the x-axis? It is also difficult to interpret since the data are normalized to starting "% proliferation" at the indicated palbo treatment, when it is likely that % proliferation changes significantly with palbo concentration. Indeed, this is the graphing format used for a later figure (Figure 4D). The data with rotenone suggests palbo antagonizes rotenone-mediated reduction in proliferation. But it's unclear to me whether the graph shows the converse - that rotenone treatment modulates palbo-induced cell cycle arrest.

      The reviewer is correct that increasing doses of palbociclib in the absence of oxidative phosphorylation inhibitors do indeed have an effect on proliferation. However, it is helpful to normalize proliferation values to each initial dose of palbociclib and then compare this to the different oxidative phosphorylation inhibitor treatment combinations. To illustrate that the oxidative phosphorylation inhibitors do indeed antagonize palbociclib-induced proliferation defects, we have now included the data graphed as each oxidative phosphorylation inhibitor vs palbociclib as Supplemental Figures S1D-S1F.
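
      As an illustration only, normalizing proliferation to the inhibitor-free value at each palbociclib dose can be sketched as below. This is not the analysis code from the study: the function name and the dictionary layout are assumptions, and it further assumes that each palbociclib dose has a measurement at an inhibitor dose of 0.

```python
def normalize_to_palbociclib_baseline(measurements):
    """Express each proliferation value as a percentage of the inhibitor-free
    value measured at the same palbociclib dose.

    measurements: dict mapping (palbociclib_dose, inhibitor_dose) -> proliferation.
    Assumes an entry with inhibitor_dose == 0.0 exists for every palbociclib dose.
    """
    normalized = {}
    for (palbo, inhibitor), value in measurements.items():
        baseline = measurements[(palbo, 0.0)]  # same palbociclib dose, no inhibitor
        normalized[(palbo, inhibitor)] = 100.0 * value / baseline
    return normalized
```

      With invented numbers, a well that drops from 100 to 50 without palbociclib is reported as 50%, whereas one that drops from 40 to 30 at a given palbociclib dose is reported as 75%. On this scale, a smaller relative loss under palbociclib reads as the two treatments antagonizing one another.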

      • The highest concentration of GSK126 tested (5µM) does not appear to confer resistance, but perhaps this is due to off-target effects or cytotoxicity?

      We agree with the reviewer that the highest dose of GSK126 does not appear to confer resistance at low doses of palbociclib. However, it does appear to confer resistance at higher doses of palbociclib. We have included a statement in our results section to address this reviewer’s observations.

      • Disruption of Emi1 leads to resistance (Figure 1F, FZR1), yet overexpression induces resistance (Mouery et al. bioRxiv 2023). Explain.

      We do not understand why EMI1 responds in this way, and therefore we cannot comment on this in the text. 

      Typos/stylistic comments:

      • Typo "However, the net result of these opposing effects on cell cycle progression, and the contribution of the individual subcomplexes to this regulation, rained unclear."

      We thank the reviewer for pointing this out, and we have corrected it.  

      • Use of the word "growth" - I think the authors should be more precise. Is "proliferation" meant here?

      We thank the reviewer for pointing this out, and we have corrected it.

      • In Figure 4G, two of the panels have 8.42%. Is this correct, or may it be a copy/paste error?

      This was an error, but is no longer relevant as we have reconducted and reanalyzed this experiment.

      Reviewer #2 (Recommendations For The Authors):

      Major Points

      (1) Some of the conclusions should be confirmed in additional cell lines. I would suggest testing the resistance to Palbociclib in several additional cell lines, where MTF2 and JARID2 are deleted. If the conclusion can be generalized, one would expect that the differential role of MTF2 versus JARID2 can be confirmed in more cell lines.

      While the PRC2.1-dependent repression of D-type cyclins does not appear to be general, we have now demonstrated in Figures S5E and 6F that there are multiple different cellular contexts in which our observations are consistent. Specifically, we demonstrate that GSK126 causes upregulation of CCND1 in both immortalized non-tumor cells (COS7 cells) and in the breast cancer cell line MDA-MB-231. Moreover, in both cases we showed that this effect is PRC2.1-dependent, as shRNA knockdown of MTF2 increases expression of CCND1.

      (2) In addition, it may be attractive to make use of publicly available RNA-seq data of MTF2 and JARID2 knockout/down cells, to investigate the generality of the finding that PRC2.1 regulates CCND1 and CCND2.

      While it would be useful to address this issue, Figure S5E demonstrates that the repression of D-type cyclin expression by PRC2.1 is context dependent. Furthermore, prior to identifying the lines shown in Figures 6F and S5E, we were not aware of which lines to focus our investigations on. However, we have now demonstrated a few cellular contexts in which either chemical inhibition of PRC2 or knockdown of MTF2 results in de-repression of CCND1 expression.

      (3) At a bare minimum the authors should strongly discuss the limitations of the study, and tone down the conclusions.

      We would agree with this based upon the data in the originally submitted manuscript; however, now that we have shown that this effect is more general, this is less critical. That said, we do not see this effect in all cell lines, and we have made this apparent in the final version of the manuscript.

      Minor point

      (1) In my view, Figures 1-3 should be shortened to the most essential points, and some data/figures should be moved to the supplementary figures. Especially the STING gene-network graphs are in my view not particularly meaningful.

      While we understand the opinion of this reviewer, we feel that these data will be of significant interest to some readers.  

      (2) Figure 6E and 6F/G appear to be largely redundant. This can perhaps be made more concise.

      This has been addressed in the new version of Figure 6.

      (3) Figure 5D should be enlarged. 

      We thank the reviewer for this suggestion and have enlarged the image.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript could be edited to improve clarity. In several places, the scientific logic motivating an experiment is confusing, and there are several hypotheses and conclusions that seem opposite from what the data are suggesting. Some aspects of the figures were also unclear. Specific examples include the following:

      (1) Last sentence of abstract : "Our results demonstrate a role for PRC2.1, but not PRC2.2, in promoting G1 progression." Data show that knockout of PRC2.1 components promotes G1 progression through upregulation of CycD, so the conclusion here is the opposite.

      We thank the reviewer for catching this error. We have now changed this to “in antagonizing G1 progression”.

      (2) In the second paragraph of the results, CCNE1, CDK2, etc are described as scoring high for palbociclib resistance, but those genes scored as sensitizing. Also, in that paragraph, it is described that a drug is sensitizing cells to loss of a gene, which seems like incorrect logic. It should be clarified that knock-out of a gene either sensitizes or desensitizes cells to the drug.

      We thank the reviewer for catching this error. We have now corrected it.  

      (3) In the motivation for the experiment in Figure 3D, it is written: "we asked whether chemical inhibition of oxidative phosphorylation could rescue sensitivity to palbociclib". Considering that knock-out of genes that mediate oxidative phosphorylation confer resistance to palbociclib, it is confusing why it was expected that chemical inhibitors would restore sensitivity.

      We are sorry if the original wording was confusing. We have now changed this to “combined inhibition of oxidative phosphorylation and CDK4/6 activity mutually rescue the proliferation defect imposed by agents targeting the other process”.  

      (4) If the intention of Figure 3D is to test the hypothesis that chemical inhibition of oxidative phosphorylation modulates sensitivity to palbociclib, the clarity of Figure 3D would be improved if data were shown such that palbociclib concentration is on the x-axis and the different curves are different drug concentrations.

      It appears that there is some mutual suppression, in which inhibition of each process partly rescues cells from inhibition of the other. In fact, with these drugs the stronger of the two effects is the rescue from mitochondrial poisons by palbociclib. We have now discussed this in the text.

      (5) The authors should check the units on the x-axis in Figure 4D, should they be log[uM Palbo] or log [nM Palbo]?

      We thank the reviewer for catching this error. We have now corrected it.

      (6) It should be clarified which data are summarized in the graph to the right in Figure 4G, are these experiments with palbociclib?

      This is currently included in the figure legends.

      (7) The text suggests that the control CCNE1 knockout is shown in Figure 4E, but those data are missing.

      This has been corrected in Figure 4E.

      Several conclusions are not well supported by the data and should be revised or more data and analysis should be added.

      (1) The titular conclusion that the "PRC2.1 Subcomplex Opposes G1 Progression through Regulation of CCND1 and CCND2" has only been demonstrated in the context of a Cdk4/6 inhibitor in HAP1 cells. There is little evidence supporting this claim that is broadly applicable. For example, data in Figure 4G show small and not demonstrable significant differences in G1 and S phase populations in the mock experiments. Also, experiments in other cells are needed to support the rigor and generality of the conclusion.

      Our chemogenetic screen and competitive proliferation assay data in Figures 4A, 4C and 4E support the conclusion that PRC2.1 and PRC2.2 play opposing roles in G1 progression. Furthermore, we have repeated the initial BrdU incorporation experiments shown in Figure 4G and have been able to demonstrate that JARID2∆ cells do indeed display a significant decrease in cells entering S-phase when treated with palbociclib. Most importantly, in Figures 6D and 6E we show additional cell lines where this is the case. Therefore, we feel that this title is valid in the current version of the manuscript, where we have shown it to be the case in multiple tumor-derived human cell lines as well as immortalized non-human primate cells.

      (2) It is unclear how the data in Figure 3D support the conclusion that the administered inhibitors of oxidative phosphorylation influence response to palbociclib.

      As noted in the response to point 4, we have now discussed this mutual rescue more thoroughly in the text.  

      (3) In Figure 4D, the IC50 values should be calculated and statistical significance based on biological replicates should be determined. Also, the conclusion that "increasing doses of GSK126 withstood palbociclib-induced growth suppression" is overstated, as ultimately all drug conditions succumb to palbociclib suppression of proliferation, although there may be differences in sensitivity.

      We have now included a statistical analysis of each data point in Figure 4D.

      Editorial comments:

      (1) The title does not seem to optimally capture the content of the paper. Please consider changing it, e.g. focusing on palbociclib resistance. 

      While we used this particular drug to make the original observation, we feel it is more general to discuss the underlying biology (cyclin gene control) than the pharmacological methodology. Moreover, we have now extended our findings about the regulation of D-type cyclins by PRC2.1 to several cell lines, derived from both cancers and primary cells, reinforcing the fact that this effect is observed more broadly.

      (2) Please indicate the biological system (haploid human HAP1 cells) in either title or abstract.

      The abstract now indicates that we have observed this in CML, breast cancer and immortalized primary cells.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors aim to investigate the relationship between low estrogen levels, postmenopausal hypertension, and the potential role of the molecule L-AABA as a biomarker for hypertension. By employing metabolomic analysis and various statistical methods, the study seeks to understand how estrogen deficiency affects blood pressure and identify key metabolites involved in this process, with a particular focus on L-AABA.

      Strengths:

      The study addresses a relevant and understudied area: the role of estrogen and metabolites in postmenopausal hypertension. It presents a novel hypothesis that L-AABA may serve as a protective factor against hypertension, which could have significant clinical implications if proven.

      We appreciate the acknowledgment of our study’s focus on an important and understudied area. Our hypothesis regarding L-AABA’s role as a possible protective factor against hypertension indeed holds promise for advancing clinical implications.

      Weaknesses:

      The evidence linking L-AABA to hypertension is largely correlative, lacking experimental validation or mechanistic proof. Key limitations, such as the inadequacy of the ovariectomy model in replicating human menopause, are acknowledged but not addressed with alternative approaches. In summary, while the study offers an intriguing hypothesis, its conclusions are premature and require further experimental validation and human data to substantiate the claims.

      We recognize the limitations regarding the correlative nature of our findings and the inadequacy of the OVX model in replicating human menopause. Future research will prioritize experimental validation and incorporate human studies to solidify our conclusions.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Dr. Yao Li et al. documented the metabolomic profile of the aorta from OVX rats and that from OVX plus E2. These conditions mimic post-menopause hypertension and hormonal replacement therapy.

      Strengths:

      The authors state that this is probably the first study to examine the metabolic changes in the aorta of post-menopause hypertension.

      As pointed out by the reviewer, our study may be the first to investigate changes in aortic metabolism in postmenopausal hypertension. As an exploratory study, our goal is to depict the overall characteristics and explore possible research directions.

      Weaknesses:

      There are several weaknesses, and a few of them are quite serious.

      (1) The aorta is not a resistant artery and has little to do with hypertension. The authors should have used resistant arteries for this study. The expression of several adrenergic receptors and cholinergic receptors in the aorta and resistant arteries are different. It is unknown whether the aorta metabolomic profile has any relevance to BP and whether they are similar to that of the resistant arteries. I understand the logistics issue of obtaining enough tissues from resistant arteries. At least, once some leads are discovered in the aorta, the authors should validate it in resistant arteries. This should be feasible.

      We acknowledge the limitation of using the aorta and will aim to include studies on resistant arteries to validate our metabolomic findings.

      (2) The aorta and all the arteries have three layers. It is critically important to know whether the metabolic changes occur in the intima or in the media, while the adventitia probably has little to do with vasoconstriction and hypertension. If the authors want to use the aorta to conduct the preliminary study, they should completely remove the adventitia and then use samples with and without their endothelium stripped and then assess their metabolomic profiles. After the leads are obtained from this preliminary profiling, they should be validated in endothelium and smooth muscles of the resistant artery. The current experiments are not appropriately designed.

      Future studies will involve detailed profiling of specific arterial layers, focusing on the intima and media to enhance the relevance of our findings related to hypertension.

      (3) The tail-cuff BP measurement is a technique of the last century. The current gold standard of BP measurement is by telemetry. The tail-cuff method is particularly problematic in this study because restraining the rats for 1-2 h for more than 10 BP measurements will cause significant stress in the animals, and their stress hormone secretion might cause biased metabolomic profiles in the OVX versus sham-operated mice. The problem can be totally avoided by using telemetry.

      We appreciate the suggestion and will consider telemetry for more accurate blood pressure measurements in future experiments to minimize stress-related bias.

      (4) Although the L-AABA showed a high p-value (10^-4) of a decrease in the OVX rats, the fold change is small (2-3 folds). Such a small change should be validated using a different method to be convincing.

      We plan to employ additional methods to validate the observed changes in L-AABA levels in the following research, ensuring robustness of our findings.

      (5) The authors claim (or hypothesize) that the reduced AABA level in OVX can cause vascular remodeling. This can be easily validated by the histology of the OVX-resistant artery, and they should do that during the revision. The authors should also examine the M1 macrophage function from the OVX mice to validate their claimed link of AABA to M1.

      We intend to conduct histological analyses and examine M1 macrophage function in OVX-resistant arteries to validate our hypothesis in the following research.

      (6) As mentioned above, the authors need to pinpoint the changes of AABA to target cells, i.e., endothelial cells, SMC, or M1, and then use in vitro or in vivo cell biology approaches to assess whether these cells in the OVX rat indeed have an abnormality in function and, indeed, such functional changes are responsible for the BP phenotype.

      Addressing these points, we aim to pinpoint specific cell types affected by AABA variations and conduct in vitro and in vivo studies to examine their physiological impacts in the following research.

      (7) The results of the current study can be condensed into 1 or 2 figures that can serve as a base or a starting point for a deeper scientific study.

      Thank you for your suggestion. As this is an omics study, our research approach may differ from that of traditional mechanistic studies.

      Summary

      The experimental design of this manuscript is inappropriate, and the methods are not up to the current standards. The whole study is descriptive and rudimentary. It lacks validation and mechanism. The data from this manuscript might be of some value and can serve as the first step for more investigation of the mechanism of post-menopause hypertension.

      Reviewer #3 (Public review):

      Summary:

      The decrease in estrogen levels is strongly associated with postmenopausal hypertension. Dr. Yao Li and colleagues aimed to investigate the metabolomic mechanisms underlying postmenopausal hypertension using OVX and OVX+E2 rat models. They successfully established a correlation between reduced estrogen levels and the development of hypertension in rats. They identified L-alpha-aminobutyric acid (AABA) as a potential marker for postmenopausal hypertension. The research explored the metabolic alterations in aortic tissues and proposed several potential mechanisms contributing to postmenopausal hypertension.

      Strengths:

      The group performed a comprehensive enrichment analysis and various statistical analyses of the metabolomics data.

      As summarized by the reviewer, our current study conducted a comprehensive analysis of metabolomics data. It is also a reliable foundation for further mechanism research.

      Weaknesses:

      (1) The manuscript is descriptive in nature, although they mentioned their primary objective is to explore the potential mechanisms linking low estrogen levels with postmenopausal hypertension. No mechanism insights have been interrogated in this study, which has been mentioned by the authors in the discussion. The connection between E2, AABA, and macrophage needs to be validated in endothelial cells, vascular smooth muscle cells, and other aortic tissue cells. Without such verification, the manuscript predominantly raises hypotheses only based on metabolomic data.

      We have proposed research hypotheses based on detailed omics data. Further research on the mechanisms involving endothelial and vascular smooth muscle cells to validate the pathway connections between E2, AABA, and macrophages is undoubtedly the future direction of this study.

      (2) The serum contains three forms of estrogen: Estradiol, Estrone, and Estriol. The authors used the Rat E2 ELISA kit. Ideally, all three forms of estrogen should be measured.

      Future assays will aim to measure Estradiol, Estrone, and Estriol to capture a more comprehensive picture of estrogen’s role in postmenopausal hypertension.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study reports on the discovery of an antimicrobial agent that kills Neisseria gonorrhoeae. Sensitivity is attributed to a combination of DedA-assisted uptake of oxydifficidin into the cytoplasm and the presence of an oxydifficidin-sensitive RplL ribosomal protein. Due to the narrow scope, the broader antibacterial spectrum remains unclear and therefore the evidence supporting the conclusions is incomplete with key methods and data lacking. This work will be of interest to microbiologists and synthetic biologists.

      General comment about narrow scope: The broader antibacterial spectrum of oxydifficidin has been reported previously (S B Zimmerman et al., 1987). The main focus of this study is on its previously unreported potent anti-gonococcal activity and mode of action. While it is true that broad-spectrum antibiotics have historically played a role in effectively controlling a wide range of infections, we and others believe that narrow-spectrum antibiotics have an overlooked importance in addressing bacterial infections. Their advantage lies in their ability to target specific pathogens without markedly disrupting the human microbiota.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Kan et al. report the serendipitous discovery of a Bacillus amyloliquefaciens strain that kills N. gonorrhoeae. They use TnSeq to identify that the anti-gonococcal agent is oxydifficidin and show that it acts at the ribosome and that one of the dedA gene products in N. gonorrhoeae MS11 is important for moving the oxydifficidin across the membrane.

      Strengths:

      This is an impressive amount of work, moving from a serendipitous observation through TnSeq to characterize the mechanism by which Oxydifficidin works.

      Weaknesses:

      (1) There are important gaps in the manuscript's methods.

      The requested additions to the method describing bacterial sequencing and anti-gonococcal activity screening will be made. However, we do not think the absence of these generic methods reduces the significance of our findings.

      (2) The work should evaluate antibiotics relevant to N. gonorrhoeae.

      (1) It is not clear to us why reevaluating the activity of well-characterized antibiotics against known N. gonorrhoeae clinical strains would add value to this manuscript. The activity of clinically relevant antibiotics against antibiotic-resistant N. gonorrhoeae clinical isolates is well described in the literature. Our use of antibiotics in this study was intended to aid in the identification of oxydifficidin’s mode of action. This is true for both Tables 1 and 2.

      (2) If the reviewer insists, we would be happy to include MIC data for the following clinically relevant antibiotics: ceftriaxone (cephalosporin/beta-lactam), gentamicin (aminoglycoside), azithromycin (macrolide), and ciprofloxacin (fluoroquinolone).

      (3) The genetic diversity of dedA and rplL in N. gonorrhoeae is not clear, neither is it clear whether oxydifficidin is active against more relevant strains and species than tested so far.

      (1) We thank the reviewer for this suggestion. We aligned the DedA sequence from strain MS11 with DedA proteins from 220 N. gonorrhoeae strains that have high-quality assemblies in NCBI. The result showed that there are no amino acid changes in this protein. Using the same method, we observed several single amino acid changes in RplL. This included changes at A64, G25 and S82 in 4 strains with one change per strain. These sites differ from R76 and K84, where we identified changes that provide resistance to oxydifficidin. Notably, in a similar search of representative Escherichia, Chlamydia, Vibrio, and Pseudomonas NCBI deposited genomes, we did not identify changes in RplL at position R76 or K84.

      (2) While the usefulness of screening more clinically relevant antibiotics against clinical isolates as suggested in comment 2 was not clear to us, we agree that screening these strains for oxydifficidin activity would be beneficial. We have ordered Neisseria gonorrhoeae strains AR1280 and AR1281 (CDC), and Neisseria meningitidis ATCC 13090. They will be tested when they arrive.

      Reviewer #2 (Public Review):

      Summary:

      Kan et al. present the discovery of oxydifficidin as a potential antimicrobial against N. gonorrhoeae, including multi-drug resistant strains. The authors show the role of DedA flippase-assisted uptake and the specificity of RplL in the mechanism of action for oxydifficidin. This novel mode of action could potentially offer a new therapeutic avenue, providing a critical addition to the limited arsenal of antibiotics effective against gonorrhea.

      Strengths:

      This study underscores the potential of revisiting natural products for antibiotic discovery of modern-day-concerning pathogens and highlights a new target mechanism that could inform future drug development. Indeed there is a recent growing body of research utilizing AI and predictive computational informatics to revisit potential antimicrobial agents and metabolites from cultured bacterial species. The discovery of oxydifficidin interaction with RplL and its DedA-assisted uptake mechanism opens new research directions in understanding and combating antibiotic-resistant N. gonorrhoeae. Methodologically, the study is rigorous employing various experimental techniques such as genome sequencing, bioassay-guided fractionation, LCMS, NMR, and Tn-mutagenesis.

      Weaknesses:

      The scope is somewhat narrow, focusing primarily on N. gonorrhoeae. This limits the generalizability of the findings and leaves questions about its broader antibacterial spectrum. Moreover, while the study demonstrates the in vitro effectiveness of oxydifficidin, there is a lack of in vivo validation (i.e., animal models) for assessing pre-clinical potential of oxydifficidin. Potential SNPs within dedA or RplL raise concerns about how quickly resistance could emerge in clinical settings.

      (1) Spectrum/narrow scope: The broader antibacterial spectrum of oxydifficidin has been reported previously (S B Zimmerman et al., 1987). The focus of this study is on its previously unreported potent anti-gonococcal activity and its mode of action. While it is true that broad-spectrum antibiotics have historically played a role in effectively controlling a wide range of infections, we and others believe that narrow-spectrum antibiotics have an overlooked importance in addressing bacterial infections. Their advantage lies in their ability to target specific pathogens without markedly disrupting the human microbiota.

      (2) Animal models: We acknowledge the reviewer’s insight regarding the importance of in vivo validation to enhance oxydifficidin’s pre-clinical potential. However, due to the labor-intensive process needed to isolate oxydifficidin, obtaining a sufficient quantity for animal studies is beyond the scope of this study. Our future work will focus on optimizing the yield of oxydifficidin and developing a topical mouse model for subsequent investigations.

      (3) Potential SNPs: Please see our response to Reviewer #1’s comment 3. We acknowledge that potential SNPs within dedA and rplL raise concerns regarding clinical resistance, which is a common issue for protein-targeting antibiotics. Yet, as pointed out in the manuscript, obtaining mutants in the lab was a very low yield endeavor.

      Reviewer #3 (Public Review):

      Summary:

      The authors have shown that oxydifficidin is a potent inhibitor of Neisseria gonorrhoeae. They were able to identify the target of action to rplL and showed that resistance could occur via mutation in the DedA flippase and RplL.

      Strengths:

      This was a very thorough and clearly argued set of experiments that supported their conclusions.

      Weaknesses:

      There was no obvious weakness in the experimental design. Although it is promising that the DedA mutations resulted in attenuation of fitness, it remains an open question whether secondary rounds of mutation could overcome this selective disadvantage which was untried in this study.

      We thank the reviewer for the positive comment. We agree that investigating factors that could compensate for the fitness attenuation caused by DedA mutation would enhance our understanding of the role of DedA.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The term "N. gonorrhoeae wildtype" should not be used. It is uninformative, as the species contains a large amount of diversity. Instead, please name the strain. From Figure 1, it looks like the authors used MS11. Since MS11 is a longstanding lab strain and likely does not reflect circulating N. gonorrhoeae, and since H041 is no longer in circulation, the authors should ideally test the compound against more representative strains of N. gonorrhoeae. This includes panels of isolates available through the CDC, for example (https://www.cdc.gov/drugresistance/resistance-bank/index.html). I encourage the authors to include FC428 or another recently identified isolate with the penA 60 allele to demonstrate oxydifficidin's activity against contemporary concerning isolates/lineages.

      (1) “N. gonorrhoeae MS11” is now used instead of “N. gonorrhoeae WT” in this manuscript.

      (2) In our revised manuscript, we have added MIC data for recently identified Neisseria gonorrhoeae isolates AR#1280 and AR#1281 which contain the penA 60 allele (Table 1). The data shows oxydifficidin maintains its potent activity against these multidrug-resistant strains. We also added a description of this data to the results section as shown below.

      Original text: “Oxydifficidin was more potent against N. gonorrhoeae MS11 than almost all other antibiotics we tested. In fact, it was only slightly less active than the highly optimized third-generation cephalosporin, ceftazidime.([18]) However, unlike third-generation cephalosporins, oxydifficidin retained activity against the multidrug resistant H041 clinical isolate (Table 1).([4]) H041 is resistant to the “standard of care” cephalosporin ceftriaxone (2 µg/mL) as well as a number of other antibiotics that are normally active against N. gonorrhoeae (penicillin G, 4 µg/mL; cefixime, 8 µg/mL; levofloxacin, 32 µg/mL).”

      Changed to: “Oxydifficidin was more potent against N. gonorrhoeae MS11 than most other antibiotics we tested. Notably, unlike clinically used antibiotics such as ceftriaxone, azithromycin, and ciprofloxacin, oxydifficidin retained activity against all multidrug-resistant clinical isolates we examined (Table 1).” (Line 77-79)

      (2) Does oxydifficidin have activity against N. meningitidis? It is the species most closely related to N. gonorrhoeae and the other pathogenic Neisseria.

      Oxydifficidin has potent activity against N. meningitidis ATCC 13090. In our revised manuscript, we have included its MIC data in Figure 1c.

      (3) Given claims that oxydifficidin activity in N. gonorrhoeae as compared to other Neisseria reflects N. gonorrhoeae's dedA and sensitive rplL, it would be good to assess the allelic diversity of these genes in N. gonorrhoeae. There are over 20,000 genomes from clinical isolates of N. gonorrhoeae in databases. It should be straightforward to check whether dedA and rplL allelic variants already exist in the population. Should variants be observed, oxydifficidin should be tested against the associated strains of N. gonorrhoeae.

      Response: We thank the reviewer for this suggestion. We aligned the DedA sequence from strain MS11 with DedA proteins from 220 N. gonorrhoeae strains that have high-quality assemblies in NCBI. The result showed that there are no amino acid changes in this protein. Using the same method, we observed several single amino acid changes in RplL. This included changes at A64, G25 and S82 in 4 strains with one change per strain. These sites differ from R76 and K84, where we identified changes that provide resistance to oxydifficidin. Notably, in a similar search of representative Escherichia, Chlamydia, Vibrio, and Pseudomonas NCBI deposited genomes, we did not identify changes in RplL at position R76 or K84.

      New text: “A survey of 220 N. gonorrhoeae strains with high-quality assemblies in NCBI found no mutations in the DedA protein.” (Line 104-105)

      “These two mutations were not found in the survey of the same collection of N. gonorrhoeae strains used to look for DedA mutations.” (Line 143-144)
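
      The survey described above amounts to comparing each strain's aligned protein sequence against the MS11 reference and listing residue changes. A minimal sketch of that comparison follows; it assumes the sequences have already been aligned (gaps written as "-"), and the function names and toy sequences are hypothetical illustrations, not the authors' pipeline or real DedA/RplL sequences.

```python
def substitutions(reference, variant):
    """List substitutions (e.g. 'K3R') between two aligned protein sequences.

    Positions are 1-based and counted along the ungapped reference.
    Columns containing a gap ('-') in either sequence are skipped, so
    insertions and deletions are not reported by this sketch.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    position = 0
    for ref_aa, var_aa in zip(reference, variant):
        if ref_aa != "-":
            position += 1  # advance only on real reference residues
        if ref_aa != var_aa and ref_aa != "-" and var_aa != "-":
            changes.append(f"{ref_aa}{position}{var_aa}")
    return changes


def survey(reference, strains):
    """Return {strain_name: [substitutions]} for strains that differ from the reference."""
    result = {}
    for name, sequence in strains.items():
        changes = substitutions(reference, sequence)
        if changes:
            result[name] = changes
    return result


# Toy example with made-up sequences (hypothetical, for illustration only).
toy_reference = "MSKAT"
toy_strains = {"strain_a": "MSKAT", "strain_b": "MSRAT"}
toy_hits = survey(toy_reference, toy_strains)
```

      In this toy input only strain_b differs from the reference, so the survey reports a single change at position 3. A real survey would also need to handle insertions, deletions, and partial assemblies.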

      (4) Clinically relevant antibiotics for N. gonorrhoeae are penicillin, tetracycline, spectinomycin, gentamicin, ciprofloxacin, azithromycin, ceftriaxone; moreover, zoliflodacin and gepotidacin have reportedly successfully completed phase 3 trials. The authors should redo their MIC testing with these antibiotics (e.g., for Figures 1 and 2 and Tables 1 and 2), both because this will enable direct comparison with the many clinical isolates that have undergone testing and because these are the drugs most pertinent to clinical practice. Ampicillin, ceftazidime, chloramphenicol, bacitracin, and daptomycin are not relevant. Could the authors explain why they tested vancomycin, polymyxin B, irgasan, melittin, avilamycin, and thiostrepton?

      Our use of antibiotics with diverse modes of action (e.g. vancomycin, polymyxin B, irgasan, melittin, avilamycin, and thiostrepton) in this study was intended to aid in the identification of oxydifficidin’s mode of action. This is true for both Tables 1 and 2.

      To address the reviewer’s concern, in our revised manuscript, we have added MIC data for the following clinically relevant antibiotics: ceftriaxone (cephalosporin/beta-lactam), gentamicin (aminoglycoside), azithromycin (macrolide), and ciprofloxacin (fluoroquinolone) to Table 1.

      (5) Please describe the characteristics of the transposon library (finding four transposons in a single strain does seem unexpected, given how most transposon libraries aim for one transposon insertion per strain).

      We understand that one transposon insertion per strain is ideal for transposon libraries. This Bacillus strain proved to be recalcitrant to genetic manipulation. In the rare cases where we obtained resistant colonies upon electroporation with the transposon, all colonies contained multiple (≥ 4) transposon insertions. This made it impractical to build a library with one transposon insertion per library member.

      We assumed that the anti-N. gonorrhoeae activity most likely originated from a natural product BGC, which typically range from 10-100 kb in size.

      Based on the average of 50 kb per BGC, ~80 transposon insertions would be required to fully search the 4.2 Mb genome of Bacillus amyloliquefaciens BK for a BGC. At 4 mutations per transformant, 1x coverage of the genome would require only 20 library members.

      After extensive electroporation of transposon into Bacillus amyloliquefaciens BK, we were able to obtain a library of 50 members, including one mutant (Tn5-3) that lacked anti-N. gonorrhoeae activity.
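
      The coverage estimate above can be written out as a short calculation. This is a sketch under stated assumptions: the 4.2 Mb genome, 50 kb average BGC size, 4 insertions per transformant, and 50-member library come from the response, while the assumption that insertions land uniformly at random (used for the hit-probability estimate) is ours and is only an approximation for real transposon behavior. All function names are hypothetical.

```python
import math

GENOME_BP = 4_200_000        # genome size quoted in the response (4.2 Mb)
BGC_BP = 50_000              # average BGC size used in the response (50 kb)
INSERTIONS_PER_MUTANT = 4    # minimum insertions observed per transformant
LIBRARY_SIZE = 50            # number of library members obtained


def insertions_for_1x(genome_bp, bgc_bp):
    """Insertions needed for one insertion per BGC-sized window of the genome."""
    return math.ceil(genome_bp / bgc_bp)


def mutants_for_1x(genome_bp, bgc_bp, insertions_per_mutant):
    """Library members needed for 1x coverage when each carries several insertions."""
    return math.ceil(insertions_for_1x(genome_bp, bgc_bp) / insertions_per_mutant)


def fold_coverage(library_size, insertions_per_mutant, genome_bp, bgc_bp):
    """Total insertions in the library divided by the 1x requirement."""
    total = library_size * insertions_per_mutant
    return total / insertions_for_1x(genome_bp, bgc_bp)


def hit_probability(total_insertions, genome_bp, bgc_bp):
    """Chance that one BGC-sized target receives at least one insertion.

    Assumption (ours): each insertion lands independently and uniformly
    at random across the genome.
    """
    miss_one = 1 - bgc_bp / genome_bp
    return 1 - miss_one ** total_insertions


needed_insertions = insertions_for_1x(GENOME_BP, BGC_BP)
needed_mutants = mutants_for_1x(GENOME_BP, BGC_BP, INSERTIONS_PER_MUTANT)
coverage = fold_coverage(LIBRARY_SIZE, INSERTIONS_PER_MUTANT, GENOME_BP, BGC_BP)
p_hit = hit_probability(LIBRARY_SIZE * INSERTIONS_PER_MUTANT, GENOME_BP, BGC_BP)
```

      Without rounding, the calculation gives 84 insertions and 21 library members for 1x coverage, which the response rounds to ~80 and 20. A 50-member library with at least 4 insertions each therefore corresponds to more than 2x coverage under these assumptions.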

      New text added to the methods section:

      “A library containing 50 transposon mutants was obtained. In the mutants examined, each strain contained ≥4 transposon insertions” (Line 337-339)

      (6) Please describe in the methods how you sequenced and annotated the genome of Bacillus amyloliquefaciens BK.

      The sequencing method is now described in “Genomic Sequencing and annotation of Bacillus amyloliquefaciens” section. The genome of Bacillus amyloliquefaciens BK was not fully annotated. Mutations were identified as described in the updated methods section below.

      New text:

      “Genomic Sequencing and annotation of Bacillus amyloliquefaciens

      Genomic DNA from Bacillus amyloliquefaciens BK WT and transposon mutant Tn5-3 was isolated using PureLink Microbiome DNA purification kit (Invitrogen) according to the manufacturer’s instructions.

      The Bacillus amyloliquefaciens BK WT genome was assembled by mapping its sequencing data onto the annotated genome of Bacillus amyloliquefaciens FZB42 using Geneious Prime. Differences in the mutant strain Tn5-3 were identified by mapping its sequencing data onto the assembled Bacillus amyloliquefaciens BK WT genome. The mutated genes were then annotated using NCBI BLAST. The oxydifficidin BGC was annotated using the antiSMASH online server.” (Line 253-260)

      (7) Please describe in the methods how you screened the library for strains that lacked anti-gonococcal activity.

      The method has been added to our revised manuscript as the section “Screening of Bacillus Strains Lacking Anti-N. gonorrhoeae Activity”.

      New text:

      “Screening of Bacillus Strains Lacking Anti-N. gonorrhoeae Activity

      The transposon mutants of Bacillus amyloliquefaciens BK were grown overnight in LB medium at 30 °C. Each overnight culture was then diluted 1:5000, and 1 μl of the diluted culture was spotted onto a GCB agar plate swabbed with N. gonorrhoeae cells. The plate was then incubated overnight at 37 °C with 5% CO2. The mutant strain (Tn5-3) lacking anti-N. gonorrhoeae activity was identified due to its failure to produce a zone of growth inhibition in the resulting N. gonorrhoeae lawn.” (Line 341-346)

      (8) Was only one strain found that was a 'non-producer' of anti-N. gonorrhoeae activity? Line 68 suggests that this was only one of multiple non-producers. Is that correct? If so, did you work up the others, and did they also have disruptions in the same biosynthetic gene cluster?

      Only one strain was identified as a “non-producer” of anti-N. gonorrhoeae activity. We have modified the text to clarify this point.

      Original text: “The sequencing of one non-producer strain revealed that it surprisingly contained four transposon insertions and one frame shift mutation.”

      Changed to: “The sequencing of the non-producer strain revealed that it surprisingly contained four transposon insertions and one frame shift mutation.” (Line 53-54)

      (9) All sequences (including Bacillus amyloliquefaciens BK) must be deposited in a public database (e.g., NCBI) and the accession numbers reported in the manuscript.

      Genomic sequence data of Bacillus amyloliquefaciens BK have been deposited in GenBank, and the accession number (GCA_019093835.1) now appears in the legend of Figure S1a.

      Figure S1a legend:

      “Genome-based phylogenetic tree containing Bacillus amyloliquefaciens BK and closely related Bacillus spp. The tree was built by Genome Clustering of MicroScope using neighbor-joining method. The NCBI accession numbers of Bacillus strains used in the tree are GCA_000196735.1, GCA_000204275.1, GCA_000015785.2, GCA_019093835.1, GCA_000009045.1, GCA_000011645.1, GCA_000172815.1, GCA_000008005.1, and GCA_000007845.1 (from top to bottom).”

      Minor

      (10) Statements in the article would benefit from fact-checking. For example:

      - gonorrhea is not the second most prevalent sexually transmitted infection worldwide; it is the second most reported bacterial sexually transmitted infection.

      - Treatment is ceftriaxone 500mg IM x1 in the US, but 1g IM x1 in the UK and Europe. The UK guidelines also permit ciprofloxacin, should sequencing indicate gyrA 91S. I suggest reviewing / specifying which treatment guidelines you're referring to.

      We appreciate the reviewer’s corrections. The word “prevalent” is now changed to “reported”.

      Original text: “Gonorrhea, which is caused by Neisseria gonorrhoeae, is the second most prevalent sexually transmitted infection worldwide.”

      Changed to: “Gonorrhea, which is caused by Neisseria gonorrhoeae, is the second most reported sexually transmitted infection worldwide.” (Line 2-3)

      Original text: “Gonorrhea is the second most prevalent sexually transmitted infection worldwide, its causative agent is the bacterium Neisseria gonorrhoeae.”

      Changed to: “Gonorrhea is the second most reported sexually transmitted infection worldwide, its causative agent is the bacterium Neisseria gonorrhoeae.” (Line 18-19)

      “In the USA” is now added to the sentence stating gonorrhea treatment.

      Original text: “The high dose (500 mg) of the cephalosporin ceftriaxone is currently the only recommended therapy for treating gonorrhea infections.”

      Changed to: “The high dose (500 mg) of the cephalosporin ceftriaxone is currently the only recommended therapy for treating gonorrhea infections in the USA.” (Line 20-22)

      (11) Please make sure all results are in the results section. The report of cell morphology, for example, should be in the results, not the discussion.

      In our revised manuscript, we have included the cell morphology data in the results section with the text changes below.

      Original text: “Interestingly, not only was dedA deficient N. gonorrhoeae less susceptible to oxydifficidin, oxydifficidin also kills this mutant more slowly (Figure 2b) than WT N. gonorrhoeae MS11.”

      Changed to: “Interestingly, not only was dedA deficient N. gonorrhoeae less susceptible to oxydifficidin, oxydifficidin also kills this mutant more slowly (Figure 2b) than WT N. gonorrhoeae MS11. The dedA deletion mutant also showed an altered cell morphology with reduced membrane integrity and lower formation of micro-colonies (Figure S4).” (Line 100-104)

      Original text: “The dedA deletion mutant also showed an altered cell morphology with reduced membrane integrity and lower formation of micro-colonies (Figure S4), indicating that it should show reduced pathogenesis and fitness, and, as a result, not accumulate in a clinical setting, which adds to the therapeutic appeal of oxydifficidin.”

      Changed to: “The dedA deletion mutant exhibited altered cell morphology, characterized by diminished membrane integrity and reduced micro-colony formation, indicating that it should show reduced pathogenesis and fitness, and, as a result, not accumulate in a clinical setting, which adds to the therapeutic appeal of oxydifficidin” (Line 206-210)

      (12) Tables 1 and 2 should be combined and should address the most relevant antibiotics

      The MIC data of additional relevant antibiotics are now included in Table 1. However, we still believe that keeping Tables 1 and 2 separate enhances the clarity of the manuscript. Table 2 specifically focuses on diverse ribosomal targeting antibiotics, which highlights the unique binding site of oxydifficidin.

      (13) Supplemental Figure 1a. The tree could be better resolved, and there are four entries with the identical listing of "Bacillus amyloliquefaciens subsp. plantarum" on different branches. In the methods or the legend, please indicate the accession numbers for these genomes. Also please specify how this tree was made. Is it a maximum likelihood tree? Something else?

      The tree is now better resolved and includes new entries. The requested information regarding accession numbers and tree construction method has been included in the figure legend.

      New supplemental Figure 1a legend:

      “a. Genome-based phylogenetic tree containing Bacillus amyloliquefaciens BK and closely related Bacillus spp. The tree was built by Genome Clustering of MicroScope using neighbor-joining method. The NCBI accession numbers of Bacillus strains used in the tree are GCA_000196735.1, GCA_000204275.1, GCA_000015785.2, GCA_019093835.1, GCA_000009045.1, GCA_000011645.1, GCA_000172815.1, GCA_000008005.1, and GCA_000007845.1 (from top to bottom).”

      Reviewer #2 (Recommendations For The Authors):

      The conclusions drawn in the manuscript are well-supported by the experimental data presented.

      I have the below minor comments:

      (1) "serendipitously identified" - I feel this wording should be avoided throughout the manuscript. The point of a research paper is to communicate methodology and experimental detail, and this language portrays the opposite.

      While we agree that methodology and experimental procedures are paramount in scientific reporting, we believe it is equally important to convey, particularly to younger generations, that a part of the scientific process is often unplanned and can benefit from chance observations. Therefore, we would like to keep this wording.

      (2) The introduction should include the biological roles/function of DedA proteins in bacteria.

      DedA proteins perform a wide array of biological roles and functions in bacteria. In the results section (Line 107-116), we have described the most well-established of these functions, particularly the flippase activity, which appears to be directly related to oxydifficidin sensitivity. We believe that introducing this information in the results section enhances the manuscript’s clarity and flow.

      (3) "When we screened this contaminant for antibacterial activity against lawns of other Gram-negative bacteria it did not produce a zone of growth of inhibition against any of the bacteria we tested (e.g., Escherichia coli, Vibrio cholerae, Caulobacter crescentus)." Can these data Figures be included in the Supplements?

      This result was recorded in the lead author’s notebook, but no image was saved.

      (4) Line 52: Was any base analyses performed on the Tn-mutants i.e., how many insertion-sites? Depth of mutants? Was a library constructed in this study or previously? Why were only BGC assessed?

      Please see our response to Reviewer #1’s comment (5). We focused on BGCs because we believed the anti-N. gonorrhoeae activity most likely resulted from a molecule encoded by a natural product BGC.

      (5) Line 98: Do the other 2 predicted DedA-like proteins also have a role in uptake of oxydifficidin? Is there some redundancy in uptake?

      We generated knockout mutants for two other predicted DedA-like proteins in N. gonorrhoeae MS11, and the MIC of oxydifficidin for these mutants remained the same as for the N. gonorrhoeae MS11 wild type strain. Therefore, we believe that the DedA protein discussed in this manuscript is the primary transporter of oxydifficidin. However, we cannot completely rule out the possibility of redundancy in oxydifficidin uptake by other DedA-like proteins.

      New text: “We also generated deletion mutants for two other predicted dedA-like genes, and the MIC of oxydifficidin for these mutants remained the same as for the N. gonorrhoeae MS11 wild type strain.” (Line 98-100)

      Reviewer #3 (Recommendations For The Authors):

      This is a well presented manuscript and I could not immediately see any issues with it.

      We appreciate the reviewer’s positive feedback.

    1. Author response:

      We are submitting a revised manuscript with major additions that address the main concerns in the initial reviews. At the highest level, this revision provides i) orthogonal biochemical measurements that yield concrete evidence of lysosomal protein aggregates, and ii) a plausible mechanism linking lysosomal lipid handling and protein aggregation through disruption of ESCRT function. We believe these additions significantly improve the completeness of this study and the conclusions that can be drawn from the data.

      Below are more specific highlights of the additions in this revision:

      -       We included orthogonal techniques (thioflavin-T staining and Lyso-IP followed by differential extraction) and confirmed the accumulation of RIPA-insoluble protein aggregates at the lysosomes in cells under lipid perturbation (Figure 3).

      -       We performed TMT-proteomics and identified accumulation of insoluble ESCRT components at the lysosomes under lipid perturbation (Figure 4). Two new authors involved in this effort have been added to the manuscript.

      -       The ESCRT result prompted us to revisit lysosomal membrane integrity. With improved imaging conditions and analysis we were able to see increased membrane permeabilization under lipid perturbation. VPS4A overexpression partially rescued this phenotype, suggesting that lipid accumulation impairs ESCRT disassembly (Figure 5).

      -       Together, the results suggest that lipid perturbation impairs ESCRT function, compromising both lysosomal membrane repair and microautophagy, resulting in the accumulation of endogenous protein aggregates at the lysosomes (Graphical Abstract).

      Reviewer #1 (Recommendations For The Authors):

      (1) Perhaps the most prominent limitation of this work is the unilateral focus on native cells (i.e. cells under no endogenous or exogenous stress) as the model for protein aggregate formation. Furthermore, although the ProteoStat stain has been utilized by many investigators before, the sole reliance on this stain as the read-out for their assays is concerning. To compound the concern, the ProteoStat-positive puncta co-localize with lysosomal markers, which was surprising even to the authors. All in all, it behooves the authors to test proteostasis in multiple parallel ways to actually define what they are studying. How is it possible that protein aggregates under native conditions are only co-localized with lysosomes? Are we really studying protein aggregates, which should predominantly be cytoplasmic insoluble aggregates?

      (a) They need to get away from a simple stain like ProteoStat and conduct co-stainings with other markers such as poly-ubiquitin antibodies and other chaperones to define what and where else exactly are these aggregates.

      Co-staining with poly-ubiquitin was included in the original manuscript. We added orthogonal staining with another widely used amyloid dye, Thioflavin-T, and provided fine-grained quantification of lysosomal vs cytosolic localization of various signals (Figures S4A-C & 3A-B).

      (b) They need to do Immunoblots with and without triton insolubility to see if these aggregates are insoluble as most would predict. They can do lysosomal isolation vs cytoplasmic to see if the insoluble aggregates are really lysosomal.

      We performed Lyso-IP followed by differential detergent extraction to confirm the accumulation of insoluble proteins at the lysosomes (Figure 3C). Proteomic analysis identified some of these insoluble proteins as ESCRT subunits (Figure 4).

      (c) They should compare aggregate formation in the native state versus cells with lysosomal inhibition via Bafilomycin or chloroquine versus cells with proteasomal inhibition. The lysosomal inhibition experiments are particularly informative given the lysosomal relevance they have uncovered.

      We included other small-molecule inhibitors at different time points to compare the effects of different modes of proteostasis challenge (Figure S4A-D). Together with the ESCRT finding, our results suggest a role for microautophagy in our system and provide a model of how ProteoStat- and/or ubiquitin-positive substrates become partitioned between the cytoplasm and lysosomes under different perturbations.

      (d) Many protein aggregates which are too bulky for proteasome degradation will traditionally be dealt with by aggrephagy. Why is this not observed?

      Knockdown of core macroautophagy components did not impact ProteoStat intensity in our CRISPRi screen, suggesting that basal macroautophagy plays a negligible role in clearing endogenous amyloid-like structures in our experimental system. We provide an alternative model in which these aggregates instead arrive at the lysosomes via microautophagy.

      (2) After addressing #1, they can validate whether the genes they identified by CRISPR screens are also important in modulating protein aggregate burden in other systems. For example, if they inhibit lysosomes with Bafilomycin or chloroquine to obtain protein aggregates and then knock down the genes identified in the CRISPR screens, will they get the same results?

      We addressed the effect of different modes of proteostasis challenge as recommended above. Deacidifying the lysosomes alone causes intense protein aggregation (Figure S4A-D) and eventually cell death, and was thus not combined with other perturbations.

      (3) They identify lysosomal lipid metabolism genes/pathways as the culprit for inducing proteostasis. In particular, sphingolipid and cholesteryl ester species appear to be operational here. However, there is no specific lipid species or specific lipid metabolism gene that is causative. Rather, you have to knock down entire processes to have an effect. This suggests that the focus on lysosome health (i.e. permeability, proteolysis, etc.) is rudimentary. When you have to knock down entire classes of lipids, would this not indicate broader effects on cellular lipids (including membrane lipids beyond the lysosome) and related cellular health?

      We included data on the effect of knocking down MYLIP, PSAP, and as a comparison PSMD2 on the growth rate of K562 cells (Figure S5A). MYLIP and PSAP KDs, which cause predominantly an accumulation of lipids, do not impede cell growth. Increasing lipid uptake by MYLIP KD increases cell proliferation under our culture conditions, suggesting a general negative impact on cell health was not required for the association between lipid levels and protein aggregates.

      (a) They conduct a superficial methyl-beta-cyclodextrin experiment with equivocal results. The use of MBCD for different time-courses to deplete various membrane cholesterol pools including the plasma membrane pool is important to ascertain what aspect of the cellular cholesterol is affecting proteostasis. MBCD +/- cholesterol reintroduction time-courses for rescue will also be key to determine the culprit cellular cholesterol pool.

      The MBCD / Filipin experiment helped us determine that ProteoStat doesn’t directly stain cholesterol, nor any major plasma membrane components. Free cholesterol was implicated in neither the screen nor the lipidomics and was not the subject of targeted experiments.

      (b) The same concept can be applied to sphingolipids. There are sphingolipids in abundance in multiple membrane compartments. Which ones are causal here? More nuanced evaluation of this with sphingolipid staining/tracking can be conducted.

      We attempted experiments in which sphingolipids were added back to cells grown in FBS-depleted media. However, we were not able to deliver these lipid species consistently, and doing so while ensuring the correct subcellular localization at physiologically relevant levels would require substantial methods development.

      (c) As part of this, are lipid rafts and/or caveolae being affected by the perturbations in cholesterol and sphingolipids? Lipid rafts are highly enriched in these two lipids, which could link to their proteostasis observation.

      Indeed, ceramides released from SM hydrolysis are proposed to self-assemble into microdomains with negative curvature that can promote the formation of intralumenal vesicles (Alonso and Goñi, 2018; Niekamp et al., 2022). We propose that SM accumulation may hinder this process by counteracting the negative membrane curvature, thereby impeding microautophagy.

      (d) How about ER membrane lipids? The UPR and subsequent effects on proteostasis are intricately involved with ER lipid bilayer composition.

      We did not perform lipidomics on ER membranes in this study, though we note that at steady state, sphingolipids and cholesterol esters are not expected to be enriched at the ER (Ikonen and Zhou, 2021). We checked whether lipid-related genetic perturbations induced the UPR in published perturb-seq data in K562 cells. Neither MYLIP nor PSAP knockdown induced a UPR.

      In conclusion, the manuscript is interesting but the excitement over a link between lysosome-related lipid metabolism and proteostasis needs to be tempered until a more robust experimental approach is employed to generate supportive and corroborating results.

      Reviewer #2 (Recommendations For The Authors):

      - The paper has a number of grammatically awkward sentences. Editing these would enhance clarity.

      - It is important to show the co-localization of aggregates with the lysosome. This is shown in supplements but should be in a main figure. Here the authors cite previous work indicating that ProteoStat puncta co-localize with ubiquitinated proteins and state that they do not see this, then essentially just move on. Is there an explanation for this discrepancy and can it be resolved? What do they think is really going on? What happens to levels of ubiquitinated proteins when lipid metabolism is perturbed as in these experiments?

      We have included the lipid-induced lysosomal protein aggregation data in the main text (Figure 3A-B), and provided fine-grained quantification of the cytosolic-vs-lysosomal ProteoStat / Ub / ThT signals under different aggregate-inducing conditions (Figure S4A-D). We discuss these results in the main text and propose a model involving ESCRT-mediated microautophagy. This is supported further by the LysoIP-proteomics and LMP analysis.

      - Please add an indicator of amino acid numbers to Fig. 3C.

      These annotations are now included (now Figure S3C).

      - The legend for 3D is mislabelled.

      We have corrected the legend (now Figure S3D).

      Reviewer #3 (Recommendations For The Authors):

      Protein homeostasis and lipid homeostasis are both important for maintaining cellular functions. However, the crosstalk between them remains largely unknown. The manuscript entitled "Impairment of lipid homoeostasis causes accumulation of protein aggregates in the lysosome" deals with this interesting topic. An important link between lysosomal protein aggregation and sphingolipid/cholesterol ester metabolism was discovered. The topic, which belongs to the Cell Biology domain, also falls within the aims and scope of eLife. Here are the revisions I recommend:

      (1) From lipidomics analysis, a remarkable correlation between levels of sphingomyelin and cholesterol ester and ProteoStat staining was found. Could the authors explain how sphingomyelin and cholesterol ester are quantified? The two lipids are not included as internal standards in the lipidomics experiment.

      Sphingomyelin and cholesterol ester internal standards are included in the Avanti 330707 SPLASH® LIPIDOMIX® Mass Spec Standard, which was supplied at 3% v/v to the MeOH/H2O cell lysis buffer. We have amended the Methods section to clarify this.

      (2) Could the authors perhaps delete Figure 1B and show it in Figure 2A only? There is no need to show the same figure twice. The thresholds for both False Discovery Rate and Median Enrichment need to be added. In Figure 2A, the lysosomal hydrolases (GBA, LIPA, GALC) seem to be located in a statistically insignificant region. Based on previous studies, GBA could have an effect on sphingolipid levels; how, then, do the authors explain that sphingomyelin was highly correlated with ProteoStat staining?

      We have combined the two volcano plots into a single figure (now Figure 1D), and added a line to help visualize the gene effects while considering the combined contribution of FDR and enrichment. Individual lysosomal hydrolases indeed have insignificant effects on ProteoStat and this is discussed in the main text as having relatively constrained impacts on the general lipidome. For example, while GBA and GALC KDs can lead to accumulation of their immediate substrates (glucosylceramide and galactosylceramide, respectively), they do not directly impinge on sphingomyelin.

      (3) The authors show the correlation between ProteoStat staining and different lipids/lipid classes in Figure 3B and Figure S3A. It is not necessary to show the correlation with individual lipids (such as sphingomyelin(d18:1/24:0) and cholesterol ester(18:2)). The correlation with the full collection of lipid classes, which is only listed in Figure 3B and Figure S3A, would be more representative. It is suggested to add information on how many individual lipids in each class are used for the correlation analysis. Replacing Figure 3A with Figure S3A and moving Figure 3A to the supplement is also suggested.

      We decided to retain the correlation of two individual lipids (a sphingomyelin and a cholesterol ester species) with ProteoStat as examples to illustrate with clarity how we obtained the class-wide comparison. The number of individual lipids included in each class for correlation analysis is now included in Figures 2F and S3A.

      (4) The authors state that lipid uptake and metabolism modulate proteostasis. However, only cholesterol and LDL were tested. It would be more precise to state that cholesterol uptake and metabolism modulate proteostasis. In addition, sphingolipids and cholesterol esters accumulate with increased lysosomal protein aggregation. It would be interesting to see the effects of sphingolipid uptake, since sphingolipids are correlated with proteostasis better than cholesterol.

      We attempted to add back specific sphingolipids to assess sufficiency. However, we found it challenging to ensure that these lipids were distributed to the correct subcellular locations at physiologically relevant levels. Without this crucial information, it was difficult to draw any conclusions about the sufficiency of the sphingolipids we tested to impair proteostasis.

      Alonso A, Goñi FM. 2018. The Physical Properties of Ceramides in Membranes. Annu Rev Biophys 47:633–654. doi:10.1146/annurev-biophys-070317-033309

      Ikonen E, Zhou X. 2021. Cholesterol transport between cellular membranes: A balancing act between interconnected lipid fluxes. Dev Cell 56:1430–1436. doi:10.1016/j.devcel.2021.04.025

      Niekamp P, Scharte F, Sokoya T, Vittadello L, Kim Y, Deng Y, Südhoff E, Hilderink A, Imlau M, Clarke CJ, Hensel M, Burd CG, Holthuis JCM. 2022. Ca2+-activated sphingomyelin scrambling and turnover mediate ESCRT-independent lysosomal repair. Nat Commun 13:1875. doi:10.1038/s41467-022-29481-4

    1. Author response:

      We thank the editors and reviewers for their thorough evaluation of our manuscript. We appreciate the constructive feedback and insights provided. 

      We acknowledge that some of our conclusions would benefit from more measured statements and additional computational controls. We will revise the manuscript to better reflect the scope and limitations of our analytical approach. While we cannot add new experimental validations at this stage, we will strengthen our computational analyses and clarify our methodology.

      Below, we outline our planned revisions to address the major points raised in the public reviews:

      Clarification of Terms and Definitions:

      (1) We will state more clearly in our manuscript that we reuse the same raw datasets from our previous study, as described in Calendrilli et al, 2023, and that there is no modification to the experimental methods or data.

      (2) We will provide clear definitions for:

      - "Non-differentially expressed" genes

      - "Ctrl specific" RNA sets

      - The composition of control populations in different analyses

      (3) We will revise the use of "non-diffusive RNA-chromatin interactome" and “RNase-resistant” terminology to better reflect our actual findings.

      (4) We will also improve clarity regarding:

      - The rationale for focusing on specific genomic regions

      - The interpretation of evolutionary conservation data

      (5) We will provide additional rationale on the exclusion of short-range interactions.

      Figure Revisions:

      (1) Figure 3a: We will correct any discrepancy between text references and figure content.

      (2) Figure 4: We will standardize the color scheme between control and RNase-treated samples.

      (3) We will follow the reviewer's suggestion to move figure 1g to the supplementary file. 

      Additional Computational Analyses:

      (1) We will consider adding controls for RNA length effects and integrating existing knowledge of how the extent of protection varies across different RBPs.

      Discussions:

      (1) We will carefully rephrase our conclusions to more accurately reflect the scope and limitations of our computational findings, ensuring we do not overstate the implications.

      (2) We will expand the discussion of limitations, including:

      - The focus on RNase-resistant interactions only

      - The cell-type specificity of our findings

      - The lack of functional validation

      - The limited ability to discern and study transient or weak RNA-chromatin interactions using the current dataset

      (3) Regarding the recent papers from Jenner and Davidovich groups about RNase treatment effects on chromatin solubility:

      - We will discuss these findings in our revised manuscript

      - We will address potential limitations this may impose on our interpretations

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work examines the binding of several phosphonate compounds to a membrane-bound pyrophosphatase using several different approaches, including crystallography, electron paramagnetic resonance spectroscopy, and functional measurements of ion pumping and pyrophosphatase activity. The work attempts to synthesize these different approaches into a model of inhibition by phosphonates in which the two subunits of the functional dimer interact differently with the phosphonate.

      Strengths:

      This study integrates a variety of approaches, including structural biology, spectroscopic measurements of protein dynamics, and functional measurements. Overall, data analysis was thoughtful, with careful analysis of the substrate binding sites (for example, calculation of POLDER omit maps).

      Weaknesses:

      Unfortunately, the protein did not crystallize with the more potent phosphonate inhibitors. Instead, structures were solved with two compounds with weak inhibitory constants >200 micromolar, which limits the molecular insight into compounds that could possibly be developed into small molecule inhibitors. Likewise, the authors choose to focus the spectroscopy experiments on these weaker binders, missing an opportunity to provide insight into the interaction between more potent binders and the protein.

      We acknowledge the reviewer's concern regarding the choice of weaker inhibitors. We attempted co-crystallization with all available inhibitors, including those with higher potency. However, despite numerous efforts, these potent inhibitors yielded low-resolution crystals, making them unsuitable for detailed structural analysis. Therefore, we chose to focus on the weaker binders, as we were able to obtain high-quality crystal structures for these compounds. This allowed us to perform DEER spectroscopy with the added advantage of accurately analyzing the data against structural models derived from X-ray crystallography. Using these weaker inhibitors enabled a more precise interpretation of the DEER data, thus providing reliable insights into the conformational dynamics and inhibition mechanism. However, as suggested by the reviewer, in the revised version, we will perform DEER analysis on the more potent inhibitors to provide additional insight into their interactions.

      In general, the manuscript falls short of providing any major new insight into membrane-bound pyrophosphatases, which are a very well-studied system. Subtle changes in the structures and ensemble distance distributions suggest that the molecular conformations might change a little bit under different conditions, but this isn't a very surprising outcome. It's not clear whether these changes are functionally important, or just part of the normal experimental/protein ensemble variation.

      We respectfully disagree with the reviewer. The scale of motions seen in this study corresponds to that seen in the full panoply of crystal structures of mPPases. Some proteins, such as the rotary ATPase, undergo very large conformational changes during catalysis. This one does not, meaning that the precise motions we describe are likely to be relevant. Conformational changes in the ensemble, whether large or small, represent essential protein motions which underlie key mPPase catalytic function. Our DEER spectroscopy data demonstrate the sensitivity and resolution necessary to monitor these subtle changes in equilibria, even if these are only a few Angstroms. For several of the conditions we investigated by DEER in solution, corresponding X-ray structures have been solved, with the derived distances agreeing well with the DEER distributions. This further validates the biological relevance of the structures, including serial time-resolved ones that indicate asymmetry.

      The ZLD-bound crystal structure doesn't predict the DEER distances, and the conformation of Na+ binding site sidechains in the ZLD structure doesn't predict whether sodium currents occur. This might suggest that the ZLD structure captures a conformation that does not recapitulate what is happening in solution/ a membrane.

      We agree with the reviewer that the ZLD-bound crystal structure does not predict the DEER distances. However, we believe this discrepancy arises from the bulkiness of the ZLD inhibitor, which prevents the closure of the hydrolytic centre. Additionally, the absence of Na+ at the ion gate in the ZLD-bound structure suggests that Na+ transport does not occur, a conclusion further supported by our electrometric measurements. We agree with the reviewer that the distances observed in the DEER experiments might represent a potential new conformation in solution, which may not be captured by the static X-ray structure, thereby offering insights into the dynamic nature of the protein under physiological conditions. Finally, the static X-ray structures have not captured the asymmetric conformations that must exist to explain half-of-the-sites reactivity.

      Reviewer #2 (Public review):

      Summary:

      Crystallographic analysis revealed the asymmetric conformation of the dimer in the inhibitor-bound state. Based on this result, which is consistent with previous time-resolved analysis, authors verified the dynamics and distance between spin introduced label by DEER spectroscopy in solution and predicted possible patterns of asymmetric dimer.

      Strengths:

      Crystal structures with inhibitor bound provide detailed coordination in the binding pocket thus useful information for the PPase field and maybe for drug development.

      Weaknesses:

      The distance information measured by DEER is advantageous for verifying the dynamics and structure of membrane protein in solution. However, regarding T211 data, which, as the authors themselves stated, lacks measurement precision, it is unclear for readers how confident one can judge the conclusion leading from these data for the cytoplasmic side.

      We thank the reviewer for acknowledging the advantageous use of the DEER methodology for identifying dynamic states of membrane proteins in solution. We used two sites in our analysis: S525 (periplasm) and T211 (cytoplasm). As we clearly stated in the original manuscript, S525R1 yielded high-quality DEER data, while T211R1 yielded weak (or no) visual oscillations, leading to broad, though different, distributions for the several conditions tested. Our main conclusions are based on the S525R1 data. We included the T211R1 data because, although it does not provide definitive evidence, it is consistent with our proposed model and offers additional insights into biologically relevant conditions. Furthermore, the shifts in the centre of mass (Fig EV8D) of the broad T211R1 distributions show a trend that is consistent with our model; although this does not prove the model, it does not exclude it either. Lastly, these data do indeed confirm an important structural feature of mPPase in solution conditions, namely the intrinsically high dynamic state of loop5-6 where T211 is located, which is consistent with our previous (Kellosalo et al., Science, 2012; Li et al., Nat. Commun, 2016; Vidilaseris et al., Sci. Adv., 2019; Strauss et al., EMBO Rep., 2024) and current X-ray crystallography data.

      The distance information for the luminal site, which the authors claim is more accurate, does not indicate either the possibility or the basis for why it is the ensemble of two components and not simply a structure with a shorter distance than the crystal structure.

      We thank the reviewer for pointing out this possibility and alternative interpretation of our DEER data. In the revised version, we will show that our DEER data are consistent with (and do not exclude) asymmetry and rephrase to be inclusive of other possibilities. Importantly, this additional possibility does not affect the current interpretation of the data in our manuscript.

      Reviewer #3 (Public review):

      Summary:

      Membrane-bound pyrophosphatases (mPPases) are homodimeric proteins that hydrolyze pyrophosphate and pump H+/Na+ across membranes. They are attractive drug targets against protist pathogens. Non-hydrolysable PPi analogue bisphosphonates such as risedronate (RSD) and pamidronate (PMD) serve as the primary drugs currently used. Bisphosphonates have a P-C-P bond, whose central carbon can accommodate up to two substituents, allowing a large compound variability. Here the authors solved two TmPPase structures in complex with the bisphosphonates etidronate (ETD) and zoledronate (ZLD) and monitored their conformational ensemble using DEER spectroscopy in solution. These results reveal the inhibition mechanism of these compounds, which is crucial for developing future small molecule inhibitors.

      Strengths:

      The authors show that seven different bisphosphonates can inhibit TmPPase with IC50 values in the micromolar range. Branched aliphatic and aromatic modifications showed weaker inhibition.

      High-resolution structures for TmPPase with ETD (3.2 Å) and ZLD (3.3 Å) are determined. These structures reveal the binding mode and shed light on the inhibition mechanism. The nature of modification on the bisphosphonate alters the conformation of the binding pocket.

      The conformational heterogeneity is further investigated using DEER spectroscopy under several conditions.

      Weaknesses:

      The authors observed asymmetry in the TmPPase-ETD structure above the hydrolytic center. The structural asymmetry arises due to differences in the orientation of ETD within each monomer at the active site. As a result, loop5-6 of the two monomers is oriented differently, resulting in the observed asymmetry. The authors attempt to further establish this asymmetry using DEER spectroscopy experiments. However, the (over)interpretation of these data leads to more confusion than any further understanding. DEER data suggest that the asymmetry observed in the TmPPase-ETD structure in this region might be funneled from the broad conformational space under the crystallization conditions.

      See also the response below. We respectfully disagree with the reviewer. The asymmetry was previously established using serial time crystallography (Strauss et al., EMBO Rep, 2024) and biochemical assays (e.g. Malinen et al., Prot. Sci., 2022; Artukka et al., Biochem J, 2018; Luoto et al., PNAS, 2013) and was also partially seen in one static structure (Vidilaseris et al., Sci Adv, 2019). The DEER data only show that the previously proposed asymmetry could also be present within the conformational ensemble in solution conditions. Indeed, our data do not (and cannot) exclude this possibility.

      DEER data for position T211R1 at the enzyme entrance reveal a highly flexible conformation of loop5-6 (and do not provide any direct evidence for asymmetry, Figure EV8).

      Please see the relevant response above. We acknowledge that T211 is indeed situated on a highly dynamic loop, which is important for gating, and our DEER data confirm its high flexibility. Because we did not observe oscillations for this site, which leads to broad distributions, we stated in the original manuscript that we would not establish the presence of any asymmetry in solution on the basis of T211. We rely instead on the S525 site, for which we acquired high-quality DEER data, as was also noted by all reviewers.

      Similarly, data for position S525R1 near the exit channel do not directly support the proposed asymmetry for ETD.

      The reviewer appears to suggest that we hold the S525R1 DEER data as direct proof of asymmetry; this is combative on the grounds that to directly prove asymmetry would require time-resolved DEER measurements, far beyond the scope of this work. Rather, we have applied DEER measurements to explore whether asymmetry (observed previously via time-resolved X-ray crystallography) is also present (or indeed a possibility) in solution. We simply state that the DEER data are consistent with asymmetry (i.e., that the mean distance increases in the presence of ETD compared to the apo-state). This is a restrained interpretation of the data.

      Despite the high quality of the data, they reveal a very similar distance distribution. The reported changes in distances are very small (+/- 0.3 nm), which can be accommodated by a change of spin label rotamer distribution alone. Further, these spin labels are located on a flexible loop, thereby making it difficult to directly relate any distance changes to the global conformation.

      We thank the reviewer for recognising the high quality of our DEER data for S525R1. Visual oscillations in the raw traces, as in our case, reportedly lead to highly accurate and reliable distributions, able to separate (in fortuitous cases) helical movements of only a few Angstroms. The ability of DEER/PELDOR to offer near-Angstrom resolution was previously demonstrated by the acquisition and solution of high-resolution multi-subunit spin-labelled membrane protein structures (Pliotas et al., PNAS, 2012; Pliotas et al., Nat Struct Mol Biol, 2015; Pliotas, Methods Enzymol, 2017), as well as by its ability to detect small conformational changes (of similar magnitude to those in mPPase) in different integral membrane protein systems (Kapsalis et al., Nature Comms, 2019; Kubatova et al., PNAS, 2023; Schmidt et al., JACS, 2024; Lane et al., Structure, 2024; Hett et al., JACS, 2021; Zhao et al., Nature, 2024), occurring under different conditions and/or stimuli in solution and/or lipid environment. The changes here (e.g. ~7 Angstroms between the two mean distance extremes, Ca vs IDP) are not very small relative to DEER’s proven detection sensitivity, and all other conditions show changes between those extremes.

      These changes are relatively small, but they are expected for membrane ion pumps. Indeed, none of the mPPase structures show helical movements of greater than half a turn, and that only in helices 6 and 12. There appear to be larger-scale loop-closing motions of the 5-6 loop that includes T211, due to the presence of E217, which binds to one of the Mg2+ ions that coordinate the leaving group phosphate. (This is, inter alia, the reason that this loop is so flexible: it cannot order before substrate is bound.) Here we have the resolution to detect such subtle differences by DEER, given there are clear shifts in our time domain data and these are reflected in the mean distances in the distributions. Therefore, our study demonstrates the sensitivity and resolution DEER offers in detecting subtle conformational transitions, which are key in membrane protein pathways. To further belabour this point, we do not quantify the DEER data (for instance through parametric fitting) to extract populations of different conformational states, and we appreciate that to do so would be highly prone to error; however, we do (and can, we feel, without overinterpretation) assert that the mean distances shift.

      The interpretations listed below are not supported by the data presented:

      (1) 'In the presence of Ca2+, the distance distribution shifts towards shorter distances, suggesting that the two monomers come closer at the periplasmic side, and consistent with the predicted distances derived from the TmPPase:Ca structure.' Problem: This is a far-stretched interpretation of a tiny change, which is not reliable for the reasons described in the paragraph above.

      While the authors overall agree with the reviewer’s assessment that ±0.3 nm is a small (not a minor) change, there are literature examples quantifying (or using for quantification) distribution peaks separated by similar Δr (Kubatova et al., PNAS, 2023; Schmidt et al., JACS, 2024; Hett et al., JACS, 2021; Zhao et al., Nature, 2024). In particular, none of the mPPase structures show helical movements of greater than half a turn (in helices 6 and 12 in particular). There appear to be larger-scale loop-closing motions of the 5-6 loop that includes T211, due to the presence of E217, which binds to one of the Mg2+ ions that coordinate the leaving group phosphate. (This is, inter alia, the reason that this loop is so flexible: it cannot order before substrate is bound.)

      Importantly, we have fitted Gaussians to the experimental distance distributions of 525R1 output by the Comparative DEER Analyzer 2.0 and observed a change in the distribution width in the presence of Ca2+, implying that the rotameric freedom of the spin label is restricted. However, the CW-EPR spectra for 525R1 indicate that the rotational correlation time of the spin label is highly consistent between conditions (the spectra are almost identical); this cannot be explained simply by rotameric preference of the spin label (as asserted by reviewer 3), as there is no (further) immobilisation observed from the CW-EPR of the apo-state (Figure EV9) to that in the presence of Ca2+. Furthermore, in the absence of conformational changes, it is reasonable to assume (and demonstrable from the CW-EPR data) that the rotamer cloud should not significantly change between conditions. However, Gaussian fits of the two extreme cases yielding the longest (i.e., in presence of IDP) and shortest (in presence of ZTD) mean distances for the 525R1 DEER data indicated significant (i.e., above the noise floor after Tikhonov validation) probability density for the IDP condition at 50 Å (P(r) = 0.18). This occurs at four standard deviations above the mean of the ZTD condition, which by random chance should occur with <0.007% probability. Indeed, one can say that to observe 18% probability density at four standard deviations above the mean by random chance would occur on the order of one in 4 x 10^6.
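
      The tail probability quoted above can be checked with a short calculation. The sketch below assumes an ideal Gaussian distribution (an assumption made only for this illustration, and the function name is hypothetical) and computes the probability of a value lying at least four standard deviations from the mean, both one-sided and two-sided.

```python
import math

def gaussian_tail(k: float, two_sided: bool = False) -> float:
    """Probability that a standard normal variable lies at least k
    standard deviations above the mean (or away from it, if two_sided)."""
    one_sided = 0.5 * math.erfc(k / math.sqrt(2.0))
    return 2.0 * one_sided if two_sided else one_sided

upper = gaussian_tail(4.0)
both = gaussian_tail(4.0, two_sided=True)
print(f"one-sided: {upper:.2e}, two-sided: {both:.2e}")
```

      Under this Gaussian assumption, both the one-sided and the two-sided values fall below the 0.007% bound stated in the response.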

      As in the previous response, the method can detect changes of this magnitude, which are not small but physiologically relevant and expected for integral membrane proteins such as mPPases. Indeed, even in equally (or more) complex systems such as heptameric mechanosensitive channel proteins, DEER provided sub-Angstrom accuracy when a spin-labelled high-resolution XRC structure was solved (Pliotas et al., PNAS, 2012; Pliotas et al., Nat Struct Mol Biol, 2015). Although this is an ideal and uncommon case, in which DEER accuracy was experimentally validated by another high-resolution structural method on a modified membrane protein, it demonstrates the power of the method: especially when strong oscillations are present in the raw DEER data (as here for mPPase 525R1), Angstrom resolution is achievable in such challenging protein classes, even when multiple distances are present.

      (2) 'Based on the DEER data on the IDP-bound TmPPase, we observed significant deviations between the experimental and the in silico distances derived from the TmPPase:IDP X-ray structure for both cytoplasmic- (T211R1) and periplasmic-end (S525R1) sites (Figure 4D and Figure EV8D). This deviation could be explained by the dimer adopting an asymmetric conformation under the physiological conditions used for DEER, with one monomer in a closed state and the other in an open state.'

      Problem: The authors are trying to establish asymmetry using the DEER data. Unfortunately, no significant difference is observed (between simulation and experiment) for position 525 as the authors claim (Figure 4D bottom panel). The observed difference for position 211 must be accounted for by the flexibility and the data provide no direct evidence for any asymmetry.

      Reviewer 3 is wrong in suggesting that we are trying to prove asymmetry through the DEER data. Asymmetry is a well-known fact in the literature (e.g. Vidilaseris et al., Sci Adv, 2019, where we show (1) that the exit channel inhibitor ATC (i.e., close to 525) binds better in solution to the TmPPase:PPi complex than the TmPPase:PPi2 complex, and (2) that ATC binds in an asymmetric fashion to the TmPPase:IDP2 complex with just one ATC dimer on one of the exit channels). We merely use the DEER data to support this well-established fact.

      However, we agree that the DEER data in the presence of IDP do not provide direct proof for asymmetry; in particular, mutant T211R1 yields in silico distributions too short for measurement by DEER. It is possible that the deviations observed (and particularly likely for T211R1) arise from conformational heterogeneity in solution. We will rephrase this paragraph accordingly: “Owing to the broad nature of the T211R1 (cytoplasmic site) distance distributions, we refrain from interpreting shifts in this data. For the 525R1 (periplasmic site), for which we obtained data of high quality (as also pointed out by both reviewers 2 and 3), we observed deviations between the experimental and the in silico distances derived from the TmPPase:IDP X-ray structure. While this deviation is less pronounced than for the +ZTD condition, the deviation is consistent with an asymmetric conformation in solution.”

      (3) 'Our new structures, together with DEER distance measurements that monitor the conformational ensemble equilibrium of TmPPase in solution, provide further solid experimental evidence of asymmetry in gating and transitional changes upon substrate/inhibitor binding.'

      Problem: See above. The DEER data do not support any asymmetry.

      We feel that the reviewer comments here are somewhat unfounded. The DEER data (and we will limit discussion only to the 525R1 mutant in this regard) satisfy relevant criteria of the white paper (Schiemann et al., 2021, JACS) from the EPR community (signal-to-noise ratio w.r.t modulation depth of > 20 in all cases; replicates have been performed and will be added into the main-text or supplementary; near quantitative labelling efficiency (evidenced by lack of free spin label signal in the CW-EPR spectra); analysed using the CDA (now Figure EV10, this data we will promote to the main-text) to avoid confirmation bias).

      While the DEER data do not prove asymmetry, we do not claim proof of asymmetry in the above sentence. We concede to rephrase the offending sentence above as: “Our new structures, together with DEER distance measurements that monitor the conformational ensemble of TmPPase in solution, do not exclude asymmetry in gating and transitional changes upon substrate/inhibitor binding and are consistent with our proposed model.” We feel that this reframed conjecture of asymmetry is well founded; indeed, comparing the experimental apo-state 525R1 distance distribution with in silico modelling performed on the hybridised asymmetric structure (i.e., comprised of one monomer bound to Ca2+ and another bound to IDP) yields an overlap coefficient (Islam and Roux, JPC B, 2015) of >0.97. This implies the envelope of the modelled distance distribution is quantitatively inside the envelope of the experimental distance distribution. Thus, the DEER data do not exclude asymmetry (previously observed by time-resolved XRC) in solution. While we appreciate that ideally one would measure time-resolved DEER to directly correlate kinetics of conformational changes within the ensemble to the catalytic cycle of mPPase (and this is something we aim to do in the future), it is beyond the scope of this study.
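
      To illustrate what an overlap coefficient between two distance distributions measures, the sketch below uses one common definition: the integral of the pointwise minimum of two area-normalised distributions. This is an assumption and may differ from the exact definition used by Islam and Roux. The Gaussian parameters are invented for illustration and are not the measured DEER data.

```python
import math

def gaussian(r, mean, sd):
    """Gaussian probability density evaluated at distance r."""
    return math.exp(-0.5 * ((r - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def overlap_coefficient(p, q, dr):
    """Integral of min(p, q) for two distributions sampled on the same grid.

    Each distribution is normalised to unit area first, so the result
    lies between 0 (no overlap) and 1 (identical distributions).
    """
    area_p = sum(p) * dr
    area_q = sum(q) * dr
    return sum(min(a / area_p, b / area_q) for a, b in zip(p, q)) * dr

dr = 0.01  # grid spacing in nm (hypothetical)
grid = [2.0 + i * dr for i in range(601)]  # 2.0 to 8.0 nm
experimental = [gaussian(r, 4.50, 0.40) for r in grid]  # illustrative values only
modelled = [gaussian(r, 4.52, 0.40) for r in grid]      # illustrative values only
print(overlap_coefficient(experimental, modelled, dr))
```

      With this definition, two nearly coincident distributions give a value close to 1, and the value falls as the distributions separate or differ in width.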

      Indeed, half-of-the-sites reactivity has been demonstrated in at least the following papers (Vidilaseris et al., Sci. Adv., 2019; Strauss et al., EMBO Rep., 2024; Malinen et al., Prot Sci, 2022; Artukka et al., Biochem J, 2018; Luoto et al., PNAS, 2013). Half-of-the-sites reactivity requires asymmetry in the mechanism, and therefore asymmetric motions in the active site (viz 211) and exit channel (viz 525). As mentioned above, we have demonstrated this for other inhibitors (Vidilaseris et al 2019) and as part of a time-resolved experiment (Strauss et al 2024). In fact, given the wealth of evidence showing that the symmetrical crystal structures sample a non- or less-productive conformation of the protein, it would be quixotic to propose that the DEER experiments, performed in solution, do not generate asymmetric conformations. Such a proposal certainly doesn’t obey Occam’s razor of choosing the simplest possible explanation that covers the data.

      (4) Based on these observations, and the DEER data for +IDP, which is consistent with an asymmetric conformation of TmPPase being present in solution, we propose five distinct models of TmPPase (Figure 7).

      Problem: Again, the DEER data do not support any asymmetry and the authors may revisit the proposed models.

      We respectfully disagree with the reviewer. Please see our detailed response above. However, in the revised version, we will clarify that the proposed models are not solely based on the DEER data but are grounded in both current and previously solved structures, with the DEER data providing additional consistency with these models.

      (5) 'In model 2 (Figure 7), one active site is semi-closed, while the other remains open. This is supported by the distance distributions for S525R1 and T211R1 for +Ca/ETD informed by DEER, which agrees with the in silico distance predictions generated by the asymmetric TmPPase:ETD X-ray structure'

      Problem: Neither convincing nor supported by the data

      We respectfully disagree with the reviewer. However, owing to the conformational heterogeneity of T211R1, in the revised version we will exclude it from the above sentence. Please see our detailed response above.

    1. Author Response:

      Thank you for your interest in our paper. We would also like to thank the anonymous reviewers for their critical and constructive comments. Although the reviewers found our work interesting, they raised several important concerns about our study. To address these concerns, we will mainly perform the following new experiments.

      1. Examine whether the antioxidant NAC can block SFN-induced TFEB nuclear translocation in NPC cells.

      2. Examine whether calcineurin inhibitors (FK506+CsA) or a Ca2+ inhibitor (BAPTA-AM) can block SFN-induced TFEB nuclear translocation in NPC cells.

      3. Investigate whether cholesterol is cleared by SFN-mediated activation of TFEB in tissues in vivo.

      4. Investigate whether SFN-evoked lysosomal exocytosis is TFEB-dependent by using TFEB-KO cells.

      5. Examine the effect of NPC1 deficiency on dextran trafficking by studying the localization of CF-dex and Lamp1.

      6. Perform cytotoxicity experiments to examine whether SFN, as used in this study, is cytotoxic in various cell lines.

      In addition, according to the reviewers’ suggestions, we will make clarifications and corrections wherever appropriate in the manuscript. Below please find our point-by-point responses and plans to the reviewers’ comments.

      Reviewer #1 (Public review):

      Summary:

      The authors are trying to determine if SFN treatment results in dephosphorylation of TFEB, subsequent activation of autophagy-related genes, exocytosis of lysosomes, and reduction in lysosomal cholesterol levels in models of NPC disease.

      Strengths:

      (1) Clear evidence that SFN results in translocation of TFEB to the nucleus.

      (2) In vivo data demonstrating that SFN can rescue Purkinje neuron number and weight in NPC1-/- animals.

      Thank you for the support!

      Weaknesses:

      (1) Lack of molecular details regarding how SFN results in dephosphorylation of TFEB leading to activation of the aforementioned pathways. Currently, datasets represent correlations.

      Thank you for this constructive comment. The reviewer is right that in this manuscript the molecular mechanism of SFN-activated TFEB has not been discussed in detail. This is because we have previously shown that SFN induces TFEB nuclear translocation via a Ca2+-dependent but MTOR (mechanistic target of rapamycin kinase)-independent mechanism through a moderate increase in reactive oxygen species (ROS), and that calcineurin-mediated TFEB dephosphorylation underlies SFN-induced TFEB activation. These data have been published in Autophagy (Li, Shao et al. 2021). Therefore, we did not cover this part in the present study. We will add the molecular mechanism of TFEB activation by SFN to the discussion. To further confirm this mechanism in NPC cells, we will also perform experiments to: 1) examine whether the antioxidant NAC can block SFN-induced TFEB nuclear translocation in NPC cells; 2) examine whether calcineurin inhibitors (FK506+CsA) can block SFN-induced TFEB nuclear translocation in NPC cells.

      (2) Based on the manuscript narrative, discussion, and data it is unclear exactly how steady-state cholesterol would change in models of NPC disease following SFN treatment. Yes, there is good evidence that lysosomal flux to (and presumably across) the plasma membrane increases with SFN. However, lysosomal biogenesis genes also seem to be increasing. Given that NPC inhibition, NPC1 knockout, or NPC1 disease mutations are constitutively present and the cell models of NPC disease contain lysosomes (even with SFN) how could a simple increase in lysosomal flux decrease cholesterol levels? It would seem important to quantify the number of lysosomes per cell in each condition to begin to disentangle differences in steady state number of lysosomes, number of new lysosomes, and number of lysosomes being exocytosed.

      Thank you for the suggestion. It is important to define the three states: 1) the original number of lysosomes, 2) the number of new lysosomes, and 3) the number of lysosomes being exocytosed. However, having checked the literature, we have so far found no good method that can clearly differentiate these three states of lysosomes.

      (3) Lack of evidence supporting the authors' premise that "SFN could be a good therapeutic candidate for neuropathology in NPC disease".

      Suggestion was taken! We will investigate whether cholesterol is reduced by SFN-mediated activation of TFEB in vivo, to strengthen the point that SFN could be a potential therapeutic compound for NPC treatment. To avoid confusion, we have removed this sentence.

      Reviewer #2 (Public review):

      Summary:

      This study presents a valuable finding that the activation of TFEB by sulforaphane (SFN) could promote lysosomal exocytosis and biogenesis in NPC, suggesting a potential mechanism by SFN for the removal of cholesterol accumulation, which may contribute to the development of new therapeutic approaches for NPC treatment.

      Strengths:

      The cell-based assays are convincing, utilizing appropriate and validated methodologies to support the conclusion that SFN facilitates the removal of lysosomal cholesterol via TFEB activation.

      Weaknesses:

      (1) The in vivo experiments demonstrate the therapeutic potential of SFN for NPC. A clear dose-response analysis would further strengthen the proposed therapeutic mechanism of SFN. Additional data supporting the activation of TFEB by SFN for cholesterol clearance in vivo would strengthen the overall impact of the study

      We understand the reviewer’s point. We examined two doses of SFN, 30 and 50 mg/kg. As shown in Fig. 6, SFN at 50 mg/kg, but not at 30 mg/kg, prevents a degree of Purkinje cell loss in lobule IV/V of the cerebellum, suggesting a dose-correlated preventive effect of SFN. In vivo experiments with higher concentrations of SFN and an optimized dosage form of SFN are planned for a future study, but will not be included in this study.

      We will investigate whether cholesterol was cleared by activation of TFEB by SFN in vivo.

      (2) In Figure 4, the authors demonstrate increased lysosomal exocytosis and biogenesis by SFN in NPC cells. Including a TFEB-KO/KD in this assay would provide additional validation of whether these effects are TFEB-dependent.

      Thank you for this valuable suggestion. We will investigate whether SFN-evoked lysosomal exocytosis is TFEB-dependent by using TFEB-KO cells.

      (3) For lysosomal pH measurement, the combination of pHrodo-dex and CF-dex enables ratiometric pH measurement. However, the pKa of pHrodo red-dex (according to Invitrogen) is ~6.8, while lysosomal pH is typically around 4.7. This discrepancy may account for the lack of observed lysosomal pH changes between WT and U18666A-treated cells. Notably, previous studies (PMID: 28742019) have reported an increase in lysosomal pH in U18666A-treated cells.
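
      The reviewer's concern about the dye pKa can be illustrated with the Henderson-Hasselbalch relation. The sketch below assumes a single-site protonation model in which the fluorescence signal scales with the protonated fraction of the dye (an assumption made for illustration only). The pKa of 6.8 and the lysosomal pH of 4.7 come from the comment above, pH 5.5 is a hypothetical elevated lysosomal pH, and the function name is hypothetical.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Fraction of dye in the protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

PKA = 6.8  # value quoted by the reviewer for pHrodo red-dex

# Compare a typical lysosomal pH, a hypothetical elevated pH, and pH = pKa.
for ph in (4.7, 5.5, 6.8):
    print(ph, round(protonated_fraction(ph, PKA), 3))
```

      Under this model the dye is about 99% protonated at pH 4.7 and about 95% at pH 5.5, so a probe with a pKa near 6.8 would change its signal only slightly across this range, which is the basis of the reviewer's point.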

      We understand the reviewer’s point. However, we used pHrodo™ Green-Dextran (P35368, Invitrogen), not pHrodo red-dex, to measure lysosomal luminal acidity. According to the product information from Invitrogen, pHrodo Green-dex conjugates are non-fluorescent at neutral pH but fluoresce bright green at acidic pH (pH range 4-9), such as in endosomes and lysosomes. Therefore, pHrodo Green-dex can be used to monitor the acidity of lysosomes (Hu, Li et al. 2022). We also used LysoTracker Red DND-99 (Thermo Scientific, L7528) to measure lysosomal pH (Fig. 4G, H), and the results are consistent with those of the pHrodo Green/CF measurement. Overall, in our hands, we have not detected a change in lysosomal pH in U18666A-treated NPC1 cell models.

      (4) The authors are also encouraged to perform colocalization studies between CF-dex and a lysosomal marker, as some researchers may be concerned that NPC1 deficiency could reduce or block the trafficking of dextran along endocytosis.

      Suggestion was taken! We will examine the effect of NPC1 deficiency on dextran trafficking by studying the localization of CF-dex and Lamp1.

      (5) In vivo data supporting the activation of TFEB by SFN for cholesterol clearance would significantly enhance the impact of the study. For example, measuring whole-animal or brain cholesterol levels would provide stronger evidence of SFN's therapeutic potential.

      We really appreciate the reviewer’s suggestions. We will investigate whether cholesterol was cleared by activation of TFEB by SFN in vivo.

      Reviewer #3 (Public review):

      Summary:

      The authors demonstrate that activation of TFEB facilitates cholesterol clearance in cell models of Niemann-Pick type C (NPC). This is done through a variety of approaches including activation of TFEB by sulforaphane (SFN), a naturally occurring small-molecule TFEB agonist. SFN induces TFEB nuclear translocation and promotes lysosomal exocytosis. In an NPC mouse model, SFN dephosphorylates/activates TFEB in the brain and rescues the loss of Purkinje cells.

      Strengths:

      NPC is a severe disease and there is little in the way of treatment. The manuscript points towards some treatment options. However, the title "Small-molecule activation of TFEB Alleviates Niemann-Pick Disease..." is far too strong and should be changed.

      Weaknesses:

      (1) The manuscript is extremely hard to read due to the writing; it needs careful editing for grammar and English.

      We will thoroughly check grammar to improve the manuscript.

      (2) There are a number of important technical issues that need to be addressed.

      We will address the technical issues mentioned in the following.

      (3) The TFEB influence on filipin staining in Figure 1A is somewhat subtle. In the mCherry alone panels there is a transfected cell with no filipin staining and the mCherry-TFEBS211A cells still show some filipin staining.

      We understand the reviewer’s point. We will investigate whether cholesterol is cleared by activation of TFEB by SFN in vivo.

      (4) Figure 1C is impressive for the upregulation of filipin with U18666A treatment. However, SFN is used at 15 microM. This must be hitting multiple pathways. Vauzour et al (PMID: 20166144) use SFN at 10 nM to 1microM. Other manuscripts use it in the low microM range. The authors should repeat at least some key experiments using SFN at a range of concentrations from perhaps 100 nM to 5 microM. The use of 15 microM throughout is an overall concern.

      We understand the reviewer’s point. As noted in RESPONSE #1, we have previously shown that SFN (10–15 μM, 2–9 h) induces robust TFEB nuclear translocation in a dose- and time-dependent manner in HeLa GFP-TFEB stable cells, as well as in other human cell lines, without cytotoxicity (Li, Shao et al. 2021). On the basis of those results, we chose SFN at 15 μM to examine its effect on cholesterol clearance in this study. We will add this information to the discussion. We will also perform dose-response TFEB nuclear translocation experiments in NPC model cells, together with cytotoxicity experiments, to examine whether the SFN concentrations used in the various cell lines are toxic.

      References:

      Hu, M. Q., P. Li, C. Wang, X. H. Feng, Q. Geng, W. Chen, M. Marthi, W. L. Zhang, C. L. Gao, W. Reid, J. Swanson, W. L. Du, R. Hume and H. X. Xu (2022). “Parkinson's disease-risk protein TMEM175 is a proton-activated proton channel in lysosomes.” Cell 185(13): 2292-+.

      Li, D., R. Shao, N. Wang, N. Zhou, K. Du, J. Shi, Y. Wang, Z. Zhao, X. Ye, X. Zhang and H. Xu (2021). “Sulforaphane Activates a lysosome-dependent transcriptional program to mitigate oxidative stress.” Autophagy 17(4): 872-887.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The work from Petazzi et al. aimed at identifying novel factors supporting the differentiation of human hematopoietic progenitors from induced pluripotent stem cells (iPSCs). The authors developed an inducible CRISPR-mediated activation strategy (iCRISPRa) to test the impact of newly identified candidate factors on the generation of hematopoietic progenitors in vitro. They first compared previously published transcriptomic data of iPSC-derived hemato-endothelial populations with cells isolated ex vivo from the aorta-gonad-mesonephros (AGM) region of the human embryo and they identified 9 transcription factors expressed in the aortic hemogenic endothelium that were poorly expressed in the in vitro differentiated cells. They then tested the activation of these candidate factors in an iPSC-based culture system supporting the differentiation of hematopoietic progenitors in vitro. They found that the IGF binding protein 2 (IGFBP2) was the most upregulated gene in arterial endothelium after activation and they demonstrated that IGFBP2 promotes the generation of functional hematopoietic progenitors in vitro.

      Strengths:

      The authors developed an extremely useful doxycycline-inducible system to activate the expression of specific candidate genes in human iPSC. This approach allows us to simultaneously test the impact of 9 different transcription factors on in vitro differentiation of hematopoietic cells, and the system appears to be very versatile and applicable to a broad variety of studies.

      The system was extensively validated for the expression of 1 transcription factor (RUNX1) in both HeLa cells and human iPSC, and a detailed characterization of this test experiment was provided.

      The authors exhaustively demonstrated the role of IGFBP2 in promoting the generation of functional hematopoietic progenitors in vitro from iPSCs. Even though the use of the IGFBP2-interacting proteins IGF1 and IGF2 has been previously reported in human iPSC-derived hematopoietic differentiation in vitro (Ditadi and Sturgeon, Methods 2016; Ng et al., Nature Biotechnology 2016), and IGFBP-2 itself has been shown to promote adult HSC expansion ex vivo (Zhang et al., Blood 2008), its role in supporting in vitro hematopoiesis was demonstrated here for the first time.

      Weaknesses:

      Although the authors performed a very thorough characterization of the system in proof-of-principle experiments activating a single transcription factor, the data provided when 9 independent factors were used is not sufficient to fully validate the experimental strategy. Indeed, in the current version of the manuscript, it is not clear whether the results presented in both the scRNAseq analysis and the functional assays are the consequence of the simultaneous activation of all 9 TF or just a subset of them. This is essential to establish whether all the proposed factors play a role during embryonic hematopoiesis, and a more complete analysis of the scRNAseq dataset could help clarify this aspect.

      Similarly, the data presented in the manuscript are not sufficient to clarify at what stage of the endothelial-to-hematopoietic transition (EHT) the TF activation has an impact. Indeed, even though the overall increase of functional hematopoietic progenitors is fully demonstrated, the assays proposed in the manuscript do not clarify whether this is due to a specific effect at the endothelial level or to an increased proliferation rate of the generated hematopoietic progenitors. Similar conclusions can be applied to the functional validation of IGFBP2 in vitro.

      The overall conclusions are sometimes vague and not always supported by the data. For instance, the authors state that the CRISPR activation strategy resulted in transcriptional remodeling and a steer in cell identity, but they do not specify which cell types are involved and at what level of the EHT process this is happening. In the discussion, the authors also claim that they provided evidence to support that RUNX1T1 could regulate IGFBP2 expression. However, this is exclusively based on the enrichment of RUNX1T1 gRNA in cells expressing higher levels of IGFBP2 and it does not demonstrate any direct or indirect association of the two factors.

      We thank the reviewer for the positive comments about the importance of our work and have now addressed the points raised as weaknesses by performing additional analysis and experiments, adding a new schematic of the mechanism, and rewording our claims.

      We have clarified the different effects mediated by the activation and the IGFBP2 addition in a summary section at the end of the results and added Figure 6, showing this in visual form. We have also clearly stated the limitations related to the correlation between RUNX1T1 and IGFBP2 in the discussion and toned down our claims regarding this throughout the entire paper. We have also reworded the text to clarify the specific cell types identified in the sequencing data that we refer to.

      Reviewer #2 (Public Review):

      To enable robust production of hematopoietic progenitors in-vitro, Petazzi et al examined the role of transcription factors in the arterial hemogenic endothelium. They use IGFBP2 as a candidate gene to increase the directed differentiation of iPSCs into hematopoietic progenitors. They have established a novel induced-CRISPR mediated activation strategy to drive the expression of multiple endogenous transcription factors and show enhanced production of hematopoietic progenitors through expansion of the arterial endothelial cells. Further, upregulation of IGFBP2 in the arterial cells facilitates the metabolic switch from glycolysis to oxidative phosphorylation, inducing hematopoietic differentiation. While the overall study and resources generated are good, assertions in the manuscript are not entirely supported by the experimental data and some claims need further experimental validation.

      We thank the reviewer for the positive comments. We have provided new data and analysis to make sure that all our assertions are clearly supported, and we have reworded those where limitations were identified by the reviewers.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The assessment could change from "incomplete" to "solid" if the authors: i) improve data analysis (for both scRNAseq and functional assays) by providing additional information that could strengthen their conclusions, as suggested in the specific comments by both reviewers; ii) either provide new functional evidence supporting their mechanistic conclusion or alternatively tone down the claims that are not fully supported by data and acknowledge the limitations raised by reviewers in the discussion; iii) address the issue of paracrine signaling to expand only hematopoietic progenitors.

      We have now improved the data analysis and provided additional functional tests to strengthen our conclusions and toned down those that were identified by the reviewers as not supported enough and included a discussion on these limitations. We have also reworded the section about the paracrine signaling throughout the paper.

      Reviewer #1 (Recommendations For The Authors):

      Figure 1 contains exclusively published data. It might be more appropriate to use it as a supplementary figure or as part of a more exhaustive figure (maybe combining Figures 1 and 2 together?).

      Figure 1 contains novel bioinformatic analyses that form the basis of our research, and its content and focus differ from those of Figure 2, which is already a large figure. We therefore believe it is better to keep it as a separate figure, which now also contains a new panel.

      It seems there is an issue with Figure S3 labelling:

      • In line 112, Figure S2A-B does not display genomic PCR and sequencing results;

      • In line 123, Figure S3D-E does not show viability and proliferation data;

      • In line 127, Figure S3G does not show mCherry expression in response to DOX;

      We apologise for the confusion with the numbering; we have now correctly labelled the figures.

      It would be more informative to include gates and frequency on flow cytometry plots in Figure S3, to be able to evaluate the extent of the reduction in mCherry expression.

      We have now included the gating and frequency of mCherry-expressing cells in Supplementary Figure 3D.

      It is not clear from the text and figures whether the SB treatment was maintained throughout the hematopoietic differentiation protocol (line 122):

      • If so, it would be important to confirm that HDAC treatment does not affect EHT cultures

      • If not, can the authors provide some evidence that transgene silencing is not occurring during hematopoietic differentiation?

      We have clarified that we decided to treat the cells with SB exclusively in maintenance conditions because HDACs have been shown to be essential for the EHT (lines 138-142). We have now also included additional data showing the high expression of the mCherry tag reporting the iSAM expression on day 8 (Supplementary Figure 4F).

      Can the authors provide a simple diagram summarizing the experimental strategy for each differentiation experiment in the respective supplementary figure? For instance, at what stage of the protocol was DOX added in Figure 3? Or at what stage IGFBP2 was added in Figure 5? It would be a very useful addition to the interpretation of the results.

      We have now included three schematics for all the experiments in the manuscript in Supplementary Figure 4A-C.

      In Figure 3, the authors should provide more detailed information about the data filtering of the scRNAseq experiment, and more specifically:

      • How many cells were included in the analysis for each library after QC and filtering?

      • How "cells in which the gRNAs expression was detected" were selected? Do they include only cells showing expression of gRNAs for all 9 TF?

      This information is now included in the Methods section (lines 773-781); the detailed code is available at the GitHub link provided in the same section. We have filtered the cells expressing one gRNA for the non-targeting gRNA (iSAM_NT) control and more than one for the iSAM_AGM sample.
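
      As an illustration of the filtering rule described above, the following is a minimal sketch and not the code from the authors' repository. The data layout, the helper names and the `min_umis` detection threshold are assumptions made for the example; only the sample labels and the one-gRNA versus more-than-one-gRNA rule come from the response.

```python
# Illustrative sketch only; this is not the authors' pipeline.
# Assumed input: one dict per cell holding its sample label and per-gRNA UMI counts.


def count_detected_grnas(grna_counts, min_umis=1):
    """Count distinct gRNAs with at least `min_umis` UMIs (threshold is an assumption)."""
    return sum(1 for umis in grna_counts.values() if umis >= min_umis)


def keep_cell(sample, grna_counts, min_umis=1):
    """Rule from the response: exactly one gRNA for iSAM_NT, more than one for iSAM_AGM."""
    n = count_detected_grnas(grna_counts, min_umis)
    if sample == "iSAM_NT":
        return n == 1
    if sample == "iSAM_AGM":
        return n > 1
    raise ValueError(f"unknown sample label: {sample!r}")


# Hypothetical toy cells; barcodes, gRNA labels and counts are made up.
cells = [
    {"barcode": "c1", "sample": "iSAM_NT", "grnas": {"NT": 5}},
    {"barcode": "c2", "sample": "iSAM_NT", "grnas": {"NT": 0}},
    {"barcode": "c3", "sample": "iSAM_AGM", "grnas": {"gRNA_A": 3, "gRNA_B": 1}},
    {"barcode": "c4", "sample": "iSAM_AGM", "grnas": {"gRNA_A": 2, "gRNA_B": 0}},
]

kept = [c["barcode"] for c in cells if keep_cell(c["sample"], c["grnas"])]
print(kept)
```

      In this toy input, cells with no detected gRNA, or with a single detected gRNA in the iSAM_AGM sample, are dropped. The real thresholds and data structures are those given in the Methods and the linked repository.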

      In Figure 3A, it is not clear whether the expression of the 9 factors is consistently detected in all cells or just a subset of them, and the heatmap in Figure 3A does not provide this information. It would be more accurate to provide expression on a per-cell basis, for instance, as a violin plot displaying single dots representing each cell. 

      We have now included this violin plot in Supplementary Figure 4G as requested. However, this visualisation is difficult to interpret because the expression of some of the target genes seems variable in both experimental and control conditions. We had anticipated that this might be the case, which is why we included the three different controls. For this reason, we chose to show the normalised expression, which takes all the different variables into account (Figure 3A).

      In Figure 3B-C, it seems that clusters EHT1 and EHT2 do not express endothelial markers anymore. Are these fully differentiated hematopoietic cells rather than cells undergoing EHT? In general, it would be quite important to provide evidence of expressed marker genes characterizing each cluster (eg. heatmap summarizing top DEG in the supplementary figure?). 

      We have now provided a spreadsheet containing the cluster markers that we used (Supplementary Table 1) and a heatmap in Figure 3E. Furthermore, we have now edited Figure 3C to include pan-endothelial markers (PECAM1 and CDH5). These data show that the EHT1 and EHT2 clusters both express endothelial markers, which are progressively downregulated, as expected during the endothelial-to-hematopoietic transition. We have also discussed this in the manuscript (lines 192-195) and included a schematic of the mechanism in Figure 6.

      In Figure 3E, displaying the proportion of clusters within each sample/library would be a more accurate way of comparing the cell types present in each library (removing potential bias introduced by loading different numbers of cells in each sample).

      We have now included the requested data in Supplementary Figure 4I and it confirms again the expansion of arterial cells in the activated cells.    
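
      The per-library normalisation requested by the reviewer can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name and the toy cell counts are assumptions, and only the sample and cluster labels are taken from the text.

```python
# Illustrative sketch only; cell counts below are made up.
from collections import Counter, defaultdict


def cluster_proportions(cells):
    """Map (library, cluster) pairs to {library: {cluster: fraction of that library}}."""
    counts = defaultdict(Counter)
    for library, cluster in cells:
        counts[library][cluster] += 1
    return {
        library: {cluster: n / sum(per_cluster.values()) for cluster, n in per_cluster.items()}
        for library, per_cluster in counts.items()
    }


# Hypothetical input: the two libraries deliberately contain different numbers of cells.
cells = (
    [("iSAM_NT", "arterial")] * 1
    + [("iSAM_NT", "EHT1")] * 2
    + [("iSAM_NT", "EHT2")] * 1
    + [("iSAM_AGM", "arterial")] * 4
    + [("iSAM_AGM", "EHT1")] * 2
    + [("iSAM_AGM", "EHT2")] * 2
)

proportions = cluster_proportions(cells)
print(proportions["iSAM_NT"]["arterial"], proportions["iSAM_AGM"]["arterial"])
```

      Because each fraction is divided by the size of its own library, loading different numbers of cells per sample does not by itself change the comparison between libraries.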

      In Figure 3G, by plating 20,000 total CD34+, the assay does not account for potential differences in sample composition. It is then hard to discriminate between the increased number of progenitors in the input or an enhanced ability of HE to undergo EHT. This is an important aspect to consider to precisely identify at what level the activation of the 9 factors is acting. A proper quantification of flow cytometry data summarizing the % of progenitors, arterial cells, etc. would be useful to interpret these results.

      We have reworded lines 204-205. We are well aware that the CD34+ cell population consists of a range of cells across the EHT process, which is precisely why we carried out the single-cell sequencing analysis. We purposely tested the effect of the observed changes in composition using colony assays.

      In Figure 3G, it seems that NT cells w/o DOX have very little CFU potential (if any). Can the authors provide an explanation for this?

      We think that the limited CFU potential is due to the extensive genetic manipulation and selection that the cells underwent for the derivation of all the iSAM lines, but this did not prevent us from observing an effect of gene activation on CFU numbers. This is one of the primary reasons that we then validated our overall findings using the parental iPSC line under control conditions and with the addition of IGFBP2. We show that the parental iPSC line gives rise to hematopoietic progenitors, both immunophenotypically (Figure 4D) and functionally, at expected levels (Figure 4B, left column).

      Figure 4A shows an upregulation of IGFBP2 in arterial cells as a result of TF activation. However, from the data presented here, it is not possible to evaluate whether this is specific to the arterial cluster, or it is a common effect shared by all cell types regardless of their identity. 

      Data has now been included in Supplementary Figure 4H, which shows that all the cells show an increase in IGFBP2, but arterial cells show the highest increase. We have now edited the text to reflect this, in lines 228-230.

      In Figure 5A-B only a minority of arterial cells express RUNX1 in response to IGFBP2 treatment. Is this sufficient to explain the very significant increase in the generation of functional hematopoietic progenitors described in Figure 4? Quantification and statistical analysis of RUNX1 upregulation would strengthen this conclusion.

      We have now provided the statistical analysis showing significant upregulation of RUNX1 upon IGFBP2 addition. The p values are now provided in the figure 5 legend.

      In Figure 5 the authors conclude that IGFBP2 remodels the metabolic profile of endothelial cells. However, it is not clear which cell types and clusters were included in the analysis of Figure 5C-G. Is the switch from Glycolysis to Oxidative Phosphorylation specific to endothelial cells? Or it is a more general effect on the entire culture, including hematopoietic cells? 

      We based this conclusion on the fact that the single-cell RNAseq allows us to verify that the metabolic differences are obtained in the endothelial cells. Given that we sorted the adherent cells, the majority of these are endothelial cells, as shown in Figure 5A. The Seahorse pipeline includes a number of washing steps, so the analyses are performed on the adherent compartment, which we know consists primarily of endothelial cells. We cannot exclude some contamination from non-endothelial cells, but we note that the metabolic changes were initially identified in endothelial cells in the single-cell sequencing data. Taken together, we believe that this implies that the metabolic changes are specific to this population. We have clarified this in line 317.

      In the discussion, the authors conclude that they "provide evidence to support the hypothesis that RUNX1T1 could regulate IGFBP2 expression". To further support this conclusion, the authors could provide a correlation analysis of the expression of the two genes in the cell type of interest. 

      Following the observation of high IGFBP2 expression across clusters, we have now reworded this sentence in lines 382-385. We tried to perform the correlation analysis, but we believe it is not appropriate given the detection level of the gRNA. We have now included this as a limitation in the discussion (lines 416-427) and have also toned down the conclusions we drew about RUNX1T1 throughout the manuscript.

      As mentioned by the authors, IGFBP2 binds IGF1 and IGF2 modulating their function. Both IGF1 (http://dx.doi.org/10.1016/j.ymeth.2015.10.001) and IGF2 (doi:10.1038/nbt.3702) have been used in iPSC differentiation into definitive hematopoietic cells. It would be relevant to discuss/reference this in the discussion.

      We have now included the suggested reference in the section where we discuss the role of IGFBP2 in binding IGF1 and IGF2.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 1 compares the transcriptome of human AGM and in-vitro derived hemogenic endothelial cells (HECs). It is not clear why only the genes downregulated in the latter were chosen. Are there any significantly upregulated genes, knockdown/knockout which could also serve a similar purpose? Single-cell transcriptome database analysis is very preliminary. A detailed panel with differences in cluster properties of HECs between the two systems should be provided. A heatmap of all differentially expressed genes between the two samples must be generated, along with a logical explanation for choosing the given set of genes. 

      We have now included another panel in figure 1 to better clarify the logic behind the strategy used to identify our target genes (Figure 1A).

      (2) Figure 2 - a panel describing the workflow of gRNA design and targeting for the 9 candidate genes, along with lentiviral packaging and transduction would make it easier to follow. 

      We have now included three schematics for all the experiments in the manuscript in supplementary figure 4 A-C. 

      (3) Figure 3- to assess the effect of arterial cell expansion on the emergence of hematopoietic progenitors, CD34+ Dll4+ cells should be sorted for OP9 co-culture assay.

      Using only CD34+ cells does not answer the question raised. Also, the CFU assay performed does not fully support the claim of enhanced hematopoietic differentiation since only CFU-E and CFU-GM colonies are increased in Dox-treated samples, with no effect on other colony types. OP9 co-culture assay with these cells would be required to strengthen this claim. 

      We wanted to clarify that the effect on the methylcellulose coming from the activated cells was not limited to CFU-E, as the reviewer reported; instead, it also affected CFU-GM and CFU-M. 

      We have now performed additional experiments where we sorted the CD34+ compartment into DLL4- and DLL4+ in Supplementary Figure 5D-E, which we discussed in lines 250-258. 

      (4) In Figure 3F, there appears to be a lot of variation in the DLL4% fold change values for DOX treated iSAM_AGM sample, which weakens the claim of increased arterial expansion. Can the authors explain the probable reason? It is suggested that the two other controls (iSAM_+DOX and iSAM_-DOX) should be included in this analysis. It is imperative to also show % populations rather than just fold change to gain confidence.

      We agree that there is a lot of variability. That is because differentiation happens in 3D in embryoid bodies, which contain many different cell types that differentiate in different proportions across independent experiments. We have now included the raw data in Supplementary Figure 4 D, with additional statistical analysis to show the expansion of arterial cells including also the suggested additional controls.

      (5) How does activation of these target genes cause increased arterialization? Is the emergence of non-HE populations suppressed? Or is it specific to the HE? The data on this should be clarified and also discussed.

      We have provided additional data clarifying the connection between increased arterialisation and hemogenic potential. We showed that the activation induces increased arterialisation and that IGFBP2 acts by supporting the acquisition of hemogenic potential. We have discussed this in lines 326-348 and provided a new figure to explain this in detail (Figure 6).

      (6) Considering that IGFBP2 was chosen from the activated target gene(s) cluster, can the authors explain why the reduced CFU-M phenomenon observed in Figure 3G does not appear in the MethoCult assay for IGFBP2 treated cells (Figure 4B)?

      The difference could be explained by the fact that in Figure 3G, the cells underwent activation of multiple genes, while in Figure 4B, they were only exposed to IGFBP2. Our results show that IGFBP2 could at least partially explain the phenotype that we see with the activation, but we believe that during the activation experiments, there might be other signals available that might not be induced by IGFBP2 alone. We have also added a summary section and a figure to clarify the different mechanisms of action of the gene activation and IGFBP2.

      (7) Figure 4- while the experiments conducted support the role of IGFBP2 in increasing hematopoietic output, there is no experimental evidence to prove its function through paracrine signalling in HECs. The authors need to provide some evidence of how IGFBP2 supplementation specifically expands only the hematopoietic progenitors. Experimental strategies involving specifically targeting IGFBP2 in hemogenic/arterial endothelial cells are required to prove its cell type specific function. Additionally, assessing the in vivo functional potential of the hematopoietic cells generated in the presence of IGFBP2, by bone-marrow transplantation of CD34+ CD43+ cells, is essential. 

      The role of IGFBP2 in the context of HSC production and expansion was not the topic of our research, and we have not claimed that IGFBP2 affects the long-term repopulating capacity of HSPCs. Therefore, we believe that the requested experiments are not required to support the specific claims that we do make. We have now provided more experiments and bioinformatic analysis that support the role of IGFBP2 in inducing the progression of EHT from arterial cells to hemogenic endothelium, and to avoid misunderstandings, we have toned down our claims by editing the text regarding its paracrine effects.

      (8) Figure 4C-D: It is recommended to plot % populations along with fold change value. As this is a key finding, it is important to perform flow cytometry for additional hematopoietic markers- CD144, CD235a and CD41a to demonstrate whether this strategy can also expand erythroid-megakaryocyte progenitors.

      Figure 4C already shows the percentage values; we have now added the percentages for Figure 4D in Supplementary Figure 5C. We have also performed additional analysis as requested and added the data obtained to Supplementary Figure 5D.

      (9) In Figure 5, analysis showing the frequency of cells constituting different clusters, between untreated and IGFBP2-treated samples in the single-cell transcriptome analysis is essential. Additional experiments are required to validate the function of IGFBP2 through modulation of metabolic activity. Inhibition of oxidative phosphorylation in the IGFBP2-treated cells should reduce the hematopoietic output. Authors should consider doing these experiments to provide a stronger mechanistic insight into IGFBP2-mediated regulation of hematopoietic emergence.

      We have now included the requested cluster composition in Supplementary Figure 5F. We decided not to include further tests on the metabolic effects of IGFBP2, as we have already discussed other papers showing, with selective inhibitors, that the EHT coincides with a switch from glycolysis to OXPHOS.

      (10) It is very striking to see that IGFBP2 supplementation changes the transcriptional profile of developing hematopoietic cells by increasing transcription of OXPHOS-related genes with concomitant reduction of glycolytic signatures, particularly at Day 13. However, the mitochondrial ATP rate measurements do not seem convincing. The bioenergetic profiles show that when mitochondrial inhibitors are added, both groups exhibit decreased OCR values and, on the other hand, higher ECAR. This indicates that both groups have the capability to utilize OXPHOS or glycolysis and may only differ in their basal respiration rates.

      Differences in proliferation rate can cause basal respiration to change. There is no information on how the bioenergetic profile was normalized (cell no./protein amount). Given that IGFBP2 has been shown to increase proliferation, it is very likely that the cells treated with IGFBP2 proliferated faster and therefore have higher OCR. The data needs to be normalized appropriately to negate this possibility.

      We had previously tested whether IGFBP2 causes an increase in proliferation by analysing the cell cycle of treated cells, as we initially thought this could be a mechanism of action. We have now provided the quantification of the cell cycle in the cells treated with IGFBP2, showing that no effect on the cell cycle was observed (Supplementary Figure 4E). Following this analysis, we decided to plate the same number of cells and check their density under the microscope before running the experiment; each experiment was done in triplicate for each condition. We have now added this information to the Methods section (lines 806-813). We did not comment on the basal difference, which we agree might be due to several factors; we only compared the difference in response to the inhibitors, which is not affected by the basal level but exclusively by their Δ values. We have also included the formulas used to calculate the ATP production rate.

      Overall, it appears that IGFBP2 does not seem to primarily cause metabolic changes, but simply accelerates the metabolic dependency on OXPHOS. Hence, the term 'metabolic remodelling' must be avoided unless IGFBP2 depletion/loss of function analysis is shown.

      We thank the reviewer for suggesting how to interpret the data about the dependency on OXPHOS. We have now changed the conclusions and claims about the effect of IGFBP2. We have also included a cell cycle analysis of the hematopoietic cells derived upon IGFBP2 addition to show that they do not differ in proliferation in a way that could cause the increase in colony formation we observed. Regarding the assay, we plated the same number of cells for each group to make sure we were comparing the same number of cells, which we also assessed under the microscope before the test, and we eliminated the suspension cells during the washes that preceded the measurement. The reviewer is correct in indicating that there is a basal difference in the OCR and ECAR values, where the IGFBP2-treated group is lower at the start rather than higher, which would not conceal higher proliferation. Finally, the ATP production rate is calculated from the variation of OCR and ECAR upon the addition of inhibitors, which normalizes for the basal differences.
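
      The argument that an inhibitor-induced change is independent of the basal level can be illustrated with a small numeric sketch. It is not the formula used in the manuscript: the function names and all rate values are hypothetical, and the example only shows the arithmetic of comparing differences rather than absolute rates.

```python
# Illustrative sketch only; all rate values are made up and unitless.


def mean(values):
    """Arithmetic mean of replicate measurements."""
    return sum(values) / len(values)


def inhibitor_delta(rate_before, rate_after):
    """Drop in a measured rate (e.g. OCR) after inhibitor addition, averaged over replicates."""
    return mean(rate_before) - mean(rate_after)


# Hypothetical triplicate readings before and after adding an inhibitor.
control_before = [100.0, 104.0, 96.0]
control_after = [40.0, 44.0, 36.0]

# Same response, but every reading shifted down by a constant basal offset.
offset = 20.0
shifted_before = [v - offset for v in control_before]
shifted_after = [v - offset for v in control_after]

# Same response, but every reading scaled, as a different cell number per well would do.
scale = 1.5
scaled_before = [v * scale for v in control_before]
scaled_after = [v * scale for v in control_after]

delta_control = inhibitor_delta(control_before, control_after)
delta_shifted = inhibitor_delta(shifted_before, shifted_after)
delta_scaled = inhibitor_delta(scaled_before, scaled_after)
print(delta_control, delta_shifted, delta_scaled)
```

      In this sketch an additive basal shift leaves the delta unchanged, whereas a multiplicative difference scales it, which is why plating and verifying equal cell numbers per group, as described above, matters for the comparison.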

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Summary:

      In this manuscript, the molecular mechanism of interaction of daptomycin (DAP) with bacterial membrane phospholipids has been explored by fluorescence and CD spectroscopy, mass spectrometry, and RP-HPLC. The mechanism of binding was found to be a two-step process. A fast reversible step of binding to the surface and a slow irreversible step of membrane insertion. Fluorescence-based titrations were performed and analysed to infer that daptomycin bound simultaneously two molecules of PG with nanomolar affinity in the presence of calcium. Conformational change but not membrane insertion was observed for DAP in the presence of cardiolipin and calcium.

      Strengths:

      The strength of the study is skillful execution of biophysical experiments, especially stopped-flow kinetics that capture the first surface binding event, and careful delineation of the stoichiometry.

      Weaknesses:

      The weakness of the study is that it does not add substantially to the previously known information and fails to provide additional molecular details. The current study provides incremental information on DAP-PG-calcium association but fails to capture the complex in mass spectrometry. The ITC and NMR studies with G3P are inconclusive. There are no structural models presented. Another aspect missing from the study is the reconciliation between PG in the monomer, micellar, and membrane forms.

      Besides the two-stage process, another important finding in the current work is the stable complex that plays a critical role in the drug uptake both in vitro and in B. subtilis. This complex has been shown to be a stable species in HPLC and its binding stoichiometry and affinity have been quantitatively characterized. The complex may not be stable enough in gas phase to be detected in the MS analysis, which was designed to detect the phospholipid and Dap components, not the complex itself. The structural model of this complex is clearly proposed and presented in Figure 6. 

      The NMR and ITC studies have a very clear conclusion that Dap has a weak interaction with the PG headgroup alone, which is unable to account for the Dap-PG interaction observed in the fluorescence studies. Thus, the whole PG molecule has to be involved in the interaction, leading to the discovery of the stable complex.  

      Reviewer #2 (Recommendations For The Authors):

      (1) I appreciate and agree with the comment that there are stages of daptomycin insertion, and these might involve the formation of different complexes with different binding partners (e.g. pre-insertion vs quaternary vs bactericidal). However, it seems like lipid II is an apparent participant in daptomycin membrane dynamics (Grein et al. Nature Communications 2020). It's not clear why this was excluded from analysis by the authors, or what basis there is for the discussion statement that the quaternary complex can shift into the bactericidal complex by exchanging 1 PG for lipid II. 

      We agree that lipid II and other isoprenyl lipids may be involved in the uptake and insertion of daptomycin into the membrane according to the results of the Nat. Comm. paper. However, these isoprenyl lipids are very minor components of the membrane in comparison to PG and their contribution to the drug uptake is thus expected to be much less significant. Nonetheless, we included farnesyl pyrophosphate (FPP) in our study as an analog of bactoprenol pyrophosphate (C55PP), which was reported in the previous study to have the same promoting effect as lipid II, but found no promoting effect in the fluorescence assay (Fig. 2B). In addition, no complex was formed when FPP replaced PG in our preparation and analysis of the drug-lipid complex. In consideration of these negative results and the expected small contribution, other isoprenyl lipids or their analogs were not included in the study.

      The statement about forming the proposed bactericidal complex from the identified complex is speculation that holds only if lipid II has a higher affinity for Dap than a PG ligand does. To avoid confusion, we deleted the sentence in the revision.

      (2) The detailed examination of daptomycin dynamics, particularly on the millisecond scale, in this paper is ideal for characterizing the effect of lipid II on daptomycin insertion. It would be helpful to either include lipid II in some analyses (micelle binding, fluorescence shifts, CD) or at least address why it was excluded from the scope of this work.

      As mentioned in the response to the first comment, we did not exclude isoprenyl lipids in our study but used some of their analogs in the fluorescence assay. Besides FPP mentioned above, we also tested geranyl pyrophosphate and geranyl monophosphate but obtained the same negative results. Lipid II was not directly used because it is one of the three isoprenyl lipids reported to have the same promoting effects in the Nat. Comm. paper and also because its preparation is not easy. Even if lipid II were different from other isoprenyl lipids in promoting membrane binding, its contribution is likely negligible at the reversible stage compared to the phospholipids because of its minuscule content in bacterial membrane. This is the main reason we did not use the isoprenyl lipids in the fast kinetic study (this stage only involves reversible binding, not insertion). 

      (3) Grein et al. 2020 saw that PG did not have a strong effect on daptomycin interaction with membranes. I believe this discrepancy is more likely due to the complex physical parameters of supported bilayers versus micelles/vesicles or some other methodological variable, but if the authors have more insight on this, it would be valuable commentary in the discussion.

      We totally agree that the discrepancy is likely due to the different conditions in the assays. It is hard to tell exactly what causes the difference. Thus, we did not attempt to comment on the cause of this difference in the discussion.

      (4) Isolation of the daptomycin complex from B. subtilis cells clearly had different traces from the in vitro complex; is it possible that lipid II is present in the B. subtilis complex? If not, a time-course extraction could be useful to support the model that different complexes have different activities. Isolates from early-stage incubation with daptomycin may lack lipid II but isolates from longer incubations may have lipid II present as the complex shifts from insertion to bactericidal.

      From the day we isolated the complex from B. subtilis, we have been looking for evidence for the previously proposed lipid complexes containing lipid II or other isoprenyl lipids but have not been successful. We did not see any sign of lipid II or other isoprenyl lipids in the MALDI or ESI mass spectrometric data. Separate LC-MS analysis showed that the minute peaks in the HPLC traces are not the expected complexes. However, this does not mean that such complexes are absent from the isolated PG-containing complex because: (1) the amount of such complexes may be too small to be detected due to the low content of the isoprenyl lipids; (2) the isoprenyl lipids, particularly lipid II, are not easily ionized for detection in mass spectrometry due to their size and unique structure.

      We don’t think the drug treatment time is the reason for the failure in detecting lipid II or other isoprenyl lipids. In our reported experiment, the cells were treated with a very high dose of Dap for 2 hours before extraction. In a separate experiment done recently, we treated B. subtilis at one third of the reported dose under the same conditions and found that all treated cells were dead after 1 hour in a titration assay, consistent with the results from reported time-killing assays in the literature. From this result, the proposed bactericidal lipid-containing complex should have been formed in the treated cells used in our extraction and isolated along with the PG-containing complex. It was likely not detected for the reasons discussed above. To avoid the interference of the PG-containing complex, a large number of bacterial cells might have to be treated at a low dose to isolate a sufficient amount of the lipid II-containing complex for identification. However, isolation or identification of the lipid II-containing complex is outside the scope of the current investigation and is therefore not pursued.

      (5) Part of the daptomycin mechanism of interacting with bacterial membranes involves the flipping of daptomycin from one leaflet to another. There was some mentioned work on the consistency of results between micelles and vesicles, but the dynamics or existence of a flipping complex in the bilayer system wasn't addressed at all in this paper.

      The current investigation makes no attempt to solve all problems in the daptomycin mode of action and is limited to the uptake of the drug, up to the point when Dap is inserted into the membrane. Within this scope, flipping of the complex is not yet involved and is thus irrelevant to the study. How the complex is flipped and used to kill the bacteria is what should be investigated next.  

      (6) The authors mention data with phosphatidylethanolamine in the text, but I could not find the data in the main or supplemental figures. I recommend including it in at least one of the figures.

      We much appreciate that this error was identified. The POPE data was lost when the graphic (Fig. 2B) was assembled in Adobe to create Figure 2. We have redrawn the graphic and reassembled the figure to solve this problem. Fig. 2B has also been modified to use micromolar for the concentration of the lipids.

      (7) Readability point: I'd suggest some consistency in the concentrations mentioned. Making the concentrations either all molar-based or all percentage-based would make comparison across figures easier.

      As suggested, we have changed the % into micromolar concentrations in Fig. 2B and also in Fig. 3A. 

      (8) The model figure is quite difficult to interpret, particularly the final stage of the tail unfolding. I recommend the authors use a zoomed-in inset for this stage, or at least simplify the diagram by removing the non-participating lipid structures. The figure legend for the model figure should also have a brief description of the events and what the arrows mean, particularly the POPS PG arrow in the final panel of the figure. I am assuming here the authors are implying that daptomycin can transiently interact with one lipid species and move to another, but the arrow here suggests that daptomycin is moving through the lipid headgroup space.

      We really appreciate the suggestions. As suggested, we put an inset to show the preinsertion complex more clearly. In addition, we have removed the green arrows originally intended to show the re-organization/movement of the phospholipids. Moreover, the legend is changed to ‘Proposed mechanism for the two-phased uptake of Dap into bacterial membrane. In the first phase, Dap reversibly binds to negative phospholipids with a hidden tail in the headgroup region, where it combines with two PG molecules to form a pre-insertion complex. In the second phase, the hidden tail unfolds and irreversibly inserts into the membrane. The inset shows the headgroup of the pre-insertion complex with the broad arrow showing the direction for the unfolding of the hidden tail. The red dots denote Ca2+.’  

      (9) The authors listed the Kd for daptomycin and 2 PG as 7.2 x 10^-15 M^2. Is this correct? This is an affinity in the femtomolar range.

      Please note that this Kd is for the simultaneous binding of two PG molecules, not for the binding of a single ligand, which is what Kd usually refers to. Assuming that each PG contributes equally to this interaction, the binding affinity for each ligand is then the square root of 7.2 x 10^-15 M^2, which equals 8.5 x 10^-8 M. This is equivalent to a nanomolar affinity for PG and is a reasonably high affinity.
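      The arithmetic in this response can be checked with a short sketch. It assumes, as the response does, that the two PG ligands contribute equally, so the per-ligand dissociation constant is the n-th root of the overall constant. The function and variable names are illustrative and do not come from the manuscript.

```python
# Overall dissociation constant for simultaneous binding of two PG
# molecules, as quoted in the response (units: M^2).
KD_TWO_PG = 7.2e-15


def per_ligand_kd(overall_kd: float, n_ligands: int) -> float:
    """Per-ligand Kd (in M), assuming n equivalent ligands.

    Under the equal-contribution assumption, the overall constant is the
    product of n identical per-ligand constants, so each one is the
    n-th root of the overall value.
    """
    return overall_kd ** (1.0 / n_ligands)


kd_single = per_ligand_kd(KD_TWO_PG, 2)
print(f"Per-ligand Kd: {kd_single:.2e} M ({kd_single * 1e9:.0f} nM)")
```

      The result is about 8.5 x 10^-8 M, i.e. tens of nanomolar per PG ligand, which matches the value given in the response.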

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors reported an increase in daptomycin intensity with the increasing amount of negatively charged DMPG. A similar observation has been reported for GUVs, however, the authors did not refer to this paper in their manuscript: E. Krok, M. Stephan, R. Dimova, L. Piatkowski, Tunable biomimetic bacterial membranes from binary and ternary lipid mixtures and their application in antimicrobial testing, Biochim. Biophys. Acta - Biomembr. 1865 (2023) [1]. This paper is also consistent with the authors' observation that there is negligible fluorescence detected for the membranes composed of PC lipids upon exposure to the Dap treatment.

      As suggested, this paper is cited as ref. 29 in the revision by adding the following sentence at the end of the section ‘Dependence of Dap uptake on phosphatidylglycerol.’: ‘PG-dependent increase of the steady-state fluorescence was also observed in giant unilamellar vesicles (GUVs).29’. The numbering is changed accordingly for the remaining references.  

      (2) Please include the plot of the steady-state Kyn fluorescence vs the content of POPA (Figure 2C shows traces for DMPG, CL, and POPS). Both POPA and POPS lipids are negatively charged, however, POPS seems to interact with Dap, while POPA does not. In my opinion, this observation is really interesting and might deserve a more thorough discussion. The authors might want to describe what could be the mechanism behind this lipid-specific mode of binding.

      As suggested, a plot is now added for POPA in Fig. 2C, which is basically a flat line with no significant increase in Kyn fluorescence. Indeed, the different effect of the negative phospholipids is very interesting, indicating that the reversible binding of Dap to the lipid surface is dependent not only on the Ca2+-mediated ionic interaction but also on the structure of the headgroup. In other words, Dap recognizes the phospholipids at the surface binding stage. Considering this headgroup specificity, the last sentence in the second paragraph in ‘Discussion’ is changed from ‘In addition, due to the low lipid specificity, this reversible binding likely involves Ca2+-mediated ionic interaction between Dap and the phosphoryl moiety of the headgroups.’ to ‘In addition, due to the specificity for negative phospholipids (Fig. 2B and 2C), this reversible binding of Dap likely involves both a nonspecific Ca2+-mediated ionic interaction and a specific interaction with the remaining part of the headgroups.’

      (3) The authors write that they propose a novel mechanism for the Ca2+-dependent insertion of Dap to the bacterial membrane, however, they rather ignored the already published findings and hypotheses regarding this process. In fact the role of Ca2+, as well as the proposed conformational changes of Dap, which allow its deeper insertion into the membrane are well known:

      The role of Ca2+ ions in the mechanism of binding is actually three-fold: (i) neutralization of daptomycin charge [2], (ii) creating the connection between lipids and daptomycin and (iii) inducing two daptomycin conformational changes. It should be noted that the interactions between calcium ions and daptomycin are 2-3 orders of magnitude stronger than between daptomycin and PG lipids [3,4]. Thus, upon the addition of CaCl2 to the solution, the divalent cations of calcium bind preferentially to the daptomycin, rather than to the negatively charged PG lipids, which results in the decrease of daptomycin net negative charge but also leads to its first conformational change [4]. Upon binding between calcium ions and two aspartate residues, the area of the hydrophobic surface increases, which allows the daptomycin to interact with the negatively charged membrane. In the next step, Ca2+ acts as a bridge connecting daptomycin with the anionic lipids. This event leads to the second conformational change, which enables deeper insertion of daptomycin into the lipid membrane and enables its fluorescence [4]. The overall mechanism has a sequential character, where the binding of daptomycin-Ca2+ complex to the negatively charged PG (or CA) occurs at the end.

      The authors should focus on emphasizing the novelty of their manuscript, keeping in mind the already published paper.

      We agree with the comments on the three general roles of calcium ion in the Dap interaction with membrane. The current investigation does not ignore the previous findings, which involve many more works than mentioned above, but takes these findings as common knowledge. Actually, the role of calcium ion is not the focus of current work. Instead, the current work focuses on how the drug is taken up and inserted into the membrane in the presence of the ion and how its structure changes in this process. With the known roles of calcium ion in mind, we propose an uptake mechanism (Fig. 6) that shows no conflict with the common knowledge.

      We would like to point out that the ‘deeper insertion into the membrane’ in the comment is different from the membrane insertion referred to in our manuscript. This ‘deeper insertion’ still remains in the reversible stage of binding to the membrane surface because all negative phospholipids can do this (causing a conformational change and fluorescence increase, as quantified in Fig. 2C), but our work now shows that only PG can enable irreversible membrane insertion. In addition, the comment that calcium binding to daptomycin causes the first conformational change is not supported by our finding that no conformational change is found for Dap in the presence of calcium in a lipid-free environment (Fig. 3B). One important aspect of novelty and contribution of our work is to clear up some of these ambiguities in the literature. Another contribution of our work is to demonstrate the formation of a stable complex between Dap and PG with a defined stoichiometry and its crucial role in the drug uptake.

      (4) One paragraph in the section "Ca2+- dependent interaction between Dap and DMPG" is devoted to a discussion of the formation of precipitate upon extraction of DMPG-containing micelles, exposed to Dap in the calcium-rich environment. In contrast, in the absence of Dap, no precipitate was detected. The authors did not provide any visual proof for their statement. Please include proper photographs in the supplementary information.

      The precipitate formed upon extraction of the DMPG-containing micelles was too small in amount to be seen by eye but could be collected by centrifugation and detected by fluorescence or HPLC after dissolving in DMSO. For visualization, we show below the precipitate formed using higher amounts of Dap and DMPG. The Dap-DMPG-Ca2+ complex (left tube) was formed by mixing 1 mM Dap, 2 mM DMPG and 1 mM Ca2+ and the control (right tube) was a mixture of 2 mM DMPG and 1 mM Ca2+. This is now added as Fig. S7 in the supplementary information (the index is modified accordingly) and cited in the main text.

      (5) The authors wrote that it is not clear how many calcium ions are bound to Dap-2PG complex (page 11, Discussion section). There are already reports discussing this issue. I recommend citing the paper discussing that exactly two Ca2+ ions bind to a single Dap molecule: R. Taylor, K. Butt, B. Scott, T. Zhang, J.K. Muraih, E. Mintzer, S. Taylor, M. Palmer, Two successive calcium-dependent transitions mediate membrane binding and oligomerization of daptomycin and the related antibiotic A54145, Biochim. Biophys. Acta - Biomembr. 1858, (2016) 1999-2005 [5]

      We were aware of the cited work that shows binding of two Ca2+ but also noted that there are more works showing one Ca2+ in the binding, such as the paper in [Ho, S. W., Jung, D., Calhoun, J. R., Lear, J. D., Okon, M., Scott, W. R. P., Hancock, R. E. W., & Straus, S. K. (2008), Effect of divalent cations on the structure of the antibiotic daptomycin. European Biophysics Journal, 37(4), 421–433.]. That was the reason we said ‘it is not clear how many calcium ions are bound to Dap-2PG complex’. Now, both papers are cited (as Ref. #33, 34) to support this statement.

      (6) The authors wrote two contradictory statements:

      -  PG cannot be found in mammalian cell membranes:

      "Moreover, the complete dependence of the membrane insertion on PG also explains why Dap selectively attacks Gram-positive bacteria without affecting mammalian cells, because PG is present only in bacterial membrane but not in mammalian membrane. " (Page 10, Discussion section, last sentence of the first paragraph)

      "However, Dap absorbed on bacterial surface is continuously inserted into the acyl layer via formation of complex with PG in a time scale of minutes, whereas no irreversible insertion of Dap occurs on mammalian membrane due to the absence of PG while the bound Dap is continuously released to the circulation as the drug is depleted by the bacteria." (Page 13, Discussion section)

      -  PG in trace amounts is present in mammalian membranes:

      "The proposed requirement of the pre-insertion quaternary complex increases the threshold of PG content for the membrane insertion to happen and thus makes it impossible on the surface of mammalian cells even if their plasma membrane contains a trace amount of PG." (Page 13, Discussion section).

      In fact, phosphatidylglycerol comprises 1-2 mol% of the mammalian cell membranes. Please, correct this information, which in this form is misleading to the readers.

      We appreciate the comments about the PG content in mammalian cells. Changes are made as listed below:

      (1) p10, the sentence is changed to ‘Moreover, the complete dependence of the membrane insertion on PG also explains why Dap selectively attacks Gram-positive bacteria without affecting mammalian cells, because PG is a major phospholipid in bacterial membrane but is a minor component in mammalian membrane.’ 

      (2) p13, the sentence is changed to ‘However, Dap absorbed on bacterial surface is continuously inserted into the acyl layer via formation of complex with PG in a time scale of minutes, whereas little irreversible insertion of Dap occurs on mammalian membrane due to the low content of PG while the bound Dap is continuously released to the circulation as the drug is depleted by the bacteria.’

      (3) p13, another sentence is modified to ‘The proposed requirement of the pre-insertion quaternary complex increases the threshold of PG content for the membrane insertion to happen and thus makes it less likely on the surface of mammalian cells that contain PG at a low level in the membrane.’ 

      (7) Please include information that Dap is effective only against Gram-positive bacteria and does not show antimicrobial properties against Gram-negative strains. The authors focused on emphasizing that Dap does not affect mammalian membranes, most likely due to the low PG content, however even membranes of Gram-negative bacteria are not susceptible to the Dap, despite the relatively high content of negatively charged PG in the inner membrane (e.g. inner cell membrane of E. coli has ~20% PG).

      The requested information is already included in ‘Introduction’. In this part, Dap is introduced as being active only against Gram-positive bacteria, implying that it is not active against Gram-negative bacteria. Dap is inactive against E. coli and other Gram-negative bacteria because the outer membrane prevents the antibiotic from accessing the PG in the inner membrane to cause any harm. When the outer membrane is removed, Dap will also attack the plasma membrane of Gram-negative bacteria.

      Literature cited in the comments:

      (1) E. Krok, M. Stephan, R. Dimova, L. Piatkowski, Tunable biomimetic bacterial membranes from binary and ternary lipid mixtures and their application in antimicrobial testing, Biochim. Biophys. Acta - Biomembr. 1865 (2023). https://doi.org/10.1101/2023.02.12.528174.

      (2) S.W. Ho, D. Jung, J.R. Calhoun, J.D. Lear, M. Okon, W.R.P. Scott, R.E.W. Hancock, S.K. Straus, Effect of divalent cations on the structure of the antibiotic daptomycin, Eur. Biophys. J. 37 (2008) 421-433. https://doi.org/10.1007/S00249-007-0227-2/METRICS.

      (3) A. Pokorny, P.F. Almeida, The Antibiotic Peptide Daptomycin Functions by Reorganizing the Membrane, J. Membr. Biol. 254 (2021) 97-108. https://doi.org/10.1007/s00232-021-00175-0.

      (4) L. Robbel, M.A. Marahiel, Daptomycin, a bacterial lipopeptide synthesized by a nonribosomal machinery, J. Biol. Chem. 285 (2010) 27501-27508. https://doi.org/10.1074/JBC.R110.128181.

      (5) R. Taylor, K. Butt, B. Scott, T. Zhang, J.K. Muraih, E. Mintzer, S. Taylor, M. Palmer, Two successive calcium-dependent transitions mediate membrane binding and oligomerization of daptomycin and the related antibiotic A54145, Biochim. Biophys. Acta - Biomembr. 1858 (2016) 1999-2005. https://doi.org/10.1016/J.BBAMEM.2016.05.020.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work used a comprehensive dataset to compare the effects of species diversity and genetic diversity within each trophic level and across three trophic levels. The results showed that species diversity had negative effects on ecosystem functions, while genetic diversity had positive effects. These effects were observed only within each trophic level and not across the three trophic levels studied. Although the effects of biodiversity, especially genetic diversity across multi-trophic levels, have been shown to be important, there are still very few empirical studies on this topic due to the complex relationships and difficulty in obtaining data. This study collected an excellent dataset to address this question, enhancing our understanding of genetic diversity effects in aquatic ecosystems.

      Strengths:

      The study collected an extensive dataset that includes species diversity of primary producers (riparian trees), primary consumers (macroinvertebrate shredders), and secondary consumers (fish). It also includes the genetic diversity of the dominant species at each trophic level, biomass production, decomposition rates, and environmental data.

      The conclusions of this paper are mostly well supported by the data and the writing is logical and easy to follow.

      Weaknesses:

      (1) While the dataset is impressive, the authors conducted analyses more akin to a "meta-analysis," leaving out important basic information about the raw data in the manuscript. Given the complexity of the relationships between different trophic levels and ecosystem functions, it would be beneficial for the authors to show the results of each SEM (structural equation model).

      We understand the point raised by the reviewer. We now provide individual SEMs (Figure 3), although we limit causal relationships to those for which the p-value was below 0.2 for the sake of graphical clarity. We also provide the percentage of explained variance for each ecosystem function. We detail the graph in the Results section (see l. 317-328) and discuss it (see l. 387-398). Note that we do not detail each function separately as this would (in our opinion) result in a long descriptive paragraph from which it might be difficult to extract the key information. Rather, we summarize the percentage of explained variance for each function and discuss the strength of environmental vs biodiversity effects for some examples. In the Discussion, we explain why environmental effects (on functions and biodiversity) are relatively weak. We mainly attribute this to the sampling scheme, which follows an East-West gradient (weak altitudinal range) rather than an upstream-downstream gradient as is traditionally done in rivers. The reasoning behind this sampling scheme is explained in our companion paper (Fargeot et al. Oikos 2023), to which we now refer more explicitly in the MS. Briefly, using an upstream-downstream gradient would certainly have increased the effects of the environment, but it would have made the inference of biodiversity effects extremely complex due to strong collinearity among environmental and biodiversity parameters.

      (2) The main results presented in the manuscript are derived from a "metadata" analysis of effect sizes. However, the methods used to obtain these effect sizes are not sufficiently clarified. By analyzing the effect sizes of species diversity and genetic diversity on these ecosystem functions, the results showed that species diversity had negative effects, while genetic diversity had positive effects on ecosystem functions. The negative effects of species diversity contradict many studies conducted in biodiversity experiments. The authors argue that their study is more relevant because it is based on a natural system, which is closer to reality, but they also acknowledge that natural systems make it harder to detect underlying mechanisms. Providing more results based on the raw data and offering more explanations of the possible mechanisms in the introduction and discussion might help readers understand why and in what context species diversity could have negative effects.

      We now provide more details. However, we are unfortunately not sure that this helped us reach a stronger explanation of the underlying mechanisms. To be frank, we did not succeed in improving mechanistic inferences based on the outputs of the SEM models. We visually explored some additional relationships (e.g. relationships between the biomass of the focal species and that of other species in the assemblage) that we now discuss a bit more, but again, this did not really help in better understanding processes. We realize this is a limitation of our study and that this can be frustrating for readers. Nonetheless, as said in the Discussion, field-based studies must be taken for what they are: observational studies forming the basis for future mechanistic studies. Although we failed to explain mechanisms, we still think that we provide important field-based evidence for the importance of biodiversity (as a whole) for ecosystem functions.

      (3) Environmental variation was included in the analyses to test if the environment would modulate the effects of biodiversity on ecosystem functions. However, the main results and conclusions did not sufficiently address this aspect.

      This is now addressed, see our response to your first comment. We now explain (Results section) and discuss environmental effects. As explained in the MS, environmental effects are similar in strength to those of biodiversity and are not that high, which is partly explained by the sampling scheme (see Fargeot et al. 2023). This is a choice we made at the onset of the experiment, as we wanted to focus on biodiversity effects and avoid the strong collinearity that is generally the case in rivers (and that impedes proper and strong statistical inference).

      Reviewer #2 (Public review):

      Summary:

      Fargeot et al. investigated the relative importance of genetic and species diversity on ecosystem function and examined whether this relationship varies within or between trophic-level responses. To do so, they conducted a well-designed field survey measuring species diversity at 3 trophic levels (primary producers [trees], primary consumers [macroinvertebrate shredders], and secondary consumers [fishes]), genetic diversity in a dominant species within each of these 3 trophic levels and 7 ecosystem functions across 52 riverine sites in southern France. They show that the effect of genetic and species diversity on ecosystem functions are similar in magnitude, but when examining within-trophic level responses, operate in different directions: genetic diversity having a positive effect and species diversity a negative one. This data adds to growing evidence from manipulated experiments that both species and genetic diversity can impact ecosystem function and builds upon this by showing these effects can be observed in nature.

      Strengths:

      The study design has resulted in a robust dataset to ask questions about the relative importance of genetic and species diversity of ecosystem function across and within trophic levels.

      Overall, their data supports their conclusions - at least within the system that they are studying - but as mentioned below, it is unclear from this study how general these conclusions would be.

      Weaknesses:

      (4) While a robust dataset, the authors only show the data output from the SEM (i.e., effect size for each individual diversity type per trophic level (6) on each ecosystem function (7)), instead of showing much of the individual data. Although the summary SEM results are interesting and informative, I find that a weakness of this approach is that it is unclear how environmental factors (which were included but not discussed in the results) nor levels of diversity were correlated across sites. As species and genetic diversity are often correlated but also can have reciprocal feedbacks on each other (e.g., Vellend 2005), there may be constraints that underpin why the authors observed positive effects of one type of diversity (genetic) when negative effects of the other (species). It may have also been informative to run SEM with links between levels of diversity. By focusing only on the summary of SEM data, the authors may be reducing the strength of their field dataset and ability to draw inferences from multiple questions and understand specific study-system responses.

      We have addressed this remark and we ask the reviewers and the readers to refer to our response to comment 1 from reviewer 1. Regarding co-variation among biodiversity estimates (SGDCs according to Vellend’s framework), we have addressed these issues in a companion paper that we now cite and expand on further in the MS (Fargeot et al. Oikos, 2023). Given the size of the dataset and its complexity (and associated analyses), we decided to focus on patterns of species and genetic biodiversity in a first paper (the Oikos paper) and then on the link between biodiversity and functions (this paper). As can be read in the Oikos paper, there is no co-variation among biodiversity estimates: species diversity is not correlated with genetic diversity and, within each facet, there is no co-variation among species. In addition, environmental predictors are highly estimate-specific (i.e. environmental predictors sustaining species and genetic estimates are idiosyncratic). As a result (see the new Figure 3), environmental effects are relatively weak (of the same intensity as those of biodiversity) and collinearity among parameters is relatively weak. The second point is important, as it permits better inference of parameters from models and allows us to discuss direct relationships (as observed in Figure 3, indirect environmental effects are relatively rare). We provide in the Discussion a bit more explanation about the absence of co-variation among biodiversity estimates (see l. 433-440).

      (5) My understanding of SEM is it gives outputs of the strength/significance of each pathway/relationship and if so, it isn't clear why this wasn't used and instead, confidence intervals of Z scores to determine which individual BEFs were significant. In addition, an inclusion of the 7 SEM pathway outputs would have been useful to include in an appendix.

      We now provide p-values (Table S2) and the seven models (Figure 3).

      (6) I don't fully agree with the authors calling this a meta-analysis, as this is a single study of multiple sites within a single region at a specific time point, and not a collection of multiple studies or ecosystems conducted by multiple authors. Rather, the authors are using meta-analysis summary metrics to evaluate their data. The authors tend to present these patterns as general trends, but as the data are all from this riverine system, this study could have benefited from focusing on what was going on in this system to underpin these patterns. I'd argue more data is needed to know whether, across sites and ecosystems, species diversity and genetic diversity have opposite effects on ecosystem function within trophic levels.

      We agree. “Meta-regression” would perhaps be more adequate than “meta-analyses”. We changed the formulation.

      Reviewer #3 (Public review):

      The manuscript by Fargeot and colleagues assesses the relative effects of species and genetic diversity on ecosystem functioning. This study is very well written and examines the interesting question of whether within-species or among-species diversity correlates with ecosystem functioning, and whether these effects are consistent across trophic levels. The main findings are that genetic diversity appears to have a stronger positive effect on function than species diversity (which appears negative). These results are interesting and have value.

      However, I do have some concerns that could influence the interpretation.

      (7) Scale: the different measures of diversity and function for the different trophic levels are measured over very different spatial scales, for example, trees along 200 m transects and 15 cm traps. It is not clear whether trees 200 m away are having an effect on small-scale function.

      Tree identification and invertebrate (and fish) sampling were done at the same scale. Trees are spread along the river so that their leaves fall directly into the river. Traps were installed all along the same transect in various micro-habitats. Diversity was measured at exactly the same scale for all organisms. We have modified the MS to make this clear.

      (8) Size of diversity gradients: More information is needed on the actual diversity gradients. One of the issues with surveys of natural systems is that they are of species that have already gone through selection filters from a regional pool, and theoretically, if the environments are similar, you should get similar sets of species, without monocultures. So, if the species diversity gradients range from say, 6 to 8 species, but genetic diversity gradients span an order of magnitude more, you can explain much more variance with genetic diversity. Related to this, species diversity effects on function are often asymptotic at high diversity and so if you are only sampling at the high diversity range, we should expect a strong effect.

      Fish species number varies from 1 to 11, invertebrate family number varies from 15 to 42 and the tree species number varies from 7 to 20 (see Fargeot et al. 2023 for details). We have added this information in the M&M. The gradients are hence relatively large and do not cover a restricted set of values. There is a variance in species number among sites, even if sites are collected along a relatively weak altitudinal gradient. This is obviously complex to compare to SNP (genomic) diversity. Genetic and species effects are similar in effect sizes (percentage of explained variance), so it does not seem we have biased one of the two gradients of biodiversity.

      (9) Ecosystem functions: The functions are largely biomass estimates (except decomposition), and I fail to see how the biomass of a single species can be construed as an ecosystem function. Aren't you just estimating a selection effect in this case?

      The biomass estimated for a certain area represents an estimate of productivity, whatever the number of species being considered. Obviously, productivity of a species can be due to environmental constraints; the biomass is expected to be lower at the niche margin (selection effect). But if these environmental effects are taken into account (which is the case in the SEMs), then the residual variation can be explained by biodiversity effects. We provide an explanation (l. 217-219).

      (10) Note that the article claims to be one of the only studies to look at function across trophic levels, but there are several others out there, for example:

      Thanks, we now cite some of these studies (Li et al 2020, Moi et al. 2021, Seibold et al. 2018).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Introduction:

      The introduction of the manuscript is generally well-structured, and the scientific questions are clearly presented. However, in each paragraph where specific aspects are introduced, the authors do not focus sufficiently on the given points. The current introduction discusses the weaknesses of previous studies extensively but lacks detailed explanations of mechanisms and a clear anticipation of this study's contributions.

      For example:

      L72-77: The authors mention that "genetic diversity may functionally compensate for a species loss," but this point is not highly relevant to the main analyses of this study, which focus on comparing the relative effects of species diversity and genetic diversity.

      Yes true, we understand the point made by the reviewers. We deleted this part of the sentence.

      L87-95: As previously noted, "whether environmental variation decreases or enhances the relative influence of genetic and species diversity on ecosystem functions" was not addressed in this study. Additionally, the last sentence seems unnecessary here, as it does not relate to "environmental variation." The phrase "generate insightful knowledge for future mechanistic models" is vague. It would be helpful to specify what kind of knowledge and what types of future mechanistic models are being referred to.

      We modified these two sentences. We now posit the prediction that what has been observed under controlled conditions (that genetic and species diversity have effects of similar magnitude) might not be the norm under fluctuating environments (because it has been shown that environmental variation modulates the strength of interspecific BEFs and creates huge variance).

      L96-116: The use of "for instance" three times in this paragraph makes the structure seem scattered, as only examples are provided. Improving the transition words can help the text focus better on the main point.

      We have modified some parts of this section to better reflect predictions.

      L115-116: Again, it would be beneficial to specify what kind of insightful information can be provided.

      We have modified this sentence by making more explicit some of the information that may be gained.

      L117-134: Stating clear expectations can help the introduction focus on the mechanisms and assist readers in following the results.

      We now provide some predictions. We were reluctant to make predictions in the first version of the MS, as we feel that predictions can go in very different directions depending on how we set the scene. We therefore stick to the predictions that we think are the most logical (the simplest ones). This illustrates the lack of theoretical papers on these issues.

      Methods:

      L287-293: The method for estimating the standard effect size is unclear. I assume it was derived from the SEM models? This needs further clarification.

      Yes, it is derived from the standardized estimate from each pSEM. This is now explained in the MS.

      Results:

      As mentioned in the public review, it is very important to show the results of analyzing raw data.

      Done, see Figure 3 and Results section.

      Table 1: The font and format of the PCA table are different from other tables and appear vague, resembling a picture rather than a table.

      Changed.

      Table 2 (and supplementary table): "D.f." is not explained in the table legend. Is 1 the numerator df and 30 the denominator df? Is the denominator the residual? Additionally, the table legend mentions "magnitude and direction." ANOVA only tests if the biodiversity effects are significantly different between species or genetic diversity, but not the magnitude. For example, -0.5 and 0.5 are very different, but their effect magnitudes are the same.

      This is a mistake; sorry, the format of the Table was from a previous version of the MS in which we used linear models rather than linear mixed models (both lead to the same results). The ANOVA used to test the significance of fixed terms in the linear mixed model is based on Wald chi-square tests; it should have read “Chi-value” rather than “F-value” in both tables, and the only degree of freedom in this test is the numerator one. This has been changed. We have also changed the caption of the Table (“ANOVA table for the linear mixed model testing whether the relationships between biodiversity and ecosystem functions measured in a riverine trophic chain differ between the biodiversity facets (species or genetic diversity) and the types of BEF (within- or between-trophic levels)”).

      Minor:

      There should always be a space between a number and a unit. In the manuscript, spaces are inconsistently used between numbers and units.

      Corrected

      Reviewer #2 (Recommendations for the authors):

      (1) In the introduction, the authors could focus more and build out what they predicted/hypothesized as well as what has been found in the manipulated experiments that examined the role of species and genetic diversity. That would enhance the background information for a more general audience, and highlight expected results and why.

      We modified the Introduction according to comments made by reviewer 1 and clarified the predictions as best as we can.

      (2) Similarly, the discussion is fairly big picture, but this dataset focused exclusively on this 3-trophic interaction in a riverine system. It could be beneficial to dig into the ecology to find out why the opposite effects of species and genetic diversity are seen within trophic levels in this system.

      We have added some explanations based on the specific pSEM (see our responses to the public reviews for details). But as said in the responses to the public reviews, even with more detailed models, it is hard to tease apart mechanisms. One important point is that genetic and species diversity do not correlate with each other (they do not co-vary over space), which means the effect of one facet is independent of the other. However, apart from that, we cannot really say more without more mechanistic approaches. We understand this is frustrating, but this is the nature of field-based data. This does not mean they are useless. On the contrary, they confirm and expand patterns found under controlled conditions (which for ecologists is quite important, as nature is our playground), but they are limited in the inference of mechanisms.

      (3) It would also be informative if the authors specified what positive and negative Z scores mean. It seems counterintuitive in Figure 3. For example, in the upper left, it's denoted as a larger intraspecific effect - which I'd assume is higher genetic (within species) diversity - but is this not where species diversity effects are higher? In theory this figure could be similar to Figure 1 from Des Roches et al. 2018 - where showing the 1:1 line of where species and genetic diversity effects are similar and then how some are more impacted by SD or GD as that links to the overall question, right?

      For example: Figure 3 makes it seem that GD effects are stronger (more positive) for within trophic responses (which is reflected in the text), but in that quadrant, it states that the interspecific effect is larger?

      Yes, you are right: Figure 3 (now Figure 4) is not ideal. We added an explicit explanation for interpreting Zr in the main text. In addition, we modified the text in the quadrant, as it was not correct. Note that the figure cannot be directly compared to that of DesRoches et al. In DesRoches et al., there is a single effect size (ES) per situation (which is roughly expressed as “ES = effect of species - effect of genotypes”). Here, there are two ES per situation, one for the species effect and the other for the genetic effect, which makes the biplot more complex (as species and genetic effects can be similar in magnitude but opposite in direction, e.g., 0.5 and -0.5). We could have done as DesRoches et al. did (“ES = effect of species - effect of genotypes”), but as we do not have absolute ES (as in DesRoches et al.), the resulting signs of the ES are nonsensical. It was not easy for us to find a clever solution (or, said differently, we were not clever enough to find an easy solution). Nonetheless, we tried another visualization by including “sub-quadrants” within the four main quadrants. We hope this will be clearer.
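      The point about two effect sizes of equal magnitude but opposite direction can be illustrated with a minimal sketch. It assumes that Zr denotes Fisher's z-transformed coefficient, as is conventional in meta-analysis (the response does not spell this out); the values 0.5 and -0.5 are the illustrative ones quoted above, and the function and variable names are hypothetical.

```python
import math

def fisher_zr(r):
    """Fisher's z-transform of a correlation-like coefficient r in (-1, 1)."""
    return math.atanh(r)

# Illustrative standardized effects for one ecosystem function
# (hypothetical values, not taken from the study).
species_effect = -0.5
genetic_effect = 0.5

zr_species = fisher_zr(species_effect)
zr_genetic = fisher_zr(genetic_effect)

# The two effects have the same magnitude...
same_magnitude = math.isclose(abs(zr_species), abs(zr_genetic))
# ...but opposite directions, which a single signed difference cannot convey.
opposite_direction = zr_species * zr_genetic < 0
signed_difference = zr_species - zr_genetic
```

      Under this assumption, a positive Zr indicates a positive biodiversity-function relationship and a negative Zr a negative one, so the two facets are best read on separate axes rather than collapsed into one difference.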

      (4) It's unclear why authors included both a simplified linear mixed model with diversity type and biodiversity facet as fixed factors, and then a second linear model that included trophic level (with those other 2 factors and interactions), but only showed results of trophic level from that more complex model. It is unclear why they include two models when the more complex one would have evaluated all aspects of their research question and shown the same patterns.

      You are right, the more complex model evaluates both aspects. Nonetheless, as the hypotheses were strictly separated, we thought it simpler to associate one model with one hypothesis. We agree that this duplicates information, but we would like to keep the two models to make the text more gradual.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript reports the substrate-bound structure of SiaQM from F. nucleatum, which is the membrane component of a Neu5Ac-specific Tripartite ATP-independent Periplasmic (TRAP) transporter. Until recently, there was no experimentally derived structural information regarding the membrane components of the TRAP transporter, limiting our understanding of the transport mechanism. Since 2022, there have been 3 different studies reporting the structures of the membrane components of Neu5Ac-specific TRAP transporters. While it was possible to narrow down the binding site location by comparing the structures to proteins of the same fold, a structure with substrate bound has been missing. In this work, the authors report the Na+-bound state and the Na+ plus Neu5Ac state of FnSiaQM, revealing information regarding substrate coordination. In previous studies, 2 Na+ ion sites were identified. Here, the authors also tentatively assign a 3rd Na+ site. The authors reconstitute the transporter to assess the effects of mutating the binding site residues they identified in their structures. Of the 2 positions tested, only one of them appears to be critical to substrate binding.

      Strengths:

      The main strength of this work is the capture of the substrate-bound state of SiaQM, which provides insight into an important part of the transport cycle.

      Weaknesses:

      The main weakness is the lack of experimental validation of the structural findings. The authors identified the Neu5Ac binding site, but only tested 2 residues for their involvement in substrate interactions, which was very limited. The authors tentatively identified a 3rd Na+ binding site, which if true would be an impactful finding, but this site was not tested for its contribution to Na+ dependent transport, and the authors themselves report that the structural evidence is not wholly convincing. This lack of experimental validation undermines the confidence of the findings. However, the reporting of these new data is important as it will facilitate follow-up studies by the authors or other researchers.

      The main concern, also mentioned by other reviewers, is the lack of mutational data and functional studies on the identified binding sites. Two other structures of TRAP transporters have been determined, one from Haemophilus influenzae (Hi) and the other from Photobacterium profundum (Pp). We will refer to the references in this paper as [1], Peter et al. as [2], and Davies et al. as [3]. The table below lists all the mutations made in the Neu5Ac binding site, including direct polar interactions between Neu5Ac and the side chains, as well as the newly identified metal sites.

      The structure of Fusobacterium nucleatum (Fn) that we have reported shows a significant sequence identity with the previously reported Hi structure. When we superimpose the Pp and Fn structures, we observe that nearly all the residues that bind to the Neu5Ac and the third metal site are conserved. This suggests that mutagenesis and functional studies from other research can be related to the structure presented in our work.

      The table below shows that all three residues that directly interact with Neu5Ac have been tested by site-directed mutagenesis for their role in Neu5Ac transport. Both D521 and S300 are critical for transport, while S345 is not. We do not believe that a mutation of D521A in Fn, followed by transport studies, will provide any new information.

      However, Peter et al. have mutated only one of the 5 residues near the newly identified metal binding site, which resulted in no transport. The rest of the residues have not been functionally tested. We propose to mutate these residues into Ala, express and purify the proteins, and then carry out transport assays on those that show expression. We will include this information in the revised manuscript.

      Author response table 1.

      Reviewer #2 (Public Review):

      In this exciting new paper from the Ramaswamy group at Purdue, the authors provide a new structure of the membrane domains of a tripartite ATP-independent periplasmic (TRAP) transporter for the important sugar acid, N-acetylneuraminic acid or sialic acid (Neu5Ac). While there have been a number of other structures in the last couple of years (the first for any TRAP-T) this is the first to trap the structure with Neu5Ac bound to the membrane domains. This is an important breakthrough as in this system the ligand is delivered by a substrate-binding protein (SBP), in this case, called SiaP, where Neu5Ac binding is well studied but the 'hand over' to the membrane component is not clear. The structure of the membrane domains, SiaQM, revealed strong similarities to other SBP-independent Na+-dependent carriers that use an elevator mechanism and have defined Na+ and ligand binding sites. Here they solve the cryo-EM structure of the protein from the bacterial oral pathogen Fusobacterium nucleatum and identify a potential third (and theoretically predicted) Na+ binding site but also locate for the first time the Neu5Ac binding site. While this sits in a region of the protein that one might expect it to sit, based on comparison to other transporters like VcINDY, it provides the first molecular details of the binding site architecture and identifies a key role for Ser300 in the transport process, which their structure suggests coordinates the carboxylate group of Neu5Ac. The work also uses biochemical methods to confirm the transporter from F. nucleatum is active and similar to those used by selected other human and animal pathogens and now provides a framework for the design of inhibitors of these systems.

      The strengths of the paper lie in the locating of Neu5Ac bound to SiaQM, providing important new information on how TRAP transporters function. The complementary biochemical analysis also confirms that this is not an atypical system and that the results are likely true for all sialic acid-specific TRAP systems.

      The main weakness is the lack of follow-up on the identified binding site in terms of structure-function analysis. While Ser300 is shown to be important, only one other residue is mutated and a much more extensive analysis of the newly identified binding site would have been useful.

      Please see the comments above.

      Reviewer #3 (Public Review):

      The manuscript by Goyal et al. reports substrate-bound and substrate-free structures of a tripartite ATP-independent periplasmic (TRAP) transporter from a previously uncharacterized homolog, F. nucleatum. This is one of the most mechanistically fascinating transporter families, by means of its QM domain (the domain reported in this manuscript) operating as a monomeric 'elevator', and its P domain functioning as a substrate-binding 'operator' that is required to deliver the substrate to the QM domain; together, this is termed an 'elevator with an operator' mechanism. Remarkably, previous structures had not demonstrated the substrate Neu5Ac bound. In addition, they confirm the previously reported Na+ binding sites and report a new metal binding site in the transporter, which seems to be mechanistically relevant. Finally, they mutate the substrate binding site and use proteoliposomal uptake assays to show the mechanistic relevance of the proposed substrate binding residues.

      The structures are of good quality, the functional data is robust, the text is well-written, and the authors are appropriately careful with their interpretations. Determination of a substrate-bound structure is an important achievement and fills an important gap in the 'elevator with an operator' mechanism. Nevertheless, I have concerns with the data presentation, which in its current state does not intuitively demonstrate the discussed findings. Furthermore, the structural analysis appears limited, and even slight improvements in data processing and resulting resolution would greatly improve the authors' claims. I have several suggestions to hopefully improve the clarity and quality of the manuscript.

      We appreciate your feedback and will make the necessary modifications to the manuscript incorporating most of the suggestions. We will submit the revised version once the experiments are completed. We are also working on improving the quality of the figures and have made several attempts to enhance the resolution using CryoSPARC or RELION, but without success. We will continue to explore newer methods in an effort to achieve higher resolution and to model more lipids, particularly in the binding pocket.

      Reviewing Editor (Recommendations for the Authors):

      After discussing the reviews, the reviewers and reviewing editor have agreed on a list of the most important suggested revisions for the authors, which, if satisfactorily addressed, would improve the assessment of the work. These suggested revisions are listed below. We also include the full Recommendations For The Authors from each of the individual reviewers.

      (1) The authors tentatively identified a 3rd Na+ binding site, which if true would be an impactful finding, but this site was not tested for its contribution to Na+ dependent transport, and the authors themselves report that the structural evidence is not wholly convincing. Additional mutagenesis and activity experiments to test the contribution of this site to transport would strengthen the manuscript. Measuring Na+ concentration-response relations and calculating Hill slopes in WT vs. an M site mutant would be a good experiment. Given the lack of functional data and poor density, it does not seem appropriate to build the M site sodium in the PDB model.

      The density is sufficiently well defined to suggest a bound metal (waters would not be clearly defined at this resolution). While our modeling of the site as Na+ is arbitrary, this was done to satisfy the refinement programs, which need a known scatterer to be modeled. We could model this density with other metals, but unlike crystallographic refinement, real-space refinement of cryo-EM maps does not produce a difference map, which might have allowed us to identify the metal, albeit not conclusively. The density of the maps is good (we have added better figures to demonstrate this). We tried making multiple mutations to test for activity; unfortunately, we are still struggling to express proteins with mutations in this site in sufficient quantities to carry out transport assays.

      As we were unable to do the experiments, we carried out MD simulations (performed by Senwei Quan and Jane Allison at the University of Auckland). Our results are shown below; we are not certain, without further studies, that these should be included in the current paper (we will add them as authors if the editor feels that this evidence is critical).

      Author response table 2.

      We are showing this for review to suggest that K+, Ca2+, and Na+ were tried, and only Na+ stays stably in the binding pocket. The rest of the results will also have to be explained, which would change the focus of the paper.

      We also provided the sequence to AlphaFold3 and asked it to identify the possible metal binding sites; when the input was Na+, it found all three binding sites.

      Summary: Both our experimental data and computational studies suggest that the observed metal binding site is real, but at the moment it is not possible to refine the structure with an unidentified metal in place. Computational studies suggest that this is a high-probability Na+ site.

      Demonstrating cooperativity between the Na+ site and transport requires carrying out these experiments, in a concentration-dependent manner, with mutations in these sites. Unfortunately, we were unable to produce well-expressed and purified proteins carrying these mutations in a short time frame.
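      For readers unfamiliar with the Hill-slope analysis suggested in point (1), the following is a minimal sketch of how a Hill coefficient can be estimated from a Na+ concentration-response series. It uses synthetic data generated from the standard Hill equation; the concentrations, Vmax, half-saturation constant, and function names are all hypothetical and are not measurements from this work.

```python
import math

def hill_rate(conc, vmax, k_half, n):
    """Transport rate predicted by the Hill equation."""
    return vmax * conc**n / (k_half**n + conc**n)

def hill_slope(concs, rates, vmax):
    """Least-squares slope of log10(v / (vmax - v)) against log10(conc).

    For data that follow the Hill equation, this slope is the Hill coefficient.
    """
    xs = [math.log10(c) for c in concs]
    ys = [math.log10(v / (vmax - v)) for v in rates]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic, noise-free series (hypothetical Na+ concentrations in mM).
concs = [1.0, 2.0, 5.0, 10.0, 20.0]
rates = [hill_rate(c, vmax=1.0, k_half=5.0, n=2.0) for c in concs]
estimated_n = hill_slope(concs, rates, vmax=1.0)
```

      In such an analysis, a lower estimated slope for a site mutant than for the wild type would be consistent with one fewer cooperative Na+ site, although real data would need a non-linear fit with error estimates rather than this linearization.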

      (2) The authors identified the Neu5Ac binding site but only tested 2 residues for their involvement in substrate interactions, which was very limited. Given that the major highlight of this paper is the identification of the Neu5Ac binding site, it would strengthen the manuscript if the authors provided a more extensive series of mutagenesis experiments - testing at least the effect of D521A would be important. One inconsistency is Ser345 mutagenesis not affecting transport, and the authors should further discuss in the text why they think that is.

      D521A has been tested in H. influenzae, and this mutation results in loss of transport. This residue is highly conserved and occupies the same position. We expect the result to remain the same.

      We have added a few extra lines to discuss Serine 345: “Ser 345 OG is 3.5Å away from the C1-carboxylate oxygen – a distance that would result in a weak interaction between the two groups. It is, therefore, not surprising that the mutation into Ala did not affect transport. The space created by the mutation can be occupied by a water molecule.”

      (3) The purification and assessment of the stability of the protein are described in text alone with no accompanying data. It would be beneficial to include these data (e.g. in the Supplementary info) as it allows the reader to evaluate the protein quality.

      This is now added as Supplementary Figure 2.

      (4) The structural figures throughout the paper could benefit from more clarity to better support the conclusions. Specific critiques are listed below:

      - Figure 1: since the unbound map has a similar reported resolution, displaying the unbound structure's substrate binding site with the same contour would clearly demonstrate that the appearance of this density is substrate-dependent.

      - Figure 1: the atomic fit of the ligand to the density, and the suggested coordination by side chain and backbone residues, would be useful in this figure.

      - Figure 1: I think it would be more intuitive to compare apo and bound structures with the same local resolution scale.

      We have remade Figure 1: “Architecture of FnSiaQM with nanobody. (A and B) Cryo-EM maps of FnSiaQM unliganded and sialic acid bound at 3.2 and 3.17 Å, respectively. The TM domain of FnSiaQM is colored using the rainbow model (N-terminus in blue and C-terminus in red). The nanobody density is colored purely in red. The density for modeled lipids is colored in tan and the unmodelled density in gray. The figures were made with Chimera at thresholds of 1.2 and 1.3 for the unliganded and sialic acid-bound maps, respectively. (C and D) The cytoplasmic view of apo and sialic acid bound FnSiaQM, respectively. Color coding is the same as in panels A and B. The densities corresponding to sialic acid and sodium ions are in purple. The substrate binding sites of apo and sialic acid bound FnSiaQM are shown with key residues labeled. The density (blue mesh) around these atoms was made in Pymol at 2 and 1.5 σ for the apo and the sialic acid bound maps, respectively, with a carve radius of 2 Å.”

      The local resolution maps have been moved to Supplementary Figure 3.

      - Figure 3, Figure 5a: The mesh structures throughout the manuscript are blocky and very difficult to look at and interpret, especially for the ion binding sites, which are currently suggestive of but not definitively ion densities. Either using transparent surfaces, higher triangle counts, or smoothing the surface might help this.

      We have remade Figure 3 with higher triangle counts. We tried all three suggestions, and this provided the best figure. We have replaced Figure 5A with the density for Neu5Ac and the residues around it.

      - Figure 5A: It would be important to show the densities of the entire binding pocket, especially coordinating side chains, to show the reader what is and isn't demonstrated by this structure.

      - It's not clear how Figure 5D is supposed to show that the cavity can accommodate Neu5Gc, as suggested by the text - please make the discussed cavity clearer in the Figure.

      We have now marked with an arrow the methyl carbon where the hydroxyl group is added, and we mention this in the legend. It is open to the periplasmic side of the cavity.

      - Supplementary Figure 4: Please label coordinating residue sites.

      Labels have been added to Supplementary Figure 6 which was earlier Supplementary Figure 4.

      (5) Intro section: the authors should introduce the work on HiSiaP around the role of the R147 residue in high-affinity Neu5Ac binding, which coordinates the carboxylate of Neu5Ac, and which is a generally conserved mechanism for organic acid binding in other TRAP transporters. This context will help magnify their discovery later that in the membrane domains, it is a key serine and not an arginine that coordinates the carboxylate group (probably as the local concentration of Neu5Ac is high and tight binding site is not desirable for rapid transport, which is mentioned in the discussion).

      Thank you for pointing this out. We have added a new sentence to the introduction.

      “All the SiaP structures show the presence of a conserved Arginine that binds to the C1-carboxylate of Neu5Ac, and this Arg residue is critical as the high electrostatic affinity may be important to have a strong binding affinity that sequesters the small amounts that reach the bacterial periplasmic space (Glaenzer et al., 2017).”

      (6) TRAP transporters exist for many organic compounds and not just sialic acid, which might be nice to make the reader aware of.

      We initially did not do this as this is an advance paper and this was discussed in the earlier paper (Currie et al., 2024). However, we have now added a sentence to the introduction. “Additionally, amino acids, C4-dicarboxylates, aromatic substrates and alpha-keto acids are also transported by TRAP transporters (Vetting et al., 2015).”

      (7) On p. 12, the authors describe the Neu5Ac binding site as a large solvent-exposed vestibule, having previously described the substrate-bound state as occluded. These descriptions should be adjusted to make clear which structure is being referenced. The clarity of this would be substantially improved if the authors included a figure that showed this occlusion - currently none of the structure figures clearly demonstrate what the authors are referring to. There are several conspicuous unmodeled densities proximal to the substrate, reminiscent of lipids (in between transport and scaffold domain) and possibly waters/ions. Given this, it is really surprising that the substrate binding site is described as "solvent-exposed" since the larger molecules seem to occlude the pocket. The authors should further process their dataset and discuss the implications of these surrounding densities.

      We processed the data sets carefully with both cryoSPARC and RELION. The resolution reported here is the same with both packages, with the cryoSPARC maps slightly better in terms of interpretability of the peripheral helices; these are the maps described in the manuscript. The current sample (FnTRAP) with the nanobody is relatively stable (in our experience with other similar proteins), as is evident from the number of images and particles needed to achieve a decent resolution, and thus the workflow is straightforward and simple. There are a number of non-protein densities, which in principle could be modelled, but we have taken the conservative approach of not modelling these extra densities (except for the two lipids and a few ions) because of the limited resolution. It is possible that increasing the number of particles would increase the resolution, but given the estimated B-factor (125 or 135 Å² for unliganded and liganded), this would certainly require many more images with no guarantee of increased resolution.

      The question of outward open Vs outward occluded is a valid point. We have now modified this in the manuscript. “The Neu5Ac binding site has a large solvent-exposed vestibule towards the cytoplasmic side, while its periplasmic side is sealed off. Cryo-EM map shows the presence of multiple densities that could be modeled as lipids, possibly preventing the substrate from leaving the transporter. However, the densities are not well defined to model them as specific lipids, hence they have not been modeled.  We describe this as the “inward-facing open state” with the substrate-bound.”

      (8) On p.15, the activity of FnSiaPQM in liposomes is reported, although the impetus for this study is not clear. Presumably, the reason for its inclusion is to ensure that the structurally characterized protein is active. It would be useful to say this at the start of the section if this is the case. This study nicely shows that the energetics and requirements of transport are identical to all the previous studies on Neu5Ac TRAP transporters - it would be good to acknowledge this somewhere in this section as well.

      These changes have been incorporated. We have added a line to say why we did this and added, as the last line, that this is similar to other characterized SiaPQMs.

      (9) Figure 5C. The authors show the transport activity with and without valinomycin. The authors do not explain the rationale for testing and reporting both conditions for these mutants; an explanation is required, or the data should be simplified. The expected membrane potential induced by valinomycin should be mentioned in the legend.

      We have simplified Figure 5C and added the expected membrane potential value.

      (10) The authors state that the S300A mutant is inactive. However, unless the authors also measured the background binding/transport of radiolabelled substrate in the absence of protein, then the accuracy of this statement is not clear because Figure 5C does indicate some activity for S300A, albeit much lower than WT. This is an important point in light of the authors' suggestion that the membrane protein does not need a binding site of high affinity or stringent selectivity.

      We thank the reviewer for pointing this out. We have now added a line to the experimental protocols: “The experimental values were corrected by subtracting the control, i.e. the radioactivity taken up in liposomes reconstituted in the absence of protein. The radioactivity associated with the control samples, i.e. empty liposomes, was less than 10% with respect to proteoliposomes.”

      (11) There are several issues and important omissions in the work cited:

      - It is not normal practice to cite a reference in the abstract and the citation is only to the second structure of HiSiaQM, which does not fairly reflect previous work in the field by only referring to their own work. Also throughout the article, it is normal practice with in-text citations to order them chronologically, i.e. earliest first. Please update this.

      This article was submitted as a “Research advance article”. The instructions specifically say that a “Research advance article” should cite the article in eLife that the paper advances. Hence the citation of the “second structure of HiSiaQM”. In fact, in the manuscript we explicitly say “The first structure of _Hi_SiaQM (4.7 Å resolution) demonstrated that it is composed of 15 transmembrane helices and two helical hairpins.” We are following the policy laid out.

      Zotero organizes multiple references in alphabetical order, we did not choose to do it that way – the suggestion of bias is not true. The final version of the accepted paper will have numbers, and this argument will automatically be corrected.

      - Intro: please cite the primary papers discovering other families of sialic acid transporters.

      - Intro: When introducing information on the binding site, dissociation constant of Neu5Ac, and thermodynamics of ligand binding to SiaP, the authors should also include references to the work done by others in addition to their own work.

      The Setty et al. paper was the first to demonstrate that the two-component systems are distinct, and that the binding protein of the TRAP system binds enthalpically while the binding protein of the ABC system binds entropically (SiaP vs SatA). As the reviewer points out, this is significant because it highlights how the Arg binding to the carboxylate, which is the enthalpic driver in this case, contributes to the difference between sugar binding to SiaP and SatA. Many studies have published binding affinities of molecules to SiaP, but this paper offers valuable insight into the differences between these systems. We have cited a number of the SiaP papers from other groups, including acknowledging the first structure of SiaP from H. influenzae by Muller et al., in 2006.

      - p.5 "TRAP transporters are postulated to employ an elevator-type mechanism...". This postulation has been experimentally tested and published, so should be discussed and referenced (Peter et al. 2024. https://doi.org/10.1038/s41467-023-44327-3).

      We have now corrected this error. We removed “are postulated to” and added the reference.

      - p.5 "Notably, the transport of Neu5Ac by TRAP transporters requires at least two sodium ions (Davies et al., 2023)." The requirement for at least 2 Na+ ions for Neu5Ac transport was first demonstrated in Mulligan et al. PNAS 2009, so should also be cited (for completion, so should Mulligan et al. JBC. 2012 and Currie et al. elife 2023, which have also shown this requirement is a commonality amongst all Neu5Ac TRAP transporters).

      Added.

      - P.12, Mulligan et al, JBC, 2012 should be added to the citations in the first sentence.

      Added.

      - p.19 "Interestingly, even the dicarboxylate transporter from V. cholerae (VcINDY) binds to its ligand via electrostatic interactions with both carboxylate groups". Other references are more appropriate than the one used to support this statement.

      Also added references for Mancusso et al., 2012, Nie et al., 2017 and Sauer et al., 2022 here.

      - p.19. "The structure of the protein in the outward-facing conformation is unknown". The authors do not discuss the mechanistic findings from Peter et al 2024 Nat Comm here. The work described in that paper revealed an experimentally verified model of the OFS of HiSiaQM, so really needs to be included.

      This is not an experimentally determined 3D structure. They have shown the possible existence of this by microscopy, but the structure is not determined. The work mentioned is a wonderful piece of work, but it does not report the three-dimensional structure of the protein in the outward-facing conformation to allow us to understand the nature of the molecular interactions. 

      - The reference to Kinz-Thompson et al 2022 on p. 6 is not appropriate - neither the HiSiaQM papers nor the PpSiaQM paper makes reference to this work when identifying the binding site. More suitable references are used, for example, Mancusso et al 2012, Nie et al 2017 and Sauer et al 2022; this should be reported accurately.

      Added the suggested references. We think the paper (Kinz-Thompson et al., 2022) is relevant and have also kept that reference.

      - Garaeva et al report the opposite of what the authors mention - "In the human neutral amino acid transporter (ASCT2), which also uses the elevator mechanism, the HP1 and HP2 loops have been proposed to undergo conformational changes to enable substrate binding and release (Garaeva et al., 2019)." In fact, this paper suggested a one-gate model of transport (HP2), where HP1 seems uninvolved in gating.

      The Reviewer is correct.  We were wrong and not clear.  The entire paragraph has been rewritten.

      “While both the HP1 and HP2 loops have been hypothesized to be involved in gating, in the human neutral amino acid transporter ASCT2, which also uses the elevator mechanism, only the HP2 loops have been shown to undergo conformational changes to enable substrate binding and release (Garaeva et al., 2019). Hence, it is suggested that there is a single gate that controls substrate binding. Superposition of the _Pp_SiaQM and _Hi_SiaQM structures does not reveal any change in these loop structures upon substrate binding. For TRAP transporters, the substrate is delivered to the QM protein by the P protein; hence, these loop changes may not play a role in ligand binding or release. This may support the idea that there is minimal substrate specificity within SiaQM and that it will transport the cargo delivered by SiaP, which is more selective.”

      - p.19 "suggesting that SSS transporters have probably evolved to transport nine-carbon sugars such as Neu5Ac (Wahlgren et al, 2018)." Surely this goes without saying since Wahlgren et al 2018 demonstrated that SiaT, an SSS, could transport sialic acid? It's unclear why this was included here - perhaps it needs to be rewritten to make the point more clearly, but as it stands, this statement appears self-evident. Furthermore, these proteins can transport all kinds of molecules (see TCDB 2.A.21). This statement needs to be clarified. 

      This was a comparison to other Neu5Ac binding sites in other Neu5Ac transporters. We have modified the sentence. “The polar groups bind to both the C1-carboxylate side of the molecule and the C8-C9 carbonyls, suggesting that Proteus mirabilis Neu5Ac transporter (SSS type) evolved specifically to transport nine-carbon sugars such as Neu5Ac (Wahlgren et al., 2018)”. These were arguments we were making to suggest that the lack of tight binding could also mean reduced specificity.

      - The authors reconstitute the FnSiaQM and measure transport with SiaP, which resembles closely what is known for both HiSiaPQM, VcSiaPQM, which is not cited (https://doi.org/10.1074/jbc.M111.281030).

      - Regarding lipids between transport and scaffold domains: there is precedent for such lipids in the elevator transporter GltPh, Wang, and Boudker (eLife 2020) proposed similar displacements during transport and would be appropriate to cite here.

      We have now cited the reference to the Mulligan et al., 2012 paper.  We also added a sentence on the findings of GltPh paper by Wang and Boudker.  Thank you for pointing this out.

      (12) p.9 "TRAP transporters, as their name suggests, comprise three units: a substrate-binding protein (SiaP) and two membrane-embedded transporter units (SiaQ and SiaM) (Severi et al., 2007)." This is somewhat odd phrasing because the existence of fused membrane components has been well-documented for a long time. The addition of "Many" at the start of the sentence fixes this.

      Added Many.

      (13) On p.12 the authors compare the ligand-induced conformational changes of FnSiaQM with ASCT2, citing Garaeva et al, 2019. This comparison does not make sense considering TRAP transporters and ASCT2 do not share a common fold. A far superior comparison is with DASS transporters, which actually do have the same fold as TRAP transporters. And, importantly, the Na+ and substrate-induced conformational changes have been investigated for DASS transporters revealing a unique mechanism likely shared by TRAP transporters (Sauer et al, Nat Comm, 2022). The text on p.12 should be adjusted to replace the ASCT comparison with a VcINDY comparison.

      The purpose of citing the ASCT2 paper was only concerning the HP1 and HP2 gates.  The authors show that HP2 changes conformation only.  Comparing the two FnSiaQM structures – with and without ligand, we see no change in either the HP1 or the HP2 loops.  On Page 17, when we describe the structure, we do specifically mention that the overall architecture is similar to VcINDY and the DASS transporters.

      (14) p.12 "For TRAP transporters, the substrate is delivered to the QM protein by the SiaP" protein;" "SiaP protein" should be "P protein"

      Corrected.

      (15) p.18. "periplasmic membrane" should be "cytoplasmic membrane".

      Corrected.

      (16) p.19. "This prevents Neu5Ac from binding..." There is no evidence for this so this needs to be softened, e.g. "This likely prevents Neu5Ac from...".

      Agree – Modified.

      (17) Figure 2B is rather small, cramped, and difficult to see. We suggest that the authors make that panel larger, or include it as a stand-alone supplementary figure.

      We have moved this figure into a supplementary figure as suggested by the reviewer.

      (18) The authors describe the Neu5Ac binding site in SiaQM. It would be helpful if the authors provided a figure in support of the statement that the Neu5Ac binding site architecture is similar to dicarboxylate in VcINDY (especially as Neu5Ac is a monocarboxylate).

      The Neu5Ac binding site is NOT similar to the VcINDY binding site. But, we understand the origin of the comment. We have now changed the sentence: “The overall architecture of the Neu5Ac binding site is similar to that of citrate/malate/fumarate in the di/tricarboxylate transporter of V. cholerae (VcINDY), but the residues involved in providing specificity are different (Kinz-Thompson et al., 2022; Mancusso et al., 2012; Nie et al., 2017; Sauer et al., 2022). Neu5Ac binds to the transport domain without direct interactions with the residues in the scaffold domain. The majority of the interactions are with residues in the HP1 and HP2 loops of the transport domain (Figure 5B). Asp521 (HP2), Ser300 (HP1), and Ser345 (helix 5) interact with the substrate through their side chains, except for one interaction between the main chain amino group of residue 301 and the C1-carboxylate oxygen of Neu5Ac. Mutation of the residue equivalent to Asp521 has been shown to result in loss of transport (Peter et al., 2022). To evaluate the role of residues Ser-300 and Ser-345, we mutated them to alanine and performed the transport assays.”

      (19) When comparing the binding modes of Neu5Ac to different proteins in Figure 6, it would be helpful to include the structure in this paper as well.

      The Neu5Ac binding site is present in figure 5. We would prefer not to show it again in Figure 6.

      Additionally, there is a clear binding mode of Neu5Ac in Figure 1 as well.

      (20) The manuscript would benefit from a more detailed comparison between Na+-bound (described as apo) and Na+/Neu5Ac structures, especially the prospective gates. If this transporter behaves anything like the archetypical ion-coupled glutamate transporters, some structural changes in the gates might be expected to facilitate transport domain movement when the substrate is loaded, but not when only Na+ is bound. It would be important to discuss and visualize these changes.

      We have described in the manuscript that there is NO change in the HP1 and HP2 gates between the unliganded structure and the Neu5Ac bound structure. The major difference we observe is the ordering of the third metal binding site.

      A figure comparing the substrate binding pockets between the different high-resolution structures would also be informative. Do the bonding distances between ligands and side chains significantly change between homologs?

      This is the only Neu5Ac-bound structure. Since the specificity for the substrate comes from the variability of the residues that interact with it, we do not believe that this figure would add much value.

      (21) A supplementary figure (or an inset to Figure 2) showing pairwise percent identity between different characterized QM transporters would be useful.

      We have now added a Supplementary Figure 4 showing the comparison of the three QM sequences whose structures have been determined.

      (22) There is relatively minimal EM processing. More rigorous processing would require relatively little effort and could boost resolution, making this a vastly improved manuscript with a much more confident interpretation of structures.

      We described the overall workflow. The processing was rigorous. After obtaining the first maps, we created templates with the structure and did template-based picking. We then did several rounds of 2D classification followed by homogeneous refinement and non-uniform refinement. We then made masks and carried out local refinement. We then took the best maps, did a 3D classification, and refined the 3D classes independently. Then, we regrouped them based on how similar they were. We then went back and picked particles again (we used different methods of particle picking, but template-based picking resulted in the final set of particles used) and went through the whole process again. At the end of the refinement, we carried out global and local CTF refinement followed by reference-based motion correction. The final refinement was then done with the Bayesian polished particles. The final refinement was local refinement with a mask over only the transporter and the nanobody. After the reviews came, we tried multi-body refinement in Relion5. It did not improve resolution. We have expanded the legend to supplementary Figure 2 (without listing all the different things we tried). The best resolution we obtained for the structure was 3.1 Å. However, it is important to note that the local resolution of the map around the ligand is good.

      We realized this is not easy to depict in a local resolution map. So, we wrote a script that takes every atom, draws a sphere of radius 5 Å around it (again, we tried different radii and used the optimal one; we are preparing a manuscript to describe this), averages all the local resolution values within that 5 Å sphere, and writes the average as the B-factor of that atom. We have moved the local resolution map figure to the supplement and replaced Figure 1 with a cartoon, where the color represents the local resolution of the region in which each atom lies.
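
      The per-atom averaging described above can be sketched roughly as follows. This is a minimal illustration and not the authors' script: the function name `average_local_resolution`, the assumption that the local-resolution map is a NumPy array indexed as [x, y, z] with an isotropic voxel size, and the `origin` parameter are all hypothetical choices made for the example.

```python
import numpy as np


def average_local_resolution(coords, locres, voxel_size,
                             origin=(0.0, 0.0, 0.0), radius=5.0):
    """Average local-resolution values within `radius` (in Å) of each atom.

    Assumptions (hypothetical, for illustration only):
      * `locres` is a 3D array indexed [x, y, z]; real map files may be
        stored in a different axis order and would need transposing.
      * voxel (i, j, k) is centred at origin + voxel_size * (i, j, k).
    Returns one value per atom; NaN if no voxel falls inside the sphere.
    """
    coords = np.asarray(coords, dtype=float)
    origin = np.asarray(origin, dtype=float)
    reach = int(np.ceil(radius / voxel_size))  # half-width of search box in voxels
    averaged = np.full(len(coords), np.nan)

    for n, atom in enumerate(coords):
        centre = np.round((atom - origin) / voxel_size).astype(int)
        lo = np.maximum(centre - reach, 0)
        hi = np.minimum(centre + reach + 1, locres.shape)
        if np.any(lo >= hi):
            continue  # the atom lies outside the map

        # Cartesian positions of the voxel centres in the search box
        axes = [np.arange(lo[d], hi[d]) * voxel_size + origin[d] for d in range(3)]
        gx, gy, gz = np.meshgrid(*axes, indexing="ij")
        dist2 = (gx - atom[0]) ** 2 + (gy - atom[1]) ** 2 + (gz - atom[2]) ** 2

        inside = dist2 <= radius ** 2
        if inside.any():
            block = locres[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
            averaged[n] = block[inside].mean()

    return averaged
```

      The returned values could then be written into the B-factor column of the coordinate file with whichever structure library is in use, so that standard "color by B-factor" options in molecular viewers display the averaged local resolution on a cartoon.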

      (23) Calling the structure without Neu5Ac bound an "apo" structure is confusing since it indeed has the ligand Na+ present and bound. "Na+" and "Na+/Neu5Ac" structures would be more appropriate.

      Changed all “apo” to “unliganded”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      - The manuscript needs comprehensive proofreading for language and formatting. In many instances, spaces are missing or not required.

      Thank you for your comments. The manuscript has been thoroughly proofread for errors in language and formatting.

      - Could the authors explore correlation network analyses to get additional insights into the structure of different clusters? 

      We have added a co-occurrence analysis (at species taxonomic level) based on SparCC to the manuscript (Figure 2).

      This is described on Page 9 line 141-148

      - The GitHub link is not correct. 

      The GitHub repository has now been made public.

      - It is not possible to access the dataset on ENA. 

      We have changed the ENA study PRJEB57401 status to open.

      - Add the graphs obtained with decontam analysis as a supplementary figure. 

      We have added the outputs of decontam (.csv files with feature lists of ASVs that were filtered based on the prevalence and frequency tests) to the GitHub repository.

      - There is nothing about the RPL group in the results section, while the authors discuss this issue in the introduction. What about the controls with proven fertility? 

      Thank you. We have amended the manuscript to compare characteristics between the RPL, unexplained subfertility and controls groups.

      Line 1279-130 page 8:  

      “The study group represented 85% of samples with high sperm DNA fragmentation, 85% of samples with elevated ROS and 79% of samples with oligospermia. Rates of abnormal seminal parameters including low sperm concentration, reduced progressive motility and ROS concentrations were found to be highest in the MFI group (Supplementary Figure 1). Baseline characteristics between the RPL, unexplained subfertility and controls groups were similar.”

      Line 150-154 Page 9: 

      “Bacterial richness, diversity and load were similar between all patient groups examined in the study (Supplementary Figure 4).”

      - While correctly stated in the title, the term microbiota should be used throughout the manuscript instead of "microbiome" 

      Thank you. This misnomer has been amended throughout the manuscript.

      Minor corrections:

      Line 25: provoke is not a good term here. 

      Thank you. The term ‘provoke’ has been removed

      Line 26: why does semen culture have a limited scope? 

      Thank you. Line 40-41 Page 3 has been amended:

      “It is therefore plausible that asymptomatic seminal infections may be associated with impaired reproductive function in some men. Since semen culture has a limited scope for studying the seminal microbiota due to its inability to identify all present microbiota, next-generation sequencing (NGS) approaches have been reported recently by a growing number of investigators (13, 14, 15, 16, 17, 18, 19)”.

      Line 68: write μl correctly

      Thank you. This has been corrected

      Line 131: several organisms at the genus level. 

      Thank you. This has been corrected

      Line 136: what are the relative abundances of these genera? Is this relevant? 

      The mean relative abundances for the key taxa mentioned in each cluster are all above 20%. This information has been added to the manuscript text on page 9, line 153.

      Line 173: Molina et al. 

      Thank you. This has been corrected

      Line 173: the contaminations are referred to the low biomass nature of testicular samples. If present, bacteria of accessory gland secretions are an integral part of the seminal microbiota itself. Please review these sentences. 

      Thank you. This has been reworked to highlight the importance of urethral contamination, which you later allude to as a limitation of our study, namely the failure to provide paired urine and semen samples.

      Page 11 line 194-196

      “Molina et al report that 50%-70% of detected bacterial reads may be environmental contaminants in a sample from extracted testicular spermatozoa (35); with the addition of passage along the urethra it is likely that contamination of ejaculated semen would be much higher.”

      Table 1: remove results interpretation from table caption. 

      Thank you, this has been acted upon.

      Table 1: why in some cases, like in DNA fragmentation index, the total is not equal to n=223? 

      This is due to missing data: the analysis was not possible for some men because a minimum number of sperm in the ejaculate is required to perform DNA fragmentation testing.

      Table 1: "frag" is not defined. 

      Thank you, this has been amended

      Tables 2, 3 & 4: bacterial genera in italics. 

      Thank you, this has been amended

      Figure 1A: add the fertility status information above the cluster colors. 

      Thank you, this has been amended in Figure 1.

      Figure 1C: the color code is confusing. Use different colors for each cluster. 

      Figure 1 legend: bacterial genera in italics. 

      Figures 1 & 2: the authors should use similar chart formatting in the two tables. 

      Thank you, this has been amended

      Reviewer 2:

      (1) The patient groups have different diagnoses and should be handled as different groups, and not fused into one 'patient' group in analyses. Why are the data in tables presented as controls and cases? I would consider men from couples with recurrent pregnancy loss, unexplained infertility, and male factor infertility to have different seminal parameters (not to fuse them into one group). This means that the statistical analyses should be performed considering each group separately, and not to fuse 3 different infertility diagnoses into one patient group.

      We have conducted the detailed analyses requested by the reviewer, comparing seminal DNA, ROS and microbiota characteristics between the individual patient groups (Supplementary Figures 1 and 4). No specific taxa (at either genus or species level) were found to differ in relative abundance between the diagnostic groups. However, we expect associations between parameters such as reactive oxygen species, or DNA fragmentation, and relative abundance of bacterial species, to be general and not restricted to or specific to each diagnostic group. Therefore, we also conducted further analyses aggregating data from all patient groups to investigate relationships common to these different forms of male reproductive dysfunction.

      (2) Were any covariables included in the statistical analyses, e.g. age, BMI, smoking, time of sexual abstinence, etc? 

      Covariates were not included in the statistical analyses. This has been added in the manuscript to the limitations.

      Page 14 line 267-268

      “Additionally, we did not have other covariables such as smoking status with which to include in further analyses”.

      (3) Furthermore, it is known that 16S rRNA gene analysis does not provide sensitive enough detection of bacteria on the species level. How much do the authors trust their results on the species level? 

      The limitations of taxonomic assignment using 16S rRNA gene metataxonomics are well documented. However, the capacity to assign sequence amplicons at species level depends on the sequence variability of the 16S rRNA gene for each of the taxa reported and the specific gene region chosen. In this study, amplification of the V1-V2 region was performed using a mixed 28f primer set (see methods for details) that enables resolution and assignment of several bacterial species highly relevant to the reproductive tract including Lactobacillus spp., such as L. crispatus and L. iners, (e.g. https://doi.org/10.3389/fcell.2021.641921, https://doi.org/10.1128/msystems.01039-23, https://doi.org/10.1186/s12915-023-01702-2). In this study, we report the presence of L. iners, but not L. crispatus in semen samples, and we have also identified a specific association/co-occurrence between Gardnerella vaginalis and Lactobacillus iners, similar to that observed in vaginal bacterial communities.

      (4) Were the analyses of bacterial genera and species abundances with seminal quality parameters controlled for diagnosis and other confounders? 

      As stated in point 2, no adjustment was made for co-variates. No differences in microbiome composition were observed among the three diagnostic groups, so no adjustments were made to our analysis.

      (5) The authors stress that their study is the biggest on the microbiome in semen. However, when considering that the study consists of 4 groups (with n=46-63), it does not stand out from previous studies. 

      Our study is overall the largest investigating interactions between the seminal microbiome and male reproductive dysfunction. Other studies have included greater numbers of men with infertility.

      (6) Weaknesses: There is a lack of paired seminal/urinal samples. 

      Thank you. This limitation has been added.

      Page 14 line 266-267

      “A further limitation of this study, and others, is the lack of reciprocal genital tract microbiota testing of the female partners, or paired seminal and urinary samples from male participants”.

      Recommendation for authors to consider:

      Including previous classical reviews in the introduction: DOI: 10.1097/MOU.0000000000000742; DOI: 10.1038/s41585-019-0250-y

      Thank you. This has been added.

      Mentioning in the M&M section that there is a supplementary text with a more detailed M&M part. 

      Thank you. This has been added. Further methodological detail can be found in supplementary text.

      Revising the use of 'microbiota' and 'microbiome', they are not synonyms. When talking of 16S rRNA gene analysis, we consider 'microbiome' analysis. 

      Thank you. This misnomer has been amended throughout the manuscript.

      Revising the text, there are several erratas (e.g. verb missing, etc). 

      Thank you for your comments. The manuscript has been thoroughly proofread for errors in language and formatting.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      Summary: 

      In the manuscript entitled "Magnesium modulates phospholipid metabolism to promote bacterial phenotypic resistance to antibiotics", Li et al demonstrated the role of magnesium in promoting phenotypic resistance in V. alginolyticus. Using standard microbiological and metabolomic techniques, the authors have shown the significance of fatty acid biosynthesis pathway behind the resistance mechanism. This study is significant as it sheds light on the role of an exogenous factor in altering membrane composition, polarization, and fluidity which ultimately leads to antimicrobial resistance. 

      Strengths: 

      (1) The experiments were carried out methodically and logically. 

      (2) An adequate number of replicates were used for the experiments. 

      Weaknesses: 

      (1) The introduction section needs to be more informative and to the point.  

      Thank you so much for your suggestion. We have revised the introduction to make it more informative and to the point as following:

      “Non-inheritable antibiotic or phenotypic resistance represents a serious challenge for treating bacterial infections. Phenotypic resistance does not involve genetic mutations and is transient, allowing bacteria to resume normal growth. Biofilm and bacterial persisters are two phenotypic resistance types that have been extensively studied (Brandis et al., 2023; Corona & Martinez, 2013). Biofilms have complex structures, containing elements that impede antibiotic diffusion, sequestering and inhibiting their activity (Ciofu et al., 2022). Biofilm-forming bacteria and persisters also have distinct metabolic states that significantly reduce their antibiotic susceptibility (Yan & Bassler, 2019). These two types of phenotypic resistance share the common feature of retarded or even ceased growth in the presence of antibiotics (Corona & Martinez, 2013). However, specific factors that promote phenotypic resistance and allow bacteria to proliferate in the presence of antibiotics remain poorly defined.

      Metal ions have a diverse impact on the chemical, physical, and physiological processes of antibiotic resistance  (Booth et al, 2011; Lu et al, 2020; Poole, 2017). This includes genetic elements that confer resistance to metals and antibiotics (Poole, 2017) and metal cations that directly hinder (or enhance) the activity of specific antibiotic drugs (Zhang et al., 2014). The metabolic environment can also impact the sensitivity of bacteria to antibiotics (Jiang et al., 2023; Lee & Collins, 2012; Peng et al., 2015; Zhang et al., 2020; Zhao et al., 2021). Light metal ions, such as magnesium, sodium, and potassium, can behave as cofactors for different enzymes (Du et al., 2016) and influence drug efficacy. Heavy metal ions, including Cu2+ and Zn2+, confer resistance to antibiotics (Yazdankhah et al., 2014; Zhang et al., 2018). Recent reports suggest that sodium negatively regulates redox states to promote the antibiotic resistance of Vibrio alginolyticus (Yang et al., 2018), while actively growing Bacillus subtilis cope with ribosome-targeting antibiotics by modulating ion flux (Lee et al, 2019). In Gram-negative bacteria, by contrast, zinc enhances antibiotic efficacy by potentiating carbapenem, fluoroquinolone, and β-lactam-mediated killing (Isaei et al., 2016; Zhang et al., 2014). Magnesium influences bacterial structure, cell motility, enzyme function, cell signaling, and pathogenesis (Wang et al., 2019). This mineral also modulates microbiota to harvest energy from the diet (Garcia-Legorreta et al., 2020), allowing Bacillus subtilis to cope with ribosome-targeting antibiotics by modulating ion flux (Lee et al., 2019). However, the role of magnesium in promoting phenotypic resistance is less well understood.

      Vibrios inhabit seawater, estuaries, bays, and coastal waters, regions full of metal ions such as magnesium (Kumarage et al., 2022). Magnesium is the second most dissolved element in seawater after sodium. At a salinity of 3.5% seawater, the magnesium concentration is about 54 mM (Potis, 1968), and in deep seawater, can be as high as 2,500 mM (Wang et al., 2024). Vibrio parahaemolyticus and V. alginolyticus are two representative Vibrio pathogens that infect humans and aquatic animals, resulting in illness and economic loss, respectively (Grimes, 2020). (Fluoro)quinolones such as balofloxacin are used to treat Vibrio infection; however, resistance has emerged due to overuse (Suyamud et al., 2024). Indeed, (fluoro)quinolones are one of China's two primary residual chemicals associated with aquaculture (Liu et al., 2017). Vibrio can develop quinolone resistance through mutations in the DNA gyrase gene or through plasmid-mediated mechanisms (Dutta et al., 2021). Thus, the use of V. parahaemolyticus and V. alginolyticus as bacterial representatives, and balofloxacin as a quinolone-based antibacterial representative, can help to define novel magnesium-dependent phenotypic resistance mechanisms of pathogenic Vibrio species.

      The current study evaluated whether magnesium induces phenotypic resistance in Vibrio species and defined the molecular/genetic basis for this resistance. Genetic approaches, GC-MS analysis of metabolite and membrane remodeling upon antibiotic exposure, membrane physiology, and extensive antimicrobial susceptibility testing were used for the evaluations.”

      (2) The weakest point of this paper is in the logistics through the results section. The way authors represented the figures and interpreted them in the results section (or the figure legends) does not match. The figures are difficult to interpret and are not at all self-explanatory. 

      Thank you so much for your suggestion. We have checked the Results text against the figures and revised it so that the two now match.

      (3) There are too many mislabeling of the figure panels in the main text which makes it difficult to find out which figures the authors are explaining. There should be more explanation on why and how they did the experiments and how the results were interpreted. 

      Thank you so much for your suggestion. We have checked the figures and the main text to ensure that every figure panel is cited correctly and clearly.

      Reviewer #2 (Public Review): 

      Summary: 

      In this study, the authors aimed to identify if and how magnesium affects the ability of two particular bacteria species to resist the action of antibiotics. In my view, the authors succeeded in their goals and presented a compelling study that will have important implications for the antibiotic resistance research community. Since metals like magnesium are present in all lab media compositions and are present in the host, the data presented in this study certainly will inspire additional research by the community. These could include research into whether other types of metals also induce multi-drug resistance, whether this phenomenon can be observed in other bacterial species, especially pathogenic species that cause clinical disease, and whether the underlying molecular determinants (i.e. enzymes) of metal-induced phenotypic resistance could be new antimicrobial drug targets themselves. 

      Strengths: 

      This study's strengths include that the authors used a variety of methodologies, all of which point to a clear effect of exogenous Mg2+ on drug resistance in the targeted species. I also commend the authors for carrying out a comprehensive study, spanning evaluation of whole cell phenotypes, metabolic pathways, genetic manipulation, to enzyme activity level evaluation. The fact that the authors uncovered a molecular mechanism underlying Mg2+-induced phenotypic resistance is particularly important as the key proteins should be studied further.

      Weaknesses: 

      I believe there are weaknesses in the manuscript, however. The authors take for granted that the reader is familiar with all the assays utilized, and do not properly explain some experiments, and thus I highly suggest that the authors add a brief statement in each situation describing the rationale for each selected methodology (more details are in the private review to the authors). The Results section is also quite long and bogs down at times, and I suggest that the authors reduce its length by 10 to 20%. In contrast, the Introduction is sparse and lacks key aspects, for example, there should be mention of the study's main purpose and approaches, plus an introduction to the authors' choice of species and their known drug resistance properties, as well as the drug of choice (balofloxacin). Another notable weakness is that the authors evaluated Mg2+-induced phenotypic resistance only against two closely related species, and thus the generalizability of this mechanism of drug resistance is not known. The paper would be strengthened if the authors could demonstrate this type of phenotypic resistance in at least one more Gram-negative species and at least one Gram-positive species (antimicrobial susceptibility evaluations would suffice), each of which should be pathogenic to humans. Demonstrating magnesium-induced phenotypic drug resistance in the WHO Priority Bacterial Pathogens would be particularly important. 

      In general, the conclusions drawn by the authors are justified by the data, except for the interpretation of some experiments. Importantly, this paper has discovered new antimicrobial resistance mechanisms and has also pointed to potential new targets for antimicrobials. 

      Thank you so much for your suggestions! We have revised the manuscript as follows:

      (1) We added a brief statement in each case to explain the result and the methodology, according to the suggestions in your private review.

      (2) To make the flow of the story more logical, we moved the entire second Results section to the supplementary text and supplementary figures.

      (3) We revised the Introduction by adding information to make it more informative and to the point, as follows:

      “Non-inheritable antibiotic or phenotypic resistance represents a serious challenge for treating bacterial infections. Phenotypic resistance does not involve genetic mutations and is transient, allowing bacteria to resume normal growth. Biofilm and bacterial persisters are two phenotypic resistance types that have been extensively studied (Brandis et al., 2023; Corona & Martinez, 2013). Biofilms have complex structures, containing elements that impede antibiotic diffusion, sequestering and inhibiting their activity (Ciofu et al., 2022). Biofilm-forming bacteria and persisters also have distinct metabolic states that significantly reduce their antibiotic susceptibility (Yan & Bassler, 2019). These two types of phenotypic resistance share a common feature: growth is retarded or even ceases in the presence of antibiotics (Corona & Martinez, 2013). However, specific factors that promote phenotypic resistance and allow bacteria to proliferate in the presence of antibiotics remain poorly defined.

      Metal ions have a diverse impact on the chemical, physical, and physiological processes of antibiotic resistance  (Booth et al, 2011; Lu et al, 2020; Poole, 2017). This includes genetic elements that confer resistance to metals and antibiotics (Poole, 2017) and metal cations that directly hinder (or enhance) the activity of specific antibiotic drugs (Zhang et al., 2014). The metabolic environment can also impact the sensitivity of bacteria to antibiotics (Jiang et al., 2023; Lee & Collins, 2012; Peng et al., 2015; Zhang et al., 2020; Zhao et al., 2021). Light metal ions, such as magnesium, sodium, and potassium, can behave as cofactors for different enzymes (Du et al., 2016) and influence drug efficacy. Heavy metal ions, including Cu2+ and Zn2+, confer resistance to antibiotics (Yazdankhah et al., 2014; Zhang et al., 2018). Recent reports suggest that sodium negatively regulates redox states to promote the antibiotic resistance of Vibrio alginolyticus (Yang et al., 2018), while actively growing Bacillus subtilis cope with ribosome-targeting antibiotics by modulating ion flux (Lee et al, 2019). In Gram-negative bacteria, by contrast, zinc enhances antibiotic efficacy by potentiating carbapenem, fluoroquinolone, and β-lactam-mediated killing (Isaei et al., 2016; Zhang et al., 2014). Magnesium influences bacterial structure, cell motility, enzyme function, cell signaling, and pathogenesis (Wang et al., 2019). This mineral also modulates microbiota to harvest energy from the diet (Garcia-Legorreta et al., 2020), allowing Bacillus subtilis to cope with ribosome-targeting antibiotics by modulating ion flux (Lee et al., 2019). However, the role of magnesium in promoting phenotypic resistance is less well understood.

      Vibrios inhabit seawater, estuaries, bays, and coastal waters, regions full of metal ions such as magnesium (Kumarage et al., 2022). Magnesium is the second most dissolved element in seawater after sodium. At a salinity of 3.5% seawater, the magnesium concentration is about 54 mM (Potis, 1968), and in deep seawater, can be as high as 2,500 mM (Wang et al., 2024). Vibrio parahaemolyticus and V. alginolyticus are two representative Vibrio pathogens that infect humans and aquatic animals, resulting in illness and economic loss, respectively (Grimes, 2020). (Fluoro)quinolones such as balofloxacin are used to treat Vibrio infection; however, resistance has emerged due to overuse (Suyamud et al., 2024). Indeed, (fluoro)quinolones are one of China's two primary residual chemicals associated with aquaculture (Liu et al., 2017). Vibrio can develop quinolone resistance through mutations in the DNA gyrase gene or through plasmid-mediated mechanisms (Dutta et al., 2021). Thus, the use of V. parahaemolyticus and V. alginolyticus as bacterial representatives, and balofloxacin as a quinolone-based antibacterial representative, can help to define novel magnesium-dependent phenotypic resistance mechanisms of pathogenic Vibrio species.

      The current study evaluated whether magnesium induces phenotypic resistance in Vibrio species and defined the molecular/genetic basis for this resistance. Genetic approaches, GC-MS analysis of metabolite and membrane remodeling upon antibiotic exposure, membrane physiology, and extensive antimicrobial susceptibility testing were used for the evaluations.”

      (4) We examined the effect of magnesium on WHO-listed priority strains, which confirmed the results, as follows:

      “Importantly, exogenous MgCl2 also increased MICs of clinical isolates, carbapenem-resistant Escherichia coli, carbapenem-resistant Klebsiella pneumoniae, carbapenem-resistant Pseudomonas aeruginosa and carbapenem-resistant Acinetobacter baumannii to balofloxacin (Fig 1G).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) There are many grammatical mistakes to point out. The manuscript needs proofreading and editing.

      We appreciate this comment! The manuscript has been revised by a native speaker.

      (2) The introduction could be more informative. A little more description of magnesium - such as what it does to antibiotics and how it's known to affect the microbiome - might be helpful for the general readers. The question remains why out of all the metal ions that might affect antibiotic resistance (many of them are less explored), authors particularly decided to work on the effect of magnesium. The introduction should cover the rationale of their hypothesis. Also, the authors might want to briefly talk about the model organisms (V. algonolyticus and V. parahemolyticus) describing how threatening they are and how they are becoming resistant to antibiotics. 

      We appreciate this comment! We have revised the introduction by providing additional information, as follows:

      “In Gram-negative bacteria, by contrast, zinc enhances antibiotic efficacy by potentiating carbapenem, fluoroquinolone, and β-lactam-mediated killing (Isaei et al., 2016; Zhang et al., 2014). Magnesium influences bacterial structure, cell motility, enzyme function, cell signaling, and pathogenesis (Wang et al., 2019). This mineral also modulates microbiota to harvest energy from the diet (Garcia-Legorreta et al., 2020), allowing Bacillus subtilis to cope with ribosome-targeting antibiotics by modulating ion flux (Lee et al., 2019). However, the role of magnesium in promoting phenotypic resistance is less well understood.

      Vibrios inhabit seawater, estuaries, bays, and coastal waters, regions full of metal ions such as magnesium (Kumarage et al., 2022). Magnesium is the second most dissolved element in seawater after sodium. At a salinity of 3.5% seawater, the magnesium concentration is about 54 mM (Potis, 1968), and in deep seawater, can be as high as 2,500 mM (Wang et al., 2024). Vibrio parahaemolyticus and V. alginolyticus are two representative Vibrio pathogens that infect humans and aquatic animals, resulting in illness and economic loss, respectively (Grimes, 2020). (Fluoro)quinolones such as balofloxacin are used to treat Vibrio infection; however, resistance has emerged due to overuse (Suyamud et al., 2024). Indeed, (fluoro)quinolones are one of China's two primary residual chemicals associated with aquaculture (Liu et al., 2017). Vibrio can develop quinolone resistance through mutations in the DNA gyrase gene or through plasmid-mediated mechanisms (Dutta et al., 2021). Thus, the use of V. parahaemolyticus and V. alginolyticus as bacterial representatives, and balofloxacin as a quinolone-based antibacterial representative, can help to define novel magnesium-dependent phenotypic resistance mechanisms of pathogenic Vibrio species.

      The current study evaluated whether magnesium induces phenotypic resistance in Vibrio species and defined the molecular/genetic basis for this resistance. Genetic approaches, GC-MS analysis of metabolite and membrane remodeling upon antibiotic exposure, membrane physiology, and extensive antimicrobial susceptibility testing were used for the evaluations. ”

      (3) Figure 1C is mislabeled as 1B (line 100). Line 101: The sentence is not clear and very confusing. What is meant by 15.6mM - 62.4 mM? Are they talking about the concentration of BLFX (though in the figure the concentration was shown in µg)? Please rewrite the sentence in a simplified way. Also, the zone of inhibition was decreased with increasing MgCl2, not increased. 

      We appreciate this comment! These points have been revised; Fig 1B has been corrected to Fig 1C. Line 101 is now Line 122, and the sentence has been revised as follows:

      “At balofloxacin doses of 1.56, 3.125, 6.25, and 12.5 µg, the zone of inhibition decreased with increasing MgCl2 (Fig 1D)”

      (4) In the western blot images, it would be nice to indicate the MW of the protein bands shown. The loading control used for the experiments should be clearly mentioned in the figure legends. 

      We appreciate this comment! The MWs are now indicated in the western blot images throughout the manuscript.

      The loading control is clearly stated in the figure legend as follows:

      “Whole cell lysates resolved by SDS-PAGE were stained with Coomassie brilliant blue as the loading control.”

      (5) Figures 2 B and C: the figure legend does not explain what the authors wanted to show. It's not clear how they plotted the inhibitory curve, or the binding efficacy. These panels need an explanation of how the analysis was done.

      We appreciate this comment! Figure 2 has been moved to Suppl. Fig 2, and its description has been moved to the Suppl. Text. We have revised the description of the result, now in the Suppl. Text, as follows:

      “Prior studies suggest that the chelation of antibiotics by magnesium ions inhibits antibiotic uptake (Deitchman et al., 2018; Lunestad and Goksøyr, 1990). To investigate whether magnesium binds to balofloxacin, balofloxacin was pre-incubated with magnesium, and zone of inhibition (ZOI) analysis was conducted. Six different concentrations of balofloxacin were separately incubated with six different concentrations of MgCl2, and then spotted on filter paper so that a defined amount of balofloxacin could be used for ZOI. While lower concentrations of MgCl2, (0.78, 3.125, or 12.5 mM) did not alter the ZOI, higher concentrations, including 50 and 200 mM MgCl2, decreased the ZOI (Suppl. Fig 2A), suggesting that even high doses of magnesium had only a partial effect on balofloxacin through direct binding. For example, at 200 mM MgCl2 and 5 or 10 μg/mL balofloxacin, the balofloxacin ZOI was 53.2 and 70.3% of the ZOI at 0 mM MgCl2, suggesting that  50% of the antibiotics were still functional. Intracellular BLFX also decreased with increasing MgCl2 (Suppl. Fig 2B), while exogenous Mg2+ increased intracellular Mg2+ levels in a dose-dependent manner. For example, exogenous 50 and 200 mM MgCl2 increased intracellular Mg2+ levels to 1.21 and 1.31 mM, respectively (Suppl. Fig 2C). The relationship between TolC, an efflux pump that transports quinolones from bacterial cells, and Mg2+ was also assessed (Kobylka et al., 2020; Song et al., 2020). The expression of TolC/tolC was unaffected by Mg2+ (Suppl. Fig 2D). Magnesium is critical for LPS stability. LPS levels increased at 200 mM Mg2+ (Suppl. Fig 2E), however, the loss of waaF, lpxA, and lpxC, three key genes involved in LPS biosynthesis, did not influence balofloxacin sensitivity/resistance in the presence of Mg2+ (Suppl. Fig 2F). These findings suggest that magnesium-induced LPS biosynthesis does not contribute directly to BLFX resistance and demonstrate that Mg2+ influx is involved in balofloxacin resistance.”

      (6) For the metabolomics results, it will help immensely if the authors provide a volcano plot of the identified metabolites and plot the heat map according to the -log2 metabolite intensities. In Figure 3A, it's not clear what information is conveyed through Euclidean distance calculations of the heat map. In Figure 3 B, the authors mentioned that the OPLS-DA test was conducted, although the figure shows a PCA plot, so it's not clear how these two are connected. Figure 3 E: the figure legend says scattered plot, but the panel represents color-coded numerical values, not a scattered plot. Also, it's not clear how they got those values. 

      We appreciate this comment! We agree that a volcano plot would be a useful way to show the differential metabolites. However, we did not adopt volcano plots in this study because the metabolomes are magnesium concentration-dependent and comprise six groups in parallel; volcano plots may give an overly complex view of the comparisons among the groups. We also tried plotting the heat map according to the -log2 metabolite intensities. Although this analysis clustered the 200 mM and 50 mM groups better, the data for the low magnesium concentrations were not consistent, which may be due to the minor metabolic changes at low magnesium concentrations. Thank you for your understanding.

      For the Euclidean distance calculations, we have added the following explanation to the figure legend:

      “Euclidean distance calculations were used to generate a heatmap that shows clustering of the biological and technical replicates of each treatment.” 
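      As an illustration of the legend above, the following is a minimal sketch of how Euclidean distances between replicate profiles can show that replicates of one treatment group together. The sample names and abundance values are hypothetical and are not taken from the study; the authors' actual clustering software and settings are not stated here.

```python
import math

# Hypothetical abundance profiles: one list of metabolite intensities per replicate.
profiles = {
    "0mM_rep1": [1.0, 2.0, 3.0],
    "0mM_rep2": [1.1, 2.1, 2.9],
    "200mM_rep1": [4.0, 0.5, 6.0],
    "200mM_rep2": [4.2, 0.4, 6.1],
}


def distance_matrix(samples):
    """Return the Euclidean distance for every ordered pair of sample names."""
    names = list(samples)
    return {(a, b): math.dist(samples[a], samples[b]) for a in names for b in names}


def nearest_neighbor(samples, name):
    """Return the sample closest to `name` (excluding itself)."""
    distances = distance_matrix(samples)
    others = [n for n in samples if n != name]
    return min(others, key=lambda n: distances[(name, n)])


if __name__ == "__main__":
    for sample in profiles:
        print(sample, "->", nearest_neighbor(profiles, sample))
```

      In this toy data set each replicate's nearest neighbor is the other replicate of the same treatment, which is the pattern a clustered heatmap is meant to make visible.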

      Figure 2B (Figure 3B in the previous version) has been replaced with the OPLS-DA analysis in the revised version.

      The legend of Figure 2E (Figure 3E in the previous version) has been revised as follows:

      “E. Areas of the peaks of palmitic acid and stearic acid generated by GC-MS analysis.” 

      (7) In Figure 4, the figure legends (as well as the in the text) are not properly referred to. Please make sure to refer to the correct panel. 

      We appreciate this comment! The figure legends have been corrected to match the panel and text. 

      Figure 4F: how was the synergy analysis done? In the methods section, the authors described the antibiotic bactericidal assay protocol, but there was no clear indication of how they generated the isobologram. 

      We appreciate this comment! We have added the following information to the legend of Figure 3F (Figure 4F in the previous version):

      “Synergy analysis for BLFX with palmitic acid for V. alginolyticus. Synergy was assessed by comparing the dose needed for 50% inhibition of the synergistic agents (white) and non-synergistic (i.e., additive) agents (purple).”
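      For readers unfamiliar with isobologram analysis, the sketch below shows one common way to compare combination doses against the additive expectation, using a Loewe-type combination index. This is an assumption for illustration only: the legend does not state which synergy model the authors used, and the doses and 50% inhibition values below are hypothetical.

```python
def combination_index(dose_a, dose_b, ic50_a, ic50_b):
    """Loewe-type combination index at the 50% inhibition level.

    dose_a, dose_b: doses of each agent that together give 50% inhibition.
    ic50_a, ic50_b: doses of each agent alone that give 50% inhibition.
    A value of 1 lies on the additive line of the isobologram.
    """
    return dose_a / ic50_a + dose_b / ic50_b


def classify(ci, tolerance=0.1):
    """Label a combination index; `tolerance` is an arbitrary illustrative margin."""
    if ci < 1 - tolerance:
        return "synergy"
    if ci > 1 + tolerance:
        return "antagonism"
    return "additive"


if __name__ == "__main__":
    # Hypothetical values: agent A alone needs 8 units, agent B alone needs 20 units.
    for dose_a, dose_b in [(2.0, 5.0), (4.0, 10.0), (8.0, 20.0)]:
        ci = combination_index(dose_a, dose_b, ic50_a=8.0, ic50_b=20.0)
        print(dose_a, dose_b, ci, classify(ci))
```

      Points below the additive line (index below 1) indicate synergy, and points above it indicate antagonism.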

      (8) Figure 5 A: the scatter plot is plotted according to the area along the Y axis: which "area" is represented here? There is absolutely no explanation, neither in the results nor in the figure legends. Using box plots might be a better option than using a scattered plot.

      We appreciate this comment! “Area” is now defined in the revised manuscript as follows:

      “The area indicates the peak area of the metabolite in the total ion chromatogram of GC-MS.”
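      To make the notion of "peak area" concrete, here is a minimal sketch that integrates a chromatogram peak with the trapezoidal rule. The retention times, intensities, and flat baseline are hypothetical; GC-MS vendor software typically applies its own peak detection and baseline correction, which this sketch does not reproduce.

```python
def peak_area(times, intensities, baseline=0.0):
    """Trapezoidal area under a peak, after subtracting a flat baseline."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        left = intensities[i - 1] - baseline
        right = intensities[i] - baseline
        area += 0.5 * (left + right) * (times[i] - times[i - 1])
    return area


if __name__ == "__main__":
    # Hypothetical triangular peak sampled at three retention times.
    print(peak_area([0.0, 1.0, 2.0], [0.0, 2.0, 0.0]))
```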

      (9) In Figure 6 A, the heat map is plotted according to the column Z scores. What is meant by "column Z score"? The corresponding figure legend says, "heat map showing differential abundance of lipid". Z scores do not represent an abundance of a variable, so the conclusion might not be appropriate here. 

      We appreciate this comment! In Figure 5A (Figure 6A in the previous version), the column Z score reflects the relative abundance of each metabolite analyzed; it is generated automatically by the heat map analysis. The legend has been revised as follows:

      “Heatmap showing changes in differential lipid levels at the indicated concentration of MgCl2.”  
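      Because the reviewer asked what a "column Z score" means, the sketch below shows the usual calculation: each value is centered on its column mean and divided by the column standard deviation, so the heatmap color encodes deviation from the column average rather than absolute abundance. The input matrix is hypothetical, and the use of the sample standard deviation is an assumption; heat map tools differ on this detail.

```python
import statistics


def column_z_scores(matrix):
    """Scale each column of a row-major matrix to mean 0 and sample SD 1."""
    scaled_columns = []
    for column in zip(*matrix):
        mean = statistics.mean(column)
        sd = statistics.stdev(column)  # sample standard deviation (n - 1)
        scaled_columns.append([(value - mean) / sd for value in column])
    # Transpose back so the result has the same row/column layout as the input.
    return [list(row) for row in zip(*scaled_columns)]


if __name__ == "__main__":
    # Hypothetical intensities: three samples (rows) by two lipids (columns).
    print(column_z_scores([[1, 10], [2, 20], [3, 30]]))
```

      After scaling, columns with very different absolute intensities become directly comparable, which is why Z scores show relative change rather than abundance itself.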

      (10) Line 313-314: it should be Figure EV6C.  

      We appreciate this comment! The citation has been corrected.

      (11) The authors have shown that Mg+2 does not alter the LPS transport system, however, there was some significant increase in LPS expression at 200mM MgCl2. It would be interesting if the authors could also check if Mg+2 has any effect on the outer membrane protein (OMP) integrity (by checking OMP components BamA and LptD).  

      We appreciate this comment! We have carefully examined membrane permeability in Figure 7 and therefore did not perform additional experiments to assess changes in BamA and LptD. Thank you very much for your understanding.

      (12) I wonder if the authors could check the effect of extracellular Mg+2 during the co-treatment of palmitic acid, linoleic acid, and balofloxacin. Will there still be the antagonistic effect or the presence of Mg+2 could change the phenotype? 

      We appreciate this comment! Additional experiments were performed, with the following result:

      “Furthermore, magnesium had a minimal effect on the antagonistic effect of palmitic acid, linolenic acid, and balofloxacin (Fig 4G), suggesting that this mineral functions through lipid metabolism.” 

      Reviewer #2 (Recommendations For The Authors)

      (1) As mentioned in the Public Review, I strongly believe that the impact of this study will be more significant if magnesium-induced phenotypic drug resistance could be demonstrated in at least one other Gram-negative and one other Gram-positive species, both of which should be human pathogens. The full suite of experiments would not be necessary for this suggestion; evaluation of the effect of Mg concentration in growth media on the drug resistance of other species, testing the different antibiotic types used in this study, would be sufficient. 

      We appreciate this comment! Additional experiments have been performed to test this idea. Mg2+ has a similar effect on carbapenem-resistant Escherichia coli, carbapenem-resistant Klebsiella pneumoniae, carbapenem-resistant Pseudomonas aeruginosa, and carbapenem-resistant Acinetobacter baumannii as on the Vibrio species, as shown in Figure 1G. These results are described as follows:

      “Importantly, exogenous MgCl2 also increased MICs of clinical isolates, carbapenem-resistant Escherichia coli, carbapenem-resistant Klebsiella pneumoniae, carbapenem-resistant Pseudomonas aeruginosa and carbapenem-resistant Acinetobacter baumannii to balofloxacin (Fig 1G).”

      (2) I recommend that the Introduction section be expanded. I recommend one or two sentences introducing the two Vibrio species selected for study. I.e. why did the authors choose these two species? What is known about their phenotypic drug resistance in the literature? Why did the authors select balofloxacin for their studies, is it a common antimicrobial used vs Vibrios? As well, the end of the Introduction section ends abruptly with no transition to the present study itself. The end of the introduction should include one or two sentences introducing the main purpose of the study, its approach, and the techniques undertaken. For example, "In this study, we evaluated whether magnesium induces phenotypic resistance in Vibrio species and the molecular/genetic basis for such resistance. We used genetic approaches, GC-MS analysis of metabolite and membrane remodeling upon antibiotic exposure, membrane physiology, and extensive antimicrobial susceptibility evaluations." 

      We appreciate this comment! We have revised the introduction by providing additional information, as follows:

      “In Gram-negative bacteria, by contrast, zinc enhances antibiotic efficacy by potentiating carbapenem, fluoroquinolone, and β-lactam-mediated killing (Isaei et al., 2016; Zhang et al., 2014). Magnesium influences bacterial structure, cell motility, enzyme function, cell signaling, and pathogenesis (Wang et al., 2019). This mineral also modulates microbiota to harvest energy from the diet (Garcia-Legorreta et al., 2020), allowing Bacillus subtilis to cope with ribosome-targeting antibiotics by modulating ion flux (Lee et al., 2019). However, the role of magnesium in promoting phenotypic resistance is less well understood.

      Vibrios inhabit seawater, estuaries, bays, and coastal waters, regions full of metal ions such as magnesium (Kumarage et al., 2022). Magnesium is the second most dissolved element in seawater after sodium. At a salinity of 3.5% seawater, the magnesium concentration is about 54 mM (Potis, 1968), and in deep seawater, can be as high as 2,500 mM (Wang et al., 2024). Vibrio parahaemolyticus and V. alginolyticus are two representative Vibrio pathogens that infect humans and aquatic animals, resulting in illness and economic loss, respectively (Grimes, 2020). (Fluoro)quinolones such as balofloxacin are used to treat Vibrio infection; however, resistance has emerged due to overuse (Suyamud et al., 2024). Indeed, (fluoro)quinolones are one of China's two primary residual chemicals associated with aquaculture (Liu et al., 2017). Vibrio can develop quinolone resistance through mutations in the DNA gyrase gene or through plasmid-mediated mechanisms (Dutta et al., 2021). Thus, the use of V. parahaemolyticus and V. alginolyticus as bacterial representatives, and balofloxacin as a quinolone-based antibacterial representative, can help to define novel magnesium-dependent phenotypic resistance mechanisms of pathogenic Vibrio species.

      The current study evaluated whether magnesium induces phenotypic resistance in Vibrio species and defined the molecular/genetic basis for this resistance. Genetic approaches, GC-MS analysis of metabolite and membrane remodeling upon antibiotic exposure, membrane physiology, and extensive antimicrobial susceptibility testing were used for the evaluations. ”

      (3) The authors introduce the acronym AWST but never use it again in the paper, instead they use SWT. The authors should introduce SWT only for consistency. 

      We appreciate this comment! We have corrected all instances of “SWT” to “ASWT”.

      (4) Line 76 is not clear: what is meant by "some of which could influence drug efficacy" - the enzymes that utilize light metal ions are co-factors? Or the metals directly?  

      We appreciate this comment! The point we wanted to convey is that light metal ions can serve as cofactors in catalyzing biochemical reactions. Such reactions can alter drug efficacy; e.g., Fe-S clusters are metallocofactors for proteins that regulate redox chemistry, including antibiotic-induced redox changes. However, this information is not appropriate for this manuscript, so we have deleted the sentence.

      (5) Line 90: add a reference corroborating that this chemical composition is a mimic of marine water. The NaCl concentration used in particular looks quite low. 

      We appreciate this comment! It was a typographical error. The NaCl concentration was 210 mM, as shown in Suppl. Table 1. We also provide details of the chemical composition of the marine water as follows:

      “Marine environments and agriculture, where antibiotics are commonly used, are rich in magnesium. To investigate whether this mineral impacts antibiotic activity, the minimal inhibitory concentration (MIC) of V. alginolyticus ATCC33787 and V. parahaemolyticus VP01, hereafter referred to as ATCC33787 and VP01, isolated from marine aquaculture, to balofloxacin (BLFX) in Luria-Bertani medium (LB medium) plus 3% NaCl as LBS medium and “artificial seawater” (ASWT) medium that included the major ion species in marine water (Wilson, 1975) (LB medium plus 210 mM NaCl, 35 mM Mg2SO4, 7 mM KCl, and 7 mM CaCl2) were assessed”

      (6) Line 98 and Figure 1B. M9 is indicated in the text but does not appear in the figure, the figure only shows SWT. This should be checked. Line 99: based on Figure 1C, the authors are adding MgCl2 to SWT, SWT should be mentioned in this line. Line 100: I believe this is referring to Figure 1C, which should be checked. 

      We appreciate this comment! 

      Line 98, which is now Line 118: We have corrected M9 to ASWT as follows:

      “However, the MIC for BLFX was higher in ASWT medium supplemented with Mg2SO4 or MgCl2 than in LB medium (Fig 1B).”

      Line 99, which is now Line 133: the sentence has been corrected as follows:

      “The MIC for BLFX increased at higher concentrations of MgCl2 in ASWT”

      Line 100, which is now Line 135: we have corrected Fig 1B to Fig. 1C.

      (7) Line 101: text and Figure 1D are not consistent, as Figure 1D does not show this level of precision in added MgCl2 as indicated in the text (15.6 - 62.4 mM).  

      We appreciate this comment! The sentence has been corrected as follows: “At balofloxacin doses of 1.56, 3.125, 6.25, and 12.5 µg, the zone of inhibition decreased with increasing MgCl2 (Fig 1D).”

      (8) MgCl2 clearly induces increasing levels of BLFX resistance, and to high levels, but not for every antibiotic. For example, the level of increased resistance to β-lactams is low (ceftriaxone) and plateaus (ceftazidime). As well, resistance to gentamicin plateaus at a lower level than the other aminoglycosides. These observations do not take away from the conclusion that Mg induces multi-drug resistance, but since the behaviour of the MICs for these drugs is different than the other drugs, they should be mentioned. Also, Figure 1F - tetracyclines (plural) is used for vertical axis label - does this refer to the tetracycline itself or the class itself, and if the class, which one was tested?

      We appreciate this comment! We have revised the description as follows: “Notably, magnesium had a smaller effect on ceftriaxone and gentamicin than on other antibiotics.”

      The “tetracyclines” axis label has been changed to “Oxytetracycline” in the revised manuscript.

      - The magnesium chelation experiments presented in Figure 2 are not clear. The authors should briefly mention how this was done around line 128, and what data underlies the values in Figure 2C. Figure 2B is also not clear to me at all. Similarly, how the authors measured intracellular balofloxacin and Mg2+ is not clear and should be mentioned briefly around lines 130-132. 

      We appreciate this comment! These have been rewritten as follows: “To investigate whether magnesium binds to balofloxacin, balofloxacin was preincubated with magnesium, and zone of inhibition (ZOI) analysis was conducted. Six different concentrations of balofloxacin were separately incubated with six different concentrations of MgCl2, and then spotted on filter paper so that a defined amount of balofloxacin could be used for ZOI. While lower concentrations of MgCl2 (0.78, 3.125, or 12.5 mM) did not alter the ZOI, higher concentrations, including 50 and 200 mM MgCl2, decreased the ZOI (Suppl. Fig 2A), suggesting that even high doses of magnesium had only a partial effect on balofloxacin through direct binding. For example, at 200 mM MgCl2 and 5 or 10 μg/mL balofloxacin, the balofloxacin ZOI was 53.2 and 70.3% of the ZOI at 0 mM MgCl2, suggesting that more than 50% of the antibiotics were still functional. Intracellular BLFX also decreased with increasing MgCl2 (Suppl. Fig 2B), while exogenous Mg2+ increased intracellular Mg2+ levels in a dose-dependent manner. For example, exogenous 50 and 200 mM MgCl2 increased intracellular Mg2+ levels to 1.21 and 1.31 mM, respectively (Suppl. Fig 2C). The relationship between TolC, an efflux pump that transports quinolones from bacterial cells, and Mg2+ was also assessed (Kobylka et al., 2020; Song et al., 2020). The expression of TolC/tolC was unaffected by Mg2+ (Suppl. Fig 2D). Magnesium is critical for LPS stability. LPS levels increased at 200 mM Mg2+ (Suppl. Fig 2E); however, the loss of waaF, lpxA, and lpxC, three key genes involved in LPS biosynthesis, did not influence balofloxacin sensitivity/resistance in the presence of Mg2+ (Suppl. Fig 2F). These findings suggest that magnesium-induced LPS biosynthesis does not contribute directly to BLFX resistance and demonstrate that Mg2+ influx is involved in balofloxacin resistance.”
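
      The percentages quoted above express each zone of inhibition relative to the magnesium-free control. As a minimal sketch of that arithmetic (the function name and the diameters used below are hypothetical illustrations and are not data from the study):

```python
def relative_zoi(zoi_treated_mm: float, zoi_control_mm: float) -> float:
    """Return the treated zone of inhibition as a percentage of the control.

    Both arguments are zone diameters in millimetres; the control is the
    zone measured at 0 mM MgCl2. All values passed in here are assumed,
    illustrative inputs rather than measurements from the manuscript.
    """
    if zoi_control_mm <= 0:
        raise ValueError("control zone of inhibition must be positive")
    return 100.0 * zoi_treated_mm / zoi_control_mm


# Hypothetical diameters chosen only to illustrate the calculation.
control_mm = 20.0
treated_mm = 10.64
percent_of_control = round(relative_zoi(treated_mm, control_mm), 1)
```

      A value above 50 would correspond to the statement that more than half of the antibiotic activity remains at that MgCl2 concentration.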

      - Line 135: LPS cannot be "expressed", as the authors word it here. This should be corrected. Also, the inspection of Figure 2G actually shows the levels of LPS increase with increased Mg2+. The authors should re-evaluate these results and change their description around this area of the Results. 

      We appreciate this comment! We have moved the whole of Figure 2 to the Supplementary Text and Supplementary Figure 2. We have rewritten this part as follows: “The relationship between TolC, an efflux pump that transports quinolones from bacterial cells, and Mg2+ was also assessed (Kobylka et al., 2020; Song et al., 2020). The expression of TolC/tolC was unaffected by Mg2+ (Suppl. Fig 2D). Magnesium is critical for LPS stability. LPS levels increased at 200 mM Mg2+ (Suppl. Fig 2E); however, the loss of waaF, lpxA, and lpxC, three key genes involved in LPS biosynthesis, did not influence balofloxacin sensitivity/resistance in the presence of Mg2+ (Suppl. Fig 2F). These findings suggest that magnesium-induced LPS biosynthesis does not contribute directly to BLFX resistance and demonstrate that Mg2+ influx is involved in balofloxacin resistance.”

      - Section: MgCl2 affects bacterial metabolism. Authors switched to M9 medium - why? This contrasts with other sections using SWT and should be explained. Also, I cannot evaluate whether the statistical analysis of the data here was performed correctly and was appropriate for this type of experiment. I advise the authors to move the details in lines 166-169 to the Materials and Methods and replace this section instead with a more accessible description of the statistical analysis that a non-expert would be able to appreciate. Furthermore, analysis of Figure 3A indicates that the levels of asparagine, 4-hydroxybutyric acid, uracil, cystathionine, fumaric acid, and aminoethanol have significantly changed at high MgCl2, but these are not mentioned in the text. I suggest the authors mention these if they are relevant to the 12 enriched pathways, especially the biosynthesis of fatty acids. 

      We appreciate this comment! 

      We now state the reason for using M9 medium by adding the following phrase:

      “To better understand how magnesium affects bacterial metabolism”

      The information in lines 166-169 has been moved to the Materials and Methods.

      We have carefully examined the abundance of the metabolites and the enriched pathways. Among the listed metabolites, only fumarate is within the enriched pathways. We mention this point in our revised manuscript as follows:

      “The increase in fatty acid biosynthesis could be partially explained by an imbalanced pyruvate cycle/TCA cycle, in which fumarate levels increased at higher Mg2+ while succinate levels increased at lower Mg2+ (Suppl. Fig 5B). These findings indicated that glycolysis fluxes into fatty acid biosynthesis rather than the pyruvate cycle/TCA cycle. The relevance of fatty acids and BLFX was demonstrated by the observation that exogenous palmitic acid increased bacterial resistance to balofloxacin (Fig 2F). These results suggest that fatty acid metabolism may be critical to magnesium-based phenotypic resistance.”

      - Line 211 appears to refer to Figure 4F and should be checked. Similarly in line 216 - appears this should be Figure 4H, and line 218 should be Figure 4H. Line 226: add a reference to Fig 4I (after arcA was decreased). Line 227: what are genes N646_1004 and N646_1885? Based on Fig 4J these are crp - authors should add to line 227. Line 228 appears to refer to Figure 4J, not Figure 4I. Line 229 - should be Figure 4K, not Figure 4I. Line 231 - should be 4L, not 4K. Line 239 - should be 4M.

      We appreciate this comment! The text and figures now match.

      - Line 312: the descriptions of "11 lipids, 32 lipids, and 53", and then "26 lipids, 52 lipids, and 107 lipids" are not clear at all and should be corrected. 

      We appreciate this comment! The sentence has been revised as follows:

      “The abundance of 11, 32, and 53 lipids was increased in 3.125, 50, and 200 mM MgCl2-treated bacteria, respectively, while the abundance of 26, 52, and 107 lipids was decreased in 3.125, 50, and 200 mM MgCl2-treated bacteria, respectively (Suppl. Fig 7C)”

      - Line 340. What is the assay the authors are using to measure the levels of the PGS and PSS enzymes? This is not mentioned or clear in this part of the Results.  

      We appreciate this comment! We provide the information in the manuscript as follows:

      “Levels of PGS and PSS were quantified by ELISA kits according to the manufacturer’s instructions (Shanghai Fusheng Industrial Co., Ltd., China)”

      - Line 372: What is the assay for measuring membrane depolarization? This is not mentioned and I suggest it should be. Line 374: Figure 7B does not show time dependence, only dose dependence, this should be corrected, it is assumed the authors are referring to Fig 7C for the time dependence data. 

      We appreciate this comment! We provide the information in the Results as follows:

      “The voltage-sensitive dye, DiBAC4(3) showed that 12.5–200 mM MgCl2 promoted membrane depolarization in a dose-dependent manner (Fig 6A)”

      We also explain how DiBAC4(3) can be used to measure membrane depolarization in the Materials and Methods section as follows:

      “DiBAC4(3) is a voltage-sensitive probe that penetrates depolarized cells, binding intracellular proteins or membranes and exhibiting enhanced fluorescence and a red spectral shift.”

      To make the specific figure clear, we have revised the sentence as follows:

      “Meanwhile, MgCl2 had a dose-dependent (Fig 6B) and time-dependent (Fig 6C) effect on proton motive force (PMF).”

      - Line 384: mention how FM5-95 measures membrane permeability. The authors should also clarify how this reagent is used to measure membrane fluidity, and it is not clear if the data for this is presented in Figure 7 - please clarify. Regarding SYTO9 dye experiment: the authors should briefly explain the experimental design - how SYTO9 dye operates and why FACS was chosen. What is labeled with FITC?  

      We appreciate this comment! We clarify the reason we use FM5-95 in the Methods and Materials section as follows:

      “Measurement of fluidity by fluorescence microscopy

      Measurement of membrane fluidity was performed as previously described (Wen et al., 2022). Briefly, ATCC33787 cells were cultured in medium with the indicated concentrations of MgCl2, collected, and then adjusted to OD 0.6. Aliquots of 100 μL of bacterial cells from each sample were diluted to 1 mL, and 10 μL (10 mg/mL) FM5-95 (Thermo Fisher Scientific, USA) was added. FM5-95 is a lipophilic styryl dye that inserts into the outer leaflet of the bacterial membrane and becomes fluorescent. This dye preferentially binds to microdomains with high membrane fluidity (Wen et al., 2022). After incubation for 20 min at 30 ℃ with shaking in the dark, the sample was centrifuged for 10 min at 12,000 rpm. The pellets were resuspended in 20 μL of 3% NaCl. A 2 μL aliquot of the sample was dropped onto an agarose slide and imaged under an inverted fluorescence microscope.”

      This data is presented as micrographs in Fig. 6D, which shows the decreased FM5-95 staining with increasing concentrations of MgCl2. We make this description clear with the following revision:

      “FM5-95 staining decreased with increasing concentrations of Mg2+, and no staining was observed in the presence of 200 mM Mg2+ (Fig 6D).”

      We explain why we use SYTO9 as follows:

      “SYTO9, a green fluorescent dye that binds to nucleic acid, enters and stains bacteria cells when there is an increase in membrane permeability (Lehtinen et al., 2004; McGoverin et al., 2020). Staining decreased with increasing MgCl2, indicating that bacterial membrane permeability declined in an Mg2+ dose-dependent manner (Fig 6E).”

      We did not use FACS in this study; we only used the instrument to analyze the fluorescence distribution. To make this clear, we have revised the sentence as follows:

      “After incubation for 15 min at 30 ℃ with shaking in the dark, the mixtures were filtered and measured by flow cytometry (BD FACSCalibur, USA).”

      - Lines 391-397. The statement that palmitic acid shifts the peaks in Figure 7F is not supported by the data. There is essentially no change in the major peak position within each MgCl2 concentration set with increasing palmitic acid. For the linolenic acid data, it is clear that linolenic acid increases permeability only at 50 mM MgCl2; this should be mentioned in the text.

      We appreciate this comment! We have revised the sentence as follows:

      “Exogenous palmitic acid also shifted the fluorescence signal peaks to the left in an MgCl2-dependent manner while palmitic acid only slightly shifted the peaks (Fig 6F). In contrast, exogenous linolenic acid shifted the peak to the right in a dose-dependent manner at 50 mM MgCl2 (Fig 6G).” 

      - Line 404-405 - as mentioned earlier, the assay for the uptake of BLFX should be mentioned (if it is done so earlier in the text, then it does not need to be here).

      We appreciate this comment! It has been mentioned in the introduction.  

      - Discussion: CpxA/R-OmprF pathway is mentioned here for the first time. Is this one of the pathways modified by MgCl2 as determined during the course of the study? If so, this should be reworded to mention that. If not, the relevance of this particular pathway as it relates to light metals and phenotypic resistance should be discussed.

      We appreciate this comment! Since it is not relevant to the discussion of Mg2+ and fatty acid biosynthesis, we have deleted this sentence in the revised manuscript.

      -The following grammatical errors should be corrected:

      -line 55 change to: "genetic mutations; instead, this type of resistance is transient, and bacteria resume normal growth"

      -line 57: change to "resistance types are biofilm" 

      -line 61: change to "states that significantly" 

      -line 63: change to "resistance share the common feature in they retard or even cease in the presence" 

      -line 65: change to "resistance that allow bacteria to proliferate" 

      -line 81: change "But whether" to "Whether" 

      -line 178: change to "may be critical to the Mg-based phenotypic resistance"

      -line 86: change to "Marine environments and agriculture are rich in magnesium, where..." 

      -line 93: change in to vs

      -line 154: insert space after metabolism 

      -line 158: change 'identified" to "focused on the levels of" 

      -line 160: change "The levels of forty-one metabolites" 

      -line 198: change shared to share 

      -line 310: increased is duplicated, delete one 

      -line 451: add "the" before ratio 

      -line 453: gram should be capitalized 

      -line 462: "the regulation" should be reworded to "More importantly, the effect of exogenous MgCl targets the..." 

      -line 469: add dash between Mg2+ and limited

      -line 478: change "the crucial" to "a crucial" 

      -there are numerous locations in the manuscript where the word "magnetism" is used when clearly the word is supposed to be magnesium - this should be corrected

      We appreciate this comment! These have been corrected or revised. 

      Editor’s comments:

      Page 2 line 27; Page 25 line number 426; page 27 line number 481: In the abstract and discussion, only Vibrio alginolyticus was mentioned, even though two Vibrio species were used in the study. It would be helpful to understand the rationale behind the focus on this particular species.

      We appreciate this comment! We have revised the introduction to provide additional information as follows:

      “Vibrios inhabit seawater, estuaries, bays, and coastal waters, regions full of metal ions such as magnesium (Kumarage et al., 2022). Magnesium is the second most dissolved element in seawater after sodium. At a salinity of 3.5% seawater, the magnesium concentration is about 54 mM (Potis, 1968), and in deep seawater, can be as high as 2,500 mM (Wang et al., 2024). Vibrio parahaemolyticus and V. alginolyticus are two representative Vibrio pathogens that infect humans and aquatic animals, resulting in illness and economic loss, respectively (Grimes, 2020). (Fluoro)quinolones such as balofloxacin are used to treat Vibrio infection; however, resistance has emerged due to overuse (Suyamud et al., 2024). Indeed, (fluoro)quinolones are one of China's two primary residual chemicals associated with aquaculture (Liu et al., 2017). Vibrio can develop quinolone resistance through mutations in the DNA gyrase gene or through plasmid-mediated mechanisms (Dutta et al., 2021). Thus, the use of V. parahaemolyticus and V. alginolyticus as bacterial representatives, and balofloxacin as a quinolone-based antibacterial representative, can help to define novel magnesium-dependent phenotypic resistance mechanisms of pathogenic Vibrio species.”

      On Page 2, line 34: The abstract contains some undefined abbreviations, such as 'PE' and 'PG', which should be explained. 

      We appreciate this comment! We define PE and PG in the revised abstract as follows:

      “phosphatidylethanolamine (PE) biosynthesis is reduced and phosphatidylglycerol (PG)”

      On Page 2, line 31-32: For the statement "Exogenous supplementation of fatty acids confirm the role of fatty acids in antibiotic resistance…" it would be beneficial to specify whether the fatty acids were saturated or unsaturated. 

      We appreciate this comment! We have revised the sentence as follows:

      “Exogenous supplementation of unsaturated and saturated fatty acids increased and decreased bacterial susceptibility to antibiotics, respectively, confirming the role of fatty acids in antibiotic resistance.”

      The potential effects of the specific ions (SO4 and Cl2) present in the Mg2SO4 and MgCl2 compounds used in the study were not discussed. It would be useful to understand if these ions had any influence on the observed outcomes.

      We appreciate this comment! We have revised the sentence as follows:

      “However, the MIC for BLFX was higher in ASWT medium supplemented with Mg2SO4 or MgCl2 than in LB medium (Fig 1B). Moreover, the MIC did not differ between Mg2SO4 and MgCl2, suggesting that Mg2+, rather than the other ions, contributes to the MIC change.”

      On Page 8, line 141: The heading of Figure 2, "Mg2+ elevates intracellular Mg2+," seems redundant and could be revised for clarity or modified. 

      We appreciate this comment! Figure 2 has been moved to the supplement as Suppl. Fig 2. The title has been revised as follows:

      “Figure 2. Mg2+ decreases balofloxacin uptake.”

      On Page 4, line 91: some terms/abbreviations, such as 'LB' and 'M9,' require expansion or definition to ensure the reader's understanding.

      We appreciate this comment! We include the expansions of LB and M9 in the revised manuscript as follows:

      “Luria-Bertani medium (LB medium)” and “M9 minimal medium (M9 medium)”

      Page 4, line 92: The real seawater composition used in the experiments should be supported by a reference.

      We appreciate this comment! We provide the reference in the revised manuscript as follows:

      ““artificial seawater” (ASWT) medium that included the major ion species in marine water (Wilson, 1975) (LB medium plus 210 mM NaCl, 35 mM Mg2SO4, 7 mM KCl, and 7 mM CaCl2)”

      Page 4, line 93: the full names of the bacterial strains (e.g., ATCC33787 and VP01) should be provided instead of just the strain numbers.

      We appreciate this comment! We have revised the sentence as follows:

      “To investigate whether this mineral impacts antibiotic activity, the minimal inhibitory concentration (MIC) of V. alginolyticus ATCC33787 and V. parahaemolyticus VP01, hereafter referred to as ATCC33787 and VP01,”

      Finally, there appears to be a potential contradiction between the statements on page 12, lines 211-212 and 214-216, regarding the effects of Mg2+ on the synthesis of unsaturated fatty acids. Further explanation may be needed to reconcile these seemingly contradictory points.

      We appreciate this comment! Lines 221-226 (previously lines 211-212) concern the expression of genes for fatty acid biosynthesis, whereas lines 228 and 233 (previously lines 214-216) concern the expression of genes for fatty acid degradation. We agree that the previous description was somewhat confusing. We have revised the sentence to emphasize that we focus on fatty acid degradation, so that readers can tell the two apart.

      In the text, we revised it as follows:

      “In addition, we also quantified gene expression during fatty acid degradation to determine whether Mg2+ affects this process”

      In the figure legend, we also indicate the following:

      “H. qRT-PCR for the expression of genes encoding fatty acid degradation in the absence or presence of the indicated concentrations of MgCl2”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper Homan et al used mouse models of Metabolic Dysfunction-Associated Steatotic Liver Disease and different specific target deletions in cells to rule out the role of Complement 3a Receptor 1 in the pathogenesis of disease. They provided limited evidence and only descriptive results that despite C3aR being relevant in different contexts of inflammation, however, these tenets did not hold true.

      Weaknesses:

      (1) The results are based on readouts showing that C3aR is not involved in the pathogenesis of liver metabolic disease.

      (2) The description of the mouse models they used to validate their findings is not clear. Lysm-cre mice - which are claimed to delete C3aR in (?) macrophages are not specific for these cells, and the genetic strategy to delete C3aR in Kupffer cells is not clear.

      (3) Taking this into account, it is very challenging to determine the validity of these data, also considering that they are merely descriptive and correlative.

      We generated two different cohorts of mice using LysM-Cre (Jackson Strain #004781) to drive deletion in all macrophages and Clec4f-Cre (Jackson Strain #033296) to specifically ablate C3ar1 in Kupffer cells. These experimental models have been clearly defined in the revised manuscript on pages 5 and 7 and in the methods section (page 10). The reviewer’s point is well taken that the LysM-Cre transgene can also be active in granulocytes and some dendritic cells. Even so, despite deletion of C3ar1 in macrophages and granulocytes, we do not see a major effect on hepatic steatosis and fibrosis in this GAN diet-induced model of MASLD/MASH. This was a somewhat surprising finding. We do not agree that our findings are correlative. We specifically ablated C3aR1 in macrophages or Kupffer cells and found no significant differences in the major readouts of steatosis and fibrosis for MASLD/MASH between control and knockout mice. It is possible that in other models of liver injury that we did not test (e.g., short-term treatment with a hepatotoxin such as carbon tetrachloride), there may be differences in liver injury in mice lacking C3ar1 in macrophages, but the GAN diet model has been shown to better parallel the gene expression changes in human MASLD/MASH. This has been added to the discussion (page 9).

      Reviewer #2 (Public review):

      Summary:

      Homan et al. examined the effect of macrophage- or Kupffer cell-specific C3aR1 KO on MASLD/MASH-related metabolic or liver phenotypes.

      Strengths:

      Established macrophage- or Kupffer cell-specific C3aR1 KO mice.

      Weaknesses:

      Lack of in-depth study; flaws in comparisons between KC-specific C3aR1KO and WT in the context of MASLD/MASH, because MASLD/MASH WT mice likely have a low abundance of C3aR1 on KCs.

      Homan et al. reported a set of observation data from macrophage or Kupffer cell-specific C3aR1KO mice. Several questions and concerns as follows could challenge the conclusions of this study:

      (1) As C3aR1 is robustly repressed in MASLD or MASH liver, GAN feeding likely reduced C3aR1 abundance in the liver of WT mice. Thus, it is not surprising that there were no significant differences in liver phenotypes between WT vs. C3aR1KO mice after prolonged GAN diet feeding. It would give more significance to the study if restoring C3aR1 abundance in KCs in the context of MASLD/MASH.

      GAN diet feeding resulted in higher liver C3ar1 compared to regular diet (Figure 1H). This thus became an impetus for studying the effects of C3ar1 deletion in macrophages or Kupffer cells, which are responsible for the majority of liver C3ar1 expression, in MASLD/MASH (Figures 2B and 3H). This point has been added to the text on page 5.

      (2) Would C3aR1KO mice develop liver abnormalities after a short period of GAN diet feeding?

      We did not assess whether short-term GAN diet feeding resulted in significant differences in liver abnormalities in the C3ar1 macrophage or Kupffer cell knockout mice. Perhaps the reviewer’s point is that with shorter periods of GAN diet feeding there may be a phenotype in the KO mice. We agree that this is entirely possible, though with shorter feeding timeframes what is typically seen is hepatic steatosis without fibrosis. Nevertheless, the most important element, in our opinion, for a disease-preventing or disease-modifying model lies with the longer-term GAN diet feeding. With long-term GAN diet feeding, which has been previously shown to model human MASLD/MASH, we did not observe significant differences in liver abnormalities with the KO mice. This has been added to the discussion (page 8).

      (3) What would be the liver macrophage phenotypes in WT vs C3aR1KO mice after GAN feeding?

      Similar to the above point, given the lack of a major MASLD/MASH phenotype in hepatic steatosis and fibrosis, we did not further characterize the liver macrophage profiles of the macrophage or Kupffer cell C3ar1 KO mice with GAN feeding.

      (4) In Fig 1D, >25wks GAN feeding had minimal effects on female body weight gain. These GAN-fed female mice also develop MASLD/MASH liver abnormalities?

      We thank the reviewer for this question. In general, female GAN-fed mice develop milder MASLD/MASH abnormalities. We have included additional data in the revised manuscript in Figure S4. These results show no to minimal development of a MASLD/MASH gene signature.

      (5) Would C3aR1KO result in differences in liver phenotypes, including macrophage population/activation, liver inflammation, lipogenesis, in lean mice?

      We have provided additional data further characterizing liver inflammation, lipogenesis and macrophages in macrophage C3ar1 KO mice under lean/regular diet conditions in Figure 2K. These results show a potential trend but no substantial development of a MASLD/MASH gene signature.

      (6) The authors should provide more information regarding the generation of KC-specific C3aR1KO. Which Cre mice were used to breed with C3aR1 flox mice?

      Clec4f-Cre transgenic mice were used to generate Kupffer cell specific KO of C3ar1. This has been clarified and explicitly stated in the revised manuscript on page 7 and in the methods section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      These data should be repeated using a more established model of Kupffer cell target deletion via Clec4-F mice.

      Our Kupffer cell C3ar1 deletion data were indeed generated with Clec4f-Cre transgenic mice. This has been clarified in the revised manuscript on page 7 and in the methods section.

      Reviewer #2 (Recommendations for the authors):

      (1) Typo: "iver" in the abstract

      (2) Line 97, "GAN diet I" should be "GAN diet"?

      These points have been corrected in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      Recent years have seen spectacular and controversial claims that loss of function of the RNA splicing factor Ptbp1 can efficiently reprogram astrocytes into functional neurons that can rescue motor defects seen in 6-hydroxydopamine (6-OHDA)-induced mouse models of Parkinson's disease (PD). This latest study is one of a series that fails to reproduce these observations, but remarkably also reports that neuronal-specific loss of function of Ptbp1 both induces expression of dopaminergic neuronal markers in striatal neurons and rescues motor defects seen in 6-OHDA-treated mice. The claims, if replicated, are remarkable and identify a straightforward and potentially translationally relevant mechanism for treating motor defects seen in PD models. However, while the reported behavioral effects are strong and were collected without sample exclusion, other claims made here are less convincing. In particular, no evidence that Ptbp1 loss of function actually occurs in striatal neurons is provided, and the immunostaining data used to claim that dopaminergic markers are induced in striatal neurons is not convincing. Furthermore, no characterization of the molecular identity of Ptbp1-deficient striatal neurons is provided using single-cell RNA-Seq or spatial transcriptomics, making it difficult to conclude that these cells are indeed adopting a dopaminergic phenotype. 

      Overall, while the claims of behavioral rescue of 6-OHDA-treated mice appear compelling, it is essential that these be independently replicated as soon as possible before further studies on this topic are carried out. Insights into the molecular mechanisms by which neuronal-specific loss of function of Ptbp1 induces behavioral rescue are lacking, however. Moreover, the claims of induction of neuronal identity in striatal neurons by Ptbp1 require considerable additional work to be convincing.

      We thank the reviewer for the detailed analysis of our study. Please find our answers to the points raised by the reviewer below.

      Strengths of the study: 

      (1) The effect size of the behavioral rescue in the stepping and cylinder tests is strong and significant, essentially restoring 6-OHDA-lesioned mice to control levels.

      (2) Since the neurotoxic effects of 6-OHDA treatment are highly variable, the fact that all behavioral data was collected blinded and that no samples were excluded from analysis increases confidence in the accuracy of the results reported here. 

      We appreciate the reviewer’s feedback and acknowledgement of the strengths of our study. We undertook several optimization steps in the surgery, post-operative care, and handling of the animals for behavior experiments to ensure high reproducibility of our experiments.

      Weaknesses of the study:  

      (1) Neurons express relatively little Ptbp1. Indeed, cellular expression levels as measured by scRNA-Seq are substantially below those of astrocytes and other non-neuronal cell types, and Ptbp1 immunoreactivity has not been observed in either striatal or midbrain neurons (e.g. Hoang, et al. Nature 2023). This raises the question of whether any recovery of Th expression is indeed mediated by the loss of function of Ptbp1 rather than by off-target effects. AAV-mediated rescue of Ptbp1 expression could help clarify this.

      In the original manuscript, we delivered control vectors that only express the ABE to 6-OHDA-lesioned mice (labeled as AAV-ctrl) and did not detect TH-positive cells in the midbrain or striatum of control mice or rescue of spontaneous motor skills. We can therefore exclude that the delivery procedure, AAV-PHP.eB capsid, or ABE expression caused adverse effects leading to induction of TH expression and functional rescue of spontaneous motor behaviors in PD mice. To further exclude that these effects were caused by off-target editing, we experimentally determined off-target binding sites of our sgRNA (sgRNA-ex3) using GUIDE-seq and subsequently analyzed these sites in treated animals by NGS (Figure 3 – supplement 3). While two off-target sites were identified, it is unlikely that base editing at these sites caused the observed phenotypes. One off-target site was identified in the myopalladin (Mypn) gene, which encodes a muscle-specific protein that plays a role in regulating the structure and growth of skeletal and cardiac muscle (Filomena et al., 2021, 2020). The other site is not located in a coding region, but in an intron of the ankyrin-1 (Ank1) gene, which encodes an adaptor protein linking membrane proteins to the underlying cytoskeleton (Cunha and Mohler, 2009). Even though this gene is also expressed in neurons, base editing within this intronic region did not lead to changes in transcript levels (Figure 3 – supplement 3). Thus, the induction of TH expression upon adenine base editing with sgRNA-ex3 is likely a direct consequence of PTBP1 downregulation.

      Further supporting this conclusion, in the revised manuscript we additionally show PTBP1 downregulation at the RNA and protein level in the SNc and striatum after base editor treatment (Figure 2 – figure supplement 5; figure 3 – supplement 2).

      (2) It is not clear why dopaminergic neurons, which are not normally found in the striatum, are observed following Ptbp1 knockout. This is very similar to the now-debunked claims made in Zhou, et al. Cell 2020, but here performed using the hSyn rather than GFAP mini promoter to control AAV expression. While this is the most dramatic and potentially translationally relevant claim of the study, this claim is extremely surprising and lacks any clear mechanistic explanation for why it might happen in the first place.  

      We agree with the reviewer that our study does not provide mechanistic insights into how Ptbp1 downregulation in neurons leads to the induction of dopaminergic markers in the striatum. As we believe that this is not within the scope of a revision, we discuss potential follow-up experiments in the discussion section of the revised manuscript.

      This observation is even more surprising in light of reports that antisense oligonucleotide-mediated knockdown of Ptbp1, which should have affected both neuronal and glial Ptbp1 expression, failed to induce expression of dopaminergic neuronal markers in the striatum (Chen, et al. eLife 2022). Selective loss of function of Ptbp1 in striatal and midbrain astrocytes likewise results in only modest changes in gene expression.

      Using 6-OHDA-lesioned Aldh1l1-CreERT2;Rpl22lsl-HA mice, the Chen et al. study (eLife 2022) assessed potential astrocyte-to-neuron conversion by quantifying the presence of HA-labeled neurons after ASO-mediated knockdown of Ptbp1. Even though they did not detect HA-positive neurons in the SNc, suggesting absence of astrocyte-to-neuron conversion, the images in Figure 4D reveal TH positive cells in the lesioned hemisphere, similar to our observations in Figure 2B-D. While it cannot be excluded that these TH positive cells are remnants from an incomplete 6-OHDA lesion, they could also be endogenous neurons with induced expression of dopaminergic markers after ASO-mediated knockdown of Ptbp1. Furthermore, Chen et al. performed the apomorphine test to assess changes in motor skills; this test did not reveal an improvement in our study either.

      It is critically important that this claim be independently replicated, and that additional data be provided to conclusively show that striatal neurons are indeed expressing dopaminergic markers.

      Our behavior and immunofluorescence experiments involving mice injected into the striatum were performed with two independently generated cohorts of 6-OHDA mice. In detail, the 6-OHDA mice were generated by two independent surgeons from different labs (>6 months between experiments of these cohorts), leading to comparable behavioral outcomes before and after treatment. Subsequent behavior and immunofluorescence experiments with each cohort were performed and analyzed by two independent and blinded researchers, showing comparable results.

      (3) More generally, since multiple spectacular and irreproducible claims of single-step glial-to-neuron reprogramming have appeared in high-profile journals in recent years, a consensus has emerged that it is essential to comprehensively characterize the identity of "transformed" cells using either single-cell RNA-Seq or spatial transcriptomics (e.g. Qian, et al. FEBS J 2021; Wang and Zhang, Dev Neurobiol 2022). These concerns apply equally to claims of neuronal subtype conversion such as those advanced here, and it is essential to provide these same datasets.

      In the revised version, we have analyzed the expression of additional neuronal markers in TH positive cells of the striatum using 4i imaging. Briefly, our results showed that the vast majority of TH-expressing cells also expressed the markers DAT and NEUN, further corroborating the neuronal and dopaminergic identity of these cells. Additional analysis revealed that this TH/DAT/NEUN expressing cell population expressed markers of GABAergic neurons, either of medium spiny neurons (~50%) or of various types of interneurons (~50%). While our 4i analysis has allowed us to broadly classify these TH-expressing populations, we agree that detailed transcriptional analysis at the single cell level is required to understand the molecular mechanisms underlying the generation of TH positive cells. These analyses are, however, not within the scope of a revision and would require a thorough dedicated study. We have added these results and discussion points to the revised manuscript.

      (4) Low-power images are generally lacking for immunohistochemical data shown in Figures 3 and 4, which makes interpretation difficult. DAPI images in Figure 3C do not appear nuclear. Immunostaining for Th, DAT, and Dcx in Figure 4 shows a high background and is difficult to interpret. 

      We thank the reviewer for closely evaluating these images and for the suggestions for improvement. In the revised manuscript, we provide low-power images and higher-magnification insets as requested to allow for easier interpretation.

      (5) Insights into the mechanism by which neuronal-specific loss of Ptbp1 function induces either functional recovery, or dopaminergic markers in striatal neurons, is lacking.

      In the revised manuscript, we provide a more detailed discussion of mechanisms that could potentially be involved in the functional recovery or expression of dopaminergic markers. However, deciphering the exact molecular mechanisms underlying these observations requires thorough transcriptional analysis at the single cell level, which is out of scope of this revision.

      Reviewer #2 (Public Review):

      Summary: 

      The manuscript by Bock and colleagues describes the generation of an AAV-delivered adenine base editing strategy to knockdown PTBP1 and the behavioral and neurorestorative effects of specifically knocking down striatal or nigral PTBP1 in astrocytes or neurons in a mouse model of Parkinson's disease. The authors found that knocking down PTBP1 in neurons, but not astrocytes, and in striatum, but not nigra, results in the phenotypic reorganization of neurons to TH+ cells sufficient to rescue motor phenotypes, though insufficient to normalize responses to dopaminomimetic drugs.

      Strengths: 

      The manuscript is generally well-written and adds to the growing literature challenging previous findings by Qian et al., 2020 and Zhou et al., 2020 indicating that astrocytic downregulation of PTBP1 can induce conversion to dopaminergic neurons in the midbrain and improve parkinsonian symptoms. The base editing approach is interesting and potentially more therapeutically relevant than previous approaches.

      Weaknesses: 

      The manuscript has several weaknesses in approach and interpretation. In terms of approach, the animal model utilized, the 6-OHDA model, though useful to examine dopaminergic cell loss, exhibits accelerated neurodegeneration and none of the typical pathological hallmarks (synucleinopathy, Lewy bodies, etc.) compared to the typical etiology of Parkinson's disease, limiting its translational interpretation. 

      We thank the reviewer for the detailed assessment of our study and pinpointing its current weaknesses. Please find our answers to all comments below in blue.

      We agree with the reviewer that the 6-OHDA model lacks the typical pathological hallmarks of PD. Nevertheless, we chose this model for two reasons:

      i) The 6-OHDA model was used by both Qian et al. (2020) and Zhou et al. (2020). To allow comparison of our results to these studies, it was crucial to use the same model. Notably, the 6-OHDA model was also used by Chen et al. (2022) and Hoang et al. (2023) for comparison to the two studies from 2020.

      ii) The 6-OHDA model is straightforward to generate and displays robust motor impairments for evaluation of potential therapeutic effects of neuroregeneration treatment approaches. We therefore believe that the model is well-suited to analyze the cellular and behavioral effects (specifically motor skills) of PTBP1 downregulation. 

      In future studies, it would be critical to include models that also display typical pathological hallmarks of the disease to further evaluate the therapeutic effect of this base editing approach. These experiments are, however, not within the scope of this study, which was aimed to focus on the cellular and behavioral effects of PTBP1 downregulation. 

      In addition, there is no confirmation of a neuronal or astrocytic knockdown of PTBP1 in vivo; all base editing validation experiments were completed in cell lines. 

      In the revised manuscript, we assess in vivo base editing efficiencies at the Ptbp1 target site in the SNc (AAV-hsyn, 15.6%) and striatum (AAV-hsyn, 21.1%). Furthermore, we assessed in vivo Ptbp1 downregulation at the RNA and protein level to complement our in vitro data (Figure 2 – figure supplement 5; figure 3 – supplement 2).

      Finally, it is unclear why the base editing approach was used to induce loss-of-function rather than a cell-type specific knockout, if the goal is to assess the effects of PTBP1 loss in specific neurons. 

      We expressed base editors under cell-type-specific promoters to induce a reliable loss-of-function mutation at the Ptbp1 exon-intron junction in neurons or astrocytes. Performing these mutations with Cas9 nucleases instead would have had potential limitations and risks, including i) indel mutations do not always lead to a frameshift and loss-of-function despite high indel formation at the targeted site, ii) nucleases induce DNA double strand breaks, which can have serious side effects (e.g. chromosomal rearrangements or translocations), and iii) ‘mosaicisms’ as edited cells contain different indel mutations, which may result in different effects and thus complicate analysis of the downstream effects. We discuss these points in the revised manuscript.

      In terms of interpretation, the conclusion by the authors that PTBP1 knockdown has little likelihood to be therapeutically relevant seems overstated, particularly since they did observe a beneficial effect on motor behavior. We know that in PD, patients often display negligible symptoms until 50-70% of dopaminergic input to the striatum is lost, due to compensatory activity of remaining dopaminergic cells. Presumably, a small recovery of dopaminergic neurons would have an outsized effect on motor ability and may improve the efficacy of dopaminergic drugs, particularly levodopa, at lower doses, averting many problematic side effects. Since striatal dopamine was assessed by whole-tissue analysis, which is not necessarily reflective of synaptic dopamine availability, it is difficult to assess whether the ~10% increase in TH+ cells in the striatum was sufficient to improve dopamine function. However, the improvement in motor activity suggests that it was.

      As pointed out by the reviewer, it is difficult to estimate the therapeutic effect and importance of a ~10% increase in TH+ cells for PD patients. Guided by the reviewer’s suggestion, we have included a more in-depth discussion of our results and their potential therapeutic value as well as outstanding questions for future studies in the revised manuscript.

      Reviewer #3 (Public Review):

      This study explores the use of an adenine base editing strategy to knock down PTBP1 in astrocytes and neurons of a Parkinson's disease mouse model, as a potential AAV-BE therapy. The results indicate that editing Ptbp1 in neurons, but not astrocytes, leads to the formation of tyrosine hydroxylase (TH)+ cells, rescuing some motor symptoms.

      Several aspects of the manuscript stand out positively. Firstly, the clarity of the presentation. The authors communicate their ideas and findings in a clear and understandable manner, making it easier for readers to follow. 

      The Materials and methods section is well-elaborated, providing sufficient detail for reproducibility. 

      The logical flow of the manuscript makes sense, with each section building upon the previous one coherently.

      The ABE strategy employed by the authors appears sound, and the manuscript presents a coherent and well-supported argument.

      Positively, some of the data in this study effectively counteracts previous work in line with more recent publications, demonstrating the authors' ability to contribute to the ongoing conversation in the field.

      We thank the reviewer for appreciating the effort we have put into this study. Please find below a point-by-point reply to the weaknesses raised by the reviewer. 

      However, while the in vitro data yields promising results, it may have been overly optimistic to assume that the efficiencies observed in dividing cells will directly translate to in vivo conditions. This consideration is important given the added complexities of vector optimization, different cell types targeted in vitro versus in vivo, as well as unknown intrinsic limitations of the base editing technology. 

      We agree with the reviewer that in vitro base editing efficiencies might not directly translate to in vivo editing outcomes. We therefore assessed in vivo base editing efficiencies at the Ptbp1 locus and PTBP1 downregulation in the striatum and midbrain. Our data revealed that in vivo base editing activity was lower than in our in vitro setting (in vitro: Figure 1; figure 1 – figure supplement 2; in vivo: figure 2 – figure supplement 5; figure 3 – supplement 2). However, we believe that these rates are slightly underestimated since we sequenced DNA isolated from the whole tissue (striatum or SNc) and not from purified astrocytes or neurons. Moreover, we could demonstrate that editing led to a reduction of Ptbp1 transcript and PTBP1 protein level (Figure 2 – figure supplement 5; figure 3 – supplement 2).
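      The dilution argument above can be made concrete with a short sketch. It assumes, purely for illustration, that edits occur only in the targeted cell type and that this cell type contributes a known fraction of the alleles in the sequenced tissue. The fraction used below (0.5) is a hypothetical placeholder rather than a measured value, and the function name is introduced here only for this example.

```python
def cell_type_editing_rate(bulk_rate, target_allele_fraction):
    """Estimate the editing rate within the targeted cell type from a bulk value.

    Assumption (illustrative only): edits occur exclusively in the targeted
    cell type, so bulk_rate = target_allele_fraction * cell_type_rate.
    """
    if not 0 < target_allele_fraction <= 1:
        raise ValueError("target_allele_fraction must be in (0, 1]")
    if not 0 <= bulk_rate <= 1:
        raise ValueError("bulk_rate must be in [0, 1]")
    rate = bulk_rate / target_allele_fraction
    if rate > 1:
        raise ValueError("bulk_rate exceeds what this fraction could produce")
    return rate


# Bulk striatal editing rate reported in the response (21.1%).
striatum_bulk = 0.211
# Hypothetical share of sequenced alleles that come from the targeted cell type.
hypothetical_fraction = 0.5

estimate = cell_type_editing_rate(striatum_bulk, hypothetical_fraction)
print(f"Estimated editing rate in targeted cells: {estimate:.1%}")
```

      Under this simplified model, any contribution of non-targeted cells to the sequenced DNA lowers the bulk value relative to the rate in the targeted cells, which is the sense in which whole-tissue rates are underestimates.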

      In addition, certain aspects of the manuscript would benefit from a more in-depth and comprehensive discussion rather than being only briefly touched upon. Such a discussion would enhance the relevance of the obtained results and provide the foundation for improvement when using similar approaches.

      Following the reviewer’s suggestion, we included a more in-depth discussion of our results in the revised manuscript.

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the Authors):

      A summary of key recommendations that might improve the eLife assessment in a subsequent submission is provided below, as a guide to help the authors focus on changes that might enhance the strength of evidence (e.g., from "incomplete" to "solid").

      (1) Provide further explanation of the mechanistic relationship between the downregulation of Ptbp1 and TH+ dopaminergic neuron reprogramming. Additional discussion of this topic should also be included.

      (2) Demonstrate proof of editing in the intended targeted cells in vitro and/or in vivo.

      (3) Show evidence of successful Base Editor delivery in vivo.

      (4) Perform a deeper characterization of TH+ cells in vivo and provide a more thorough discussion of the identity of the targeted cells. This may include an exploration of whether TH+ cells detected are TH+ interneurons and/or establish their identity based on transcriptomics or a similar approach.

      (5) Provide better-quality representative images supporting the quantitative data.

      (6) Please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      In the revised manuscript, we provided 1) suggestions of the mechanistic relationship between Ptbp1 knockdown, dopamine synthesis, and the functional rescue of spontaneous behaviors, 2) proof of in vivo base editing and successful base editor delivery, 3) deeper characterization of TH-expressing cells in vivo using 4i imaging, 4) better quality images, and 5) full statistical reporting.  

      Individual Reviewer recommendations for the authors are included below.

      Reviewer #1 (Recommendations For The Authors):

      Confirm loss of Ptbp1 function in infected striatal neurons. Single-cell RNA-Seq or spatial transcriptomic analysis must be performed to characterize the identity of the edited striatal neurons. The quality of the immunostaining in Figures 3 and 4 needs to be improved, and low-power images provided. Were eLife a conventional journal, I would have insisted on all these being included prior to publication. Please also arrange for independent replication of the behavioral rescue and induction of dopaminergic marker gene expression in the striatum.

      In the revised manuscript, we confirmed Ptbp1 downregulation at the tissue level in the SNc and striatum by RT-qPCR and western blot and included low-power images for easier interpretation. Additionally, we assessed expression of additional neuronal markers on striatal sections using 4i imaging and found that TH/DAT/NEUN positive populations either expressed markers of medium spiny neurons or interneurons. We have included these results in the revised manuscript.

      Our behavioral and imaging experiments involving mice injected into the striatum were in fact performed with two independently generated cohorts of 6-OHDA mice. In detail, the 6-OHDA mice were generated by two independent surgeons from different labs (>6 months between experiments of these two cohorts), leading to comparable behavioral outcomes before and after treatment. The experiments with each cohort were performed and analyzed by two independent and blinded researchers, yielding comparable results.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the introduction, lines 43-45: This statement is inaccurate. Current treatment strategies do not focus on slowing or halting disease progression. There is currently no accepted therapy that does this. Dopaminergic therapies and deep brain stimulation can compensate for circuitry dysfunction as a result of dopamine cell loss but do not slow the disease. The referenced paper used is older and does not refer to new treatments for PD and is a summary article for a special issue of the Disease Models and Mechanisms journal. Please ensure that all references used are appropriate for the statement they are attached to.

      We thank the reviewer for pointing this out. We have rephrased this statement accordingly and provided an appropriate reference describing current treatment strategies.

      (2) The number of TH+ cells in the intact nigra seems low compared to published data. Suggest a stereological approach may be better than the Abercrombie method.

      Following the reviewer’s suggestion, we re-quantified the number of TH positive cells using a stereological approach (Nv:Vref method). We have included these results in the revised manuscript. 
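      For readers unfamiliar with the two counting approaches mentioned here, the following is a minimal sketch of their textbook forms. It is not the pipeline used in the study: all function names and example numbers are hypothetical, and the Abercrombie correction is shown in its standard form (profile count scaled by section thickness over thickness plus mean object height).

```python
def abercrombie_corrected_count(profile_count, section_thickness_um, mean_cell_height_um):
    """Abercrombie correction: scale raw profile counts to reduce overcounting
    of cells that are split across adjacent sections."""
    return profile_count * section_thickness_um / (
        section_thickness_um + mean_cell_height_um
    )


def nv_vref_estimate(cells_counted, sampled_volume_um3, reference_volume_um3):
    """Nv:Vref estimate: numerical density (cells per sampled volume)
    multiplied by the volume of the reference region."""
    numerical_density = cells_counted / sampled_volume_um3
    return numerical_density * reference_volume_um3


# Hypothetical example values, chosen only to show the arithmetic.
corrected = abercrombie_corrected_count(
    profile_count=100, section_thickness_um=40.0, mean_cell_height_um=10.0
)
total = nv_vref_estimate(
    cells_counted=50, sampled_volume_um3=1.0e6, reference_volume_um3=2.0e8
)
print(f"Abercrombie-corrected count: {corrected:.1f}")
print(f"Nv:Vref estimate: {total:.0f}")
```

      The two estimators rest on different assumptions: the Abercrombie correction depends on an accurate mean cell height, whereas the Nv:Vref estimate depends on an accurate reference volume.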

      (3) Have the authors considered that the striatal TH+ cells could be TH+ striatal interneurons? 

      In the revised manuscript, we performed additional 4i imaging experiments to further analyze the identity of the TH positive cells in the striatum. Briefly, we found that TH/DAT/NEUN positive populations either expressed markers of GABAergic medium spiny neurons or interneurons. We have added these results to the revised manuscript (Figure 4). 

      (4) The Western blot shown in Figure 1 C for C8-D1A has some abnormalities and makes it difficult to judge the bands. Also, for 1B, the legends are difficult to see.

      In the revised manuscript, we have repeated the respective western blot to make interpretation of the bands easier, and adapted the legends in Figure 1B for better visibility.

      (5) Figure 2: Please show representative images for the GFAP-targeted editing.

      Representative images of the GFAP-targeted groups can be found in Figure 2 – figure supplement 3.

      (6) Figure 2, Supplement 3: Please include quantification.

      The quantifications for these images can be found in Figure 2D and 2F. 

      (7) Figure 1, Supplement 2: The gene name in A is misspelled.

      Thank you for pointing this out. In the revised manuscript, we added the correct gene name.

      (8) Line 267-276: As previously indicated, the statement here is overstated based on the data provided. In addition, the citation provided to justify this claim (Kannari et al., 2000) is an odd choice as the dosage of L-DOPA utilized was not therapeutically relevant (50 mg/kg). A better indication of efficacy would be the return to basal, unaffected levels rather than the fold increase in dopamine levels. A better comparison would be Lindgren et al., 2010, who showed by microdialysis that animals treated with a physiologically relevant dose of L-DOPA (6 mg/kg) that did not induce dyskinesia returned to basal, non-lesioned dopamine levels in the striatum. To really support this claim, the authors would need to use an approach that could measure synaptic dopamine availability, rather than whole-tissue dopamine levels, such as microdialysis, fiber photometry, or an equivalent.

      Following the reviewer’s suggestions, we replaced this reference with Lindgren et al. (2010) and provide a more detailed interpretation of our results and remaining questions for future studies.  

      Reviewer #3 (Recommendations For The Authors):

      Major and minor issues are discussed below by section.

      INTRODUCTION and AIM - Lines 36-73

      - The authors effectively contextualize the aim of their study by providing comprehensive background information on previous research regarding cell 'reprogramming' into dopaminergic neurons in the SNc. However, the introduction lacks contextualization of TH+ cells and PD. For readers who may not be well-versed in the Parkinson's field, understanding the importance of TH (Tyrosine Hydroxylase) may be challenging, since the term "TH+ cells" is mentioned only once by the end of the introduction (line 71), to then become a key element in the entire study.

      - Providing a brief explanation of the role of Tyrosine Hydroxylase in the synthesis of L-DOPA would facilitate the reader's comprehension of why the presence of TH+ cells following Base Editing treatment is relevant.

      - Further elaboration on the relationship between the downregulation of the general RNA binding protein, PTBP1, and the specific dopaminergic-related readout, TH, would improve coherence and strengthen the linkage between the introductory section and the results.

      We thank the reviewer for the constructive suggestions. In the introduction of the revised manuscript, we describe the meaning and importance of TH in the context of dopamine synthesis and PD. Likewise, we briefly outlined the importance of the PTBP1/nPTBP regulatory loops during neuronal differentiation and maturation. 

      RESULTS 

      Result Section 1 - Line 75-109

      - Thorough screening of sgRNAs targeting splice junctions across the Ptbp1 gene in HEPA cells shows the achievement of high levels of editing (80-90%) with sgRNA-ex3 and sgRNA-ex7.

      - The data also indicates that editing translates into significant reductions in ptbp1 expression, along with an increase in the expression of genes repressed by PTBP1.

      - Despite obtaining lower percentages of editing events in N2a neuroblastoma cells and the C8-D1A astroglial cell line, the differential expression levels of ptbp1 and the readout genes remain significant. However, the gRNA screening assay is performed in immortalized, dividing cells. 

      - Providing proof that Adenosine Base Editing of Ptbp1 is successful in non-dividing cells (such as SNc and/or striatal primary neurons) would strengthen the case for the potential therapy in the intended cell type.

      Following the reviewer’s comment, we show in vivo base editing rates in the SNc and striatum of treated PD mice in the revised manuscript (Figure 2 – figure supplement 5; figure 3 – supplement 2).

      - Moreover, assessing the expression levels of tyrosine hydroxylase by qPCR after Ptbp1 base editing in vitro could help contextualize the use of TH+ detection as an in vivo readout and may help explain why the total number of TH+ cells is low after ABE treatment in vivo - as shown in following sections.

      In the revised manuscript, we now provide quantifications of in vivo base editing efficiencies in the SNc (~15%) and striatum (~20%). As expected from these lower in vivo base editing rates, downregulation of Ptbp1 at the transcript and protein level was less pronounced compared to our in vitro experiments. It seems likely that higher base editing efficiency and more pronounced downregulation of Ptbp1 could lead to a larger population of TH expressing cells. We have added these results and interpretations to the revised manuscript.

      - Furthermore, although ABEs are less prone to generating bystander and other nucleotide changes compared to CBEs, it is still possible. Figures 1 (line 811) and 1-supplement 2 (line 842) only show a brief window of the Sanger sequencing trace. Updating these figures to display a wider view of the sequencing trace would enhance transparency. If unwanted edits are detected, while they may not significantly alter the relevance, impact, or structure of the paper, they may become an important aspect of the discussion. 

      Indeed, ABEs can induce bystander edits and we also detected such edits at the Ptbp1 target site. However, since our base editing strategy was designed to yield a loss of Ptbp1 function, bystander editing at the splice site was not a primary focus in our analysis. Nevertheless, we included CRISPResso output images showing the specific editing outcomes in a wider analysis window in the revised manuscript (Figure 3 – figure supplement 2). 

      Result Section 2 - Lines 110-159

      A split intein system is used in vivo with sgRNA-ex3, after updating the promoter to make it cell-specific: hSyn to restrict expression to neurons and GFAP to restrict expression to astrocytes. 

      However, no other assay is performed to assess whether a) the promoter change and/or b) splitting Cas9 may affect the editing efficiency compared to their initial in vitro approach.

      In the revised manuscript, we assessed the performance of the in vivo AAV vectors encoding the split intein ABE with sgRNA-ex3 in vitro in N2a and C8-D1A cells. Our results show that all vectors are functional and result in base editing at the target locus.

      -  Addressing whether this is the case may explain the low number of TH+ cells observed in vivo. 

      - The authors could also consider staining for Cas9 to address whether the low number of TH+ cells could be attributed to poor Cas9 delivery.

      To confirm successful in vivo base editor delivery, we quantified in vivo base editing efficiencies in the SNc and striatum of PD mice. Our analysis revealed in vivo base editing at both tissue sites, confirming that base editors were successfully delivered. Editing efficiencies were, however, substantially lower (Figure 2 – figure supplement 5; figure 3 – supplement 2) than in our in vitro cell line setting (Figure 1; figure 1 – figure supplement 2). Even though tissue editing rates likely underestimate the cell type-specific editing rates in astrocytes or neurons, higher base editing rates would have likely resulted in a higher number of TH positive cells. We have added these results and their implications to the revised manuscript.

      -  Moreover, despite the presence of TH, in Figure 2 E,F authors examine the striatal innervation from newly generated TH+ cells in the SNc by Fluorescence Intensity (FI) to conclude that the edited cells do not form projections towards the striatum. Considering the low levels of TH+ positive cells obtained, the accumulation of gross FI might not be the most accurate way to assess the presence or absence of cell projections.

      - Using another marker that stains the projections rather than the cell soma, and that is a marker of dopaminergic neurons, might be a better way to address this.

      To address the reviewer’s comment, we analyzed the presence of potential dopaminergic fibers in the mfb, where projections are more concentrated (around the injection coordinates of 6-OHDA), using the dopaminergic marker DAT. In line with our previous observations in the striatum, we did not detect an increase in DAT fluorescence intensity upon treatment on the lesioned hemisphere (Figure 2 – figure supplement 4).  

      Result Section 3 - Line 160-182

      Minor issue

      - The same dual split intein system is used in the striatum. However, in Figure 3 - Figure Supplement 1 - line 958 and in Figure 3 - Figure Supplement 4 - line 1000, the authors show the injection of 2x the viral genomes indicated throughout the manuscript. In previous experiments in the SNc, 2x10^8 vg/animal was used, whereas this figure shows 4x10^8 vg/animal injected in the striatum.

      - The authors should clarify if the vg injected in the striatum was different from what they previously indicated.

      Compared to injection in the SNc, the volume of vector injected in the striatum was doubled since the region is significantly larger. We clarified that the injected vector genomes were different between striatum and SNc in the revised manuscript.

      Result Section 4- Line 183-220

      In this section, the authors thoroughly examine the neuronal nature of TH+ cells through NeuN co-staining and iterative immunofluorescence imaging (4i). BrdU experiments are conducted to determine the origin of these cells, leading to the conclusion that TH+ cells derive from non-dividing cells and express the neuronal marker DAT, characteristic of dopamine-producing neurons (DANs). Cell shape of the TH+ cells in the striatum and SNc is also evaluated by measuring their Feret's diameter and their cell surface. The authors conclude there's heterogeneity in the TH+ cell population due to the presence of TH+/NeuN- cells as well as differences in cell shape.

      However, their explanation of this heterogeneity is solely attributed to differences in the microenvironment and lacks further elaboration. Similarly, their observation that almost half the number of TH+ striatal cells after treatment express CTIP2 (Line 213 and Figure 4B), a marker for GABAergic medium spiny neurons, which they state as "interesting" (line 213) is not developed further. Delving deeper into these topics could strengthen the discussion.

      In the revised manuscript, we provided a more in-depth discussion of the 4i imaging results and potential therapeutic implications. Additionally, we suggest follow-up experiments to analyze the identity, function, and molecular mechanisms underlying the expression of TH upon PTBP1 downregulation in future studies. 

      Result Section 5- Line 221-243

      Two drug-free and two drug-induced behavioral tests are conducted in control and treated animals to evaluate the restoration of motor functions following treatment. Consistent with their previous findings, only the treatment targeted to neurons resulted in the restoration of motor functions in drug-free behavioral tests. The rationale behind each test and its evaluation is clearly explained.

      DISCUSSION 

      - In the discussion section, the authors effectively re-examine their results contextualizing their data with previous studies in the field. However, it would be helpful at this point in the manuscript to reconsider the use of the term 'cell reprogramming,' as this study does not involve actual cell reprogramming. The concept "reprograming" entails the process of transforming adult cells into a stem cell-like state, to then differentiate them into a different cell type. As proven in section 4 by a BrdU proliferation assay, the targeted cells are differentiated neurons. Considering BrdU is administered 5 days after ABE treatment, if true cell reprogramming was taking place, there should be evidence of BrdU incorporation. Cell reprogramming or reprograming is mentioned 4 times in the manuscript (line 34, line 54, line 265, line 277). Therefore, using another terminology would be more accurate.

      Following the reviewer’s suggestion, we removed the term “cell reprogramming” from the manuscript and instead describe the effect as induction of TH expression in endogenous neurons.

      - As noted in the comments of section 4, a more thorough discussion about the various possibilities for heterogeneity would enhance the manuscript's contribution to the PD field.

      In the revised manuscript, we provided a more in-depth discussion of the 4i imaging results and potential therapeutic implications. 

      - Despite observing low numbers of TH+ cells, no significant rescue of drug-induced behaviors, and low levels of released dopamine, the authors merely state that these results make the therapy non-viable, without further exploration or discussion. Whether the limitations lie in the ABE strategy itself, such as its efficiency in targeting and editing differentiated neurons, or in the injection and delivery is never discussed. A deeper discussion of the possible underlying reasons for these challenges would greatly enhance the manuscript and contribute to the advancement of ABE therapies in the brain.

      We believe that the efficacy of our base editing approach could be significantly enhanced by optimizing the delivery. Currently, we are using a dual AAV approach to deliver intein-split ABEs. Since this approach relies on the delivery of higher AAV doses to achieve cotransduction of a cell by two different AAVs, the efficiency could be significantly enhanced by using smaller Cas9 orthologues that can be delivered as a single AAV. Furthermore, in this study we performed a single injection into the dorsal striatum to deliver ABE-expressing AAVs. Performing multiple injections into the rostral, medial, and caudal regions of the striatum might allow us to transduce more cells and induce TH expression in a larger population of striatal neurons. We have included these points in the revised manuscript.

      - While drug-induced behaviors are not recovered, the data demonstrate a rescue of spontaneous behaviors. Further discussion of the potential differences in circuitry underlying these variations in behavioral rescue would also enrich the manuscript's discussion.

      In the revised manuscript, we provide suggestions for potential mechanisms involved in the rescue of spontaneous behavior vs. absence of rescue of drug-induced behaviors. 

      FIGURES AND FIGURE SUPPLEMENTS

      General minor issue - the low-magnification images in the following figures make it difficult to visualize positive cells in tissue sections: Figure 2; Figure 2 - supplement 1; Figure 2 - supplement 3; Figure 3 - supplement 1. Adding higher-magnification images of positive cells in tissue sections of SNc and striatum might help with the visualization.

      As suggested by the reviewer, we included higher magnification images in the corresponding figures to improve interpretation of our results.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The manuscript involves 11 research vignettes that interrogate key aspects of the GnRH pulse generator in two established mouse models of PCOS (peripubertal and prenatal androgenisation; PPA and PNA) (9 of the vignettes focus on the latter model).

      A key message of this paper is that the oft-quoted idea of rapid GnRH/LH pulses associated with PCOS is in fact not readily demonstrable in PNA and PPA mice. This is an important message to make known, but when established dogmas are being challenged, the experiments behind them need to be robust. In this case, underpowered experiments and one or two other issues greatly limit the overall robustness of the study.

      General critiques

      (1) My main concern is that many/most of the experiments were limited to 4-5 mice per group (PPA experiments 1 and 2, PNA experiments 3, 5, 6, 8, and 9). This seems very underpowered for trying to disprove established dogmas (sometimes falling back on "non-significant trends" - lines 105 and 239).

      For the key characterization of GnRH pulse generator activity and LH pulsatility in intact PNA mice (Fig.3, 4, 6), we used 6-8 animals in each experiment which we believe to be sufficient. 

      It is pertinent to explore the “established dogma”. While there is every expectation that the PNA model should have increased LH pulsatility, in fact there is only a single study (Moore, Prescott et al. 2015) that has shown this. The two other reports that have examined this issue find no change in LH pulse frequency (McCarthy, Dischino et al. 2021 and ours). Hence, we would suggest that expectations rather than evidence presently maintain the PNA “dogma”. For the PPA model, there is in fact not a single paper reporting increased LH pulse frequency.

      (2) Page 133-142: it is concerning that the PNA mice didn't have elevated testosterone levels, and this clearly isn't the fault of the assay as this was re-tested in the laboratory of Prof Handelsman, an expert in the field, using LCMS. The point (clearly made in lines 315-336 of the Discussion) that elevated testosterone in PNA mice has been shown in some but not other publications is an important concern to describe for the field. However, the fact remains that it IS elevated in numerous studies, and in the current study it is not so, yet the authors go on to present GnRH pulse generator data as characteristic of the PNA model. Perhaps a demonstration of elevated testosterone levels (by LCMS?) should become a standard model validation prerequisite for publishing any PNA model data.

      We provide a Table below showing the huge inconsistencies in testosterone levels reported in the PNA mouse model. If anything, these inconsistencies might be explained by age, although again this is very variable between studies. Much the same as the “dogma” related to LH pulsatility in the PNA model, we would question whether there is any robust increase in testosterone levels in this model. There is no question that women with PCOS have elevated testosterone but whether the PNA mouse is a good model for this is debatable. We have noted this caution and the need for further LC-MS studies in the Discussion.

      Author response table 1.

      *Same ELISA used in the current study.

      (3) Line 191-196: the lack of a significant increase in LH pulse frequency in PNA mice is based on measurements using reasonable group sizes (7-8), although the sampling frequency is low for this type of analysis (10-minute intervals; 6-minute intervals would seem safer for not missing some pulses). The significance of the LH pulse frequency results is not stated (looks like about p=0.01). The authors note that LH concentration IS elevated (approximately doubled), and this clearly is not caused by an increase in amplitude (Figure 4 G, H, I). These things are worth commenting on in the discussion.

      We have included the p-value of the LH pulse frequency results and included the relevant discussion.

      (4) An interesting observation is that PNA mice appear to continue to have cyclical patterns of GnRH pulse generator activity despite reproductive acyclicity as determined by vaginal cytology (lines 209-241). This finding was used to analyse the frequency of GnRH pulse generator SEs in the machine-learning-identified diestrous-like stage of PNA mice and compare it to diestrous control mice (as identified by vaginal cytology?) (lines 245-254). The idea of a cycle stage-specific comparison is good, but surely the only valid comparison would be to use machine-learning to identify the diestrous-like stage in both groups of mice. Why use machine learning for one and vaginal cytology for the other?

      As “machine learning-defined” diestrus is based on the control vaginal cytology information, the diestrous mice are in fact defined by the same machine learning parameters. We have now noted this.

      Specific points

      (5) With regard to point 2 above, it would be helpful to note the age at which the testosterone samples were taken.

      We have included the age in the method.

      (6) Lines 198-205 and 258-266: I think these are repeated-measures ANOVA data? If so, report the main relevant effect before the post hoc test result.

      We have included the relevant main effect in the manuscript.

      (7) Line 415: I don't think the word "although" works in this sentence.

      We have changed the wording accordingly.

      (8) Lines 514-518: what are the limits of hormone detection in the LCMS assay?

      These were originally stated in the figure legend but have now been included in the Methods.

      Reviewer #2 (Public Review):

      Summary

      The authors aimed to investigate the functionality of the GnRH (gonadotropin-releasing hormone) pulse generator in different mouse models to understand its role in reproductive physiology and its implications for conditions like polycystic ovary syndrome (PCOS). They compared the GnRH pulse generator activity in control mice, peripubertal androgen (PPA) treated mice, and prenatal androgen (PNA) exposed mice. The study sought to elucidate how androgen exposure affects the GnRH pulse generator and subsequent LH (luteinizing hormone) secretion, contributing to the pathophysiology of PCOS.

      Strengths

      (1) Comprehensive Model Selection: The use of both PPA and PNA mouse models allows for a comparative analysis that can distinguish the effects of different timings of androgen exposure.

      (2) Detailed Methodology: The methods employed, such as photometry recordings and serial blood sampling, are robust and allow for precise measurement of GnRH pulse generator activity and LH secretion.

      (3) Clear Results Presentation: The experimental results are well-documented with appropriate statistical analyses, ensuring the findings are reliable and reproducible.

      (4) Relevance to PCOS: The study addresses a significant gap in understanding the neuroendocrine mechanisms underlying PCOS, making the findings relevant to both basic science and potentially clinical research.

      Weaknesses

      (1) Model Limitations: While the PNA mouse model is suggested as the most appropriate for studying PCOS, the authors acknowledge that it does not completely replicate the human condition, particularly the elevated LH response seen in women with PCOS.

      We agree.

      (2) Complex Data Interpretation: The reduced progesterone feedback and its effects on the GnRH pulse generator in PNA mice add complexity to data interpretation, making it challenging to draw straightforward conclusions.

      We agree.

      (3) Machine Learning (ML) Selection and Validation: While k-means clustering is a useful tool for pattern recognition, the manuscript lacks detailed justification for choosing this specific algorithm over other potential methods. The robustness of clustering results has not been validated.

      Please see below.

      (4) Biological Interpretability: Although the machine learning approach identified cyclical patterns, the biological interpretation of these clusters in the context of PCOS is not thoroughly discussed. A deeper exploration of how these clusters correlate with physiological and pathological states could enhance the study's impact.

      It is presently difficult to ascribe specific functions of the various pulse generator states to physiological impact. While it is reasonable to suggest that Cluster_0 activity (representing very infrequent SEs) is responsible for the estrous/luteal-phase pause in pulsatility, we remain unclear on the physiological impact of multi-peak SEs on LH secretion, even in normal mice (see Vas et al., Endo 2024). Thus, for the moment, it is most appropriate to simply state that pulse generator activity remains cyclical in PNA mice without any unfounded speculation.

      (5) Sample Size: The study uses a relatively small number of animals (n=4-7 per group), which may limit the generalisability of the findings. Larger sample sizes could provide more robust and statistically significant results.

      For the key characterization of GnRH pulse generator activity and LH pulsatility in intact PNA mice (Fig.3, 4, 6), we used 6-8 animals in each experiment which we believe to be sufficient. Some of the subsequent experiments do have smaller N numbers and we are particularly aware of the progesterone treatment study that only has N=3 for the PNA group. However, as this was sufficient to show a statistical difference we did not generate more mice.

      (6) Scope of Application: The findings, while interesting, are primarily applicable to mouse models. The translation to human physiology requires cautious interpretation and further validation.

      We agree.

      Reviewer #2 (Recommendations For The Authors):

      (1) The validation of clustering results through additional metrics or comparison with other algorithms would strengthen the methodology. Specifically, the authors selected k=5 for k-means clustering without providing an explicit rationale or evidence of exploratory data analysis (EDA) to support this choice. They refer to their previous publication (Vas, Wall et al. 2024), which does not provide any EDA regarding the choice of a number of clusters nor their robustness. The arbitrary selection of "k" without justification can undermine confidence in the clustering results since clustering results heavily depend on "k". The authors also choose to use Euclidean distance as the "numerical measure" setting in the RapidMiner Studio's software without justification given the chosen features used for clustering and their properties. The lack of exploratory analysis to determine the optimal number of clusters, "k", to be considered means that the authors might have missed identifying the true structure of the data. Common cluster robustness methods, like the elbow method or silhouette analysis, are crucial for justifying the number of clusters. An inappropriate choice could lead to incorrect conclusions about the synchronisation patterns of ARN kisspeptin neurons and their implications for the study's hypotheses. Including EDA and other validation techniques (e.g., silhouette scores, elbow method) would have strengthened the manuscript by providing empirical support for the chosen algorithm and settings.

      It is important to clarify that we did not start this exercise with an unknown or uncharacterised data set and that the objective of the clustering was not to provide any initial pattern to the data. Rather, our aim was to develop an unsupervised approach that would automatically detect the onset and existence of the key features of pulse generator cyclicity that were apparent by eye, e.g., the estrous stage slowing and the presence of multi-peak SEs in metestrus. As such, our optimization was driven by the data as well as observation while retaining the unsupervised nature of k-means clustering. We started by assessing 10 variables describing all possible features of the recordings and through a process of elimination found that just 5 were sufficient to describe the key stages of the cycle. While we appreciate that the use of multiple different algorithms would progressively increase the robustness of the machine learning approach, it is evident that the current k-means approach with k=5 is already very effective at reporting the estrous cyclicity of the pulse generator in normal mice (Vas et al., Endo 2024). Having validated this approach, we have now used it here to compare the cyclical patterns of activity of PNA- and vehicle-treated mice.
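
      For readers unfamiliar with the validation the reviewer requests, the following is a minimal sketch of a silhouette analysis across candidate values of k. It is not the authors' RapidMiner workflow and does not use their photometry features: the toy 2D points, the deterministic farthest-first initialisation, and all function names (`farthest_first`, `kmeans`, `silhouette`, `make_demo_points`) are assumptions introduced purely for illustration.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with Euclidean distance; returns one label per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette score: (b - a) / max(a, b) averaged over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def make_demo_points():
    """Toy data only: three well-separated 3x3 grids of points."""
    offsets = (-0.5, 0.0, 0.5)
    return [
        (cx + dx, cy + dy)
        for cx, cy in ((0, 0), (10, 0), (0, 10))
        for dx in offsets
        for dy in offsets
    ]


points = make_demo_points()
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(k, round(s, 3))
print("best k:", best_k)
```

      On this toy data the mean silhouette score peaks at k = 3, the number of groups built into the example. With real recordings, features on different scales would typically be standardised before computing Euclidean distances, and the score would be read alongside the biological interpretability of the clusters rather than as the sole criterion.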

      (2) The data and methods presented in this study could be valuable for the research community studying reproductive endocrinology and neuroendocrine disorders provided the authors address my comments above regarding the application of ML methods. The insights gained from this work could potentially inform clinical research aiming to develop better diagnostic and therapeutic strategies for PCOS.

      Reviewer #3 (Public Review):

      Summary:

      Zhou and colleagues elegantly used pre-clinical mouse models to understand the nature of abnormally high GnRH/LH pulse secretion in polycystic ovary syndrome (PCOS), a major endocrine disorder affecting female fertility worldwide. This work brings a fundamental question of how altered gonadotropin secretion takes place upstream within the GnRH pulse generator core, which is defined by arcuate nucleus kisspeptin neurons.

      Strengths:

      The authors use state-of-the-art in vivo calcium imaging with fiber photometry and important physiological manipulations and measurements to dissect the possible neuronal mechanisms underlying such neuroendocrine derangements in PCOS. The additional use of unsupervised k-means clustering analysis for the evaluation of calcium synchronous events greatly enhances the quality of their evidence. The authors nicely propose that neuroendocrine dysfunction in PCOS might involve different setpoints through the hypothalamic-pituitary-gonadal (HPG) axis, and beyond kisspeptin neurons, which importantly pushes our field forward toward future investigations.

      Weaknesses:

      Although the authors provide important evidence, additional efforts are required to improve the quality of the manuscript and back up their claims. For instance, animal experiments failed to detect high testosterone levels in PNA female mice, a well-established PCOS mouse model. Considering that androgen excess is a hallmark of PCOS, this highly influences the subsequent evaluation of calcium synchronous events in arcuate kisspeptin neurons and the implications for neuroendocrine derangements.

      Please see our response to Reviewer 1. It will be important to establish a robust PCOS mouse model in the future that has elevated pulse generator activity in the presence of elevated testosterone concentrations.

      Authors also may need to provide LH data from another mouse model used in their work, the peripubertal androgen (PPA) model. Their claims seem to fall short without the pairing evidence of calcium synchronous events in arcuate kisspeptin neurons and LH pulse secretion.

      We have demonstrated that ARN-KISS neuron SEs are perfectly correlated with pulsatile LH secretion in intact and gonadectomized male and female mice on many occasions. Given that the pulse generator frequency slows by 50% in PPA mice, it is very hard to imagine how this could result in an elevated LH pulse frequency. While we were undertaking these studies the first paper (to our knowledge) looking at pulsatile LH secretion in the PPA model was published; no change was found.

      Another aspect that requires attention is further exploration of their calcium synchronous events data and an increase in animal numbers in some of their experiments.

      Please see below.

      Reviewer #3 (Recommendations For The Authors):

      The reviewer believes that this work will greatly contribute to the field and, to provide better manuscript quality, there might be only a few minor and major revisions to be included in the future version.

      Minor:

      (1) Line 17: I would change the sentence to "One in ten women in their reproductive age suffer from PCOS" to adapt to more accurate prevalence studies.

      We have revised the sentence as recommended.

      (2) Line 18 and 19: Although the evidence indeed points to a high LH pulse secretion in PCOS, I would change it to "with increased LH secretion" as most studies show mean values and not LH pulse release data.

      While we agree that most human studies show a mean increase in LH, when assessed with sufficient temporal resolution, this results from elevated LH pulse frequency. As such, and to keep the manuscript focussed on the pulse generator, we would like to retain the present wording.

      (3) Line 47: Please correct "polycystic ovaries" to polycystic-like ovarian morphology to adapt to the current AEPCOS guidelines.

      We have revised the sentence as recommended.

      (4) Line 231: Authors stated that "These PNA mice exhibited a cyclical pattern of activity similar to that of control mice" (Figure 5C and D). Please, include the statistical tests here for this claim. Although they say there aren't differences, the colored fields do not reflect this and seem quite different. Could the authors re-evaluate these claims or provide better examples in the figure?

      We used Sidak’s multiple comparisons tests for this analysis (as stated in Results). The key data for assessing overall cyclical activity in PNA and control mice are in Fig 5B, which suggests very little difference. We accept that the individual traces of activity (Fig.5D) do not look identical to controls and, indeed, they are representative of the data set. The key point is that they remain cyclical in an acyclic mouse. We have made sure that this is clear in the text.

      (5) Subheadings 6 and 7 of the result section: It is confusing to first read the claims of the absence of SE differences and then see a clear SE frequency difference in Figures 6 C and D. The reviewer suggests that authors could reorganize the text and figures to make their rationale flow better for future readers.

      We have considered this point carefully but find that re-organization creates its own problems, as the machine learning algorithm would have to be used before it is described. It will always be problematic to incorporate this type of data re-analysis in an original paper, but we think the present sequence is the best that can be achieved.

      (6) Discussion: If PNA female mice did not have elevated testosterone levels, how can the authors compare their results to the current literature? Could this be the case for lacking a more robust ARNKISS neuronal activity output in their experiments? The reviewer recommends a better discussion concerning these aspects.

      Please refer to our response to Reviewer #1 comment (2).

      (7) Discussion: the authors claim that diestrous PNA mice exhibited highly variable patterns of ARNKISS neuron activity. Would these differences be due to different circulating sex steroid levels or intrinsic properties? Would the inclusion of future in vitro calcium imaging (brain slices) studies contribute to their research question and conclusions? The reviewer recommends a better discussion concerning these aspects.

      We have tried to clarify that the highly variable patterns of activity in “diestrous” PNA mice come from the fact that we are actually randomly recording from ARN-KISS neurons at metestrus, diestrus, proestrus and estrus. The pulse generator is cycling but we only have the acyclic “diestrous” smear to go by. This also makes brain slice studies difficult as we would never know the actual cycle stage.

      Major:

      (1) Results section: The reviewer strongly recommends that the LH pulse secretion data for the PPA group be included in the manuscript. If the SEs represent the central mechanism of pulse generation, would the LH pulse frequency match those events? If not, could a mismatch be explained by androgen-mediated negative feedback at the pituitary level? What is the pituitary LH response to exogenous GnRH (i.p. injection) in the PPA group?

      Our initial observation showed the frequency of ARNKISS neuron SEs was halved in PPA mice compared to controls. Additionally, one study reported pulsatile LH secretion to be unchanged in this animal model (Coyle, Prescott et al. 2022). Both pieces of evidence clearly indicate that the PPA mouse does not provide an appropriate PCOS model of elevated pulse generator activity. Therefore, we do not see the value of pursuing further experiments in this animal model.

      (2) Although the evaluation of relative frequency and normalized amplitude indicate the dynamic over time, the authors should include the average amplitudes and frequencies of events within the recording session. For instance, looking at Figures 1 A and B and Figures 3 A and B, a reader can observe differences in the amplitude due to different scaling axes. Perhaps, using a Python toolbox such as GuPPy or any preferred analysis pipeline might help authors include these parameters.

      The amplitude of recorded SEs for each mouse depends primarily on the fiber position. As such, it has only ever been possible to assess SE amplitude changes within the same mouse. It is not possible to assess differences in SE amplitude between mice.

      (3) Line 144-156: (Immunoreactivity results): Authors should proceed with caution when describing these results and clearly state that results show a software-based measurement of immunoreactive signal intensity. In addition, the small sample size of the PNA group (N = 4) compared to controls (N = 6-7) seems to mask possible differences. Could the authors increase the N of the PNA group and re-evaluate these results?

      We have clarified that the immunoreactive signal intensity is a software-based measurement. The N number for PNA mice in these studies varies from 4 to 6 depending on brain section availability for the different immunohistochemistry runs. The scatter of data is such that any new data points would need to be at the extreme of the distributions to have any likely impact on statistical significance. As this is a minor part of the paper, we did not feel that the use of further mice was warranted.

      (4) Considering the great variability in the number of SE/hr in PNA mice, the reviewer suggests increasing the N in this group so that the authors can re-evaluate their findings and draw firmer conclusions.

      We have n=6 for the PNA group in the study. As noted above, the variability in SE/hr in Figure 3 comes from assessing the pulse generator at random times within the estrous cycle. Once we separate out the “diestrous-like” stage for the PNA animals, the variability is decreased, as shown in Figure 6.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Summary of reviewers’ comments and our revisions: 

      We thank the reviewers for their thoughtful feedback. This feedback has motivated multiple revisions and additions that, in our view, have greatly improved the manuscript. This is especially true with regard to a major goal of this study: clearly defining existing scientific perspectives and delineating their decoding implications. In addition to building on this conceptual goal, we have expanded existing analyses and have added a new analysis of generalization using a newly collected dataset. We expect the manuscript will be of very broad interest, both to those interested in BCI development and to those interested in fundamental properties of neural population activity and its relationship with behavior.

      Importantly, all reviewers were convinced that MINT provided excellent performance, when benchmarked against existing methods, across a broad range of standard tasks:

      “their method shows impressive performance compared to more traditional decoding approaches” (R1) 

      “The paper was thorough in considering multiple datasets across a variety of behaviors, as well as existing decoding methods, to benchmark the MINT approach. This provided a valuable comparison to validate the method.” (R2) 

      “The fact that performance on stereotyped tasks is high is interesting and informative…” (R3)

      This is important. It is challenging to design a decoder that performs consistently across multiple domains and across multiple situations (including both decoding and neural state estimation). MINT does so. MINT consistently outperformed existing lightweight ‘interpretable’ decoders, despite being a lightweight interpretable decoder itself. MINT was very competitive with expressive machine-learning methods, yet has advantages in flexibility and simplicity that more ‘brute force’ methods do not. We made a great many comparisons, and MINT was consistently a strong performer. Of the many comparisons we made, there was only one where MINT was at a modest disadvantage, and it was for a dataset where all methods performed poorly. No other method we tested was as consistent. For example, although the GRU and the feedforward network were often competitive with MINT (and better than MINT in the one case mentioned above), there were multiple other situations where they performed less well and a few situations where they performed poorly. Moreover, no other existing decoder naturally estimates the neural state while also readily decoding, without retraining, a broad range of behavioral variables.

      R1 and R2 were very positive about the broader impacts of the study. They stressed its impact both on decoder design, and on how our field thinks, scientifically, about the population response in motor areas: 

      “This paper presents an innovative decoding approach for brain-computer interfaces” (R1)

      “presents a substantial shift in methodology, potentially revolutionizing the way BCIs interpret and predict neural behaviour” (R1)

      “the paper's strengths, particularly its emphasis on a trajectory-centric approach and the simplicity of MINT, provide a compelling contribution to the field” (R1)

      “The authors made strong arguments, supported by evidence and literature, for potentially high-dimensional neural states and thus the need for approaches that do not rely on an assumption of low dimensionality” (R2)

      “This work is motivated by brain-computer interfaces applications, which it will surely impact in terms of neural decoder design.” (R2)

      “this work is also broadly impactful for neuroscientific analysis... Thus, MINT will likely impact neuroscience research generally.” (R2)

      We agree with these assessments, and have made multiple revisions to further play into these strengths. As one example, the addition of Figure 1b (and 6b) makes this the first study, to our knowledge, to fully and concretely illustrate this emerging scientific perspective and its decoding implications. This is important, because multiple observations convince us that the field is likely to move away from the traditional perspective in Figure 1a, and towards that in Figure 1b. We also agree with the handful of weaknesses R1 and R2 noted. The manuscript has been revised accordingly. The major weakness noted by R1 was the need to be explicit regarding when we suspect MINT would (and wouldn’t) work well in other brain areas. In non-motor areas, the structure of the data may be poorly matched with MINT’s assumptions. We agree that this is likely to be true, and thus agree with the importance of clarifying this topic for the reader. The revision now does so. R1 also wished to know whether existing methods might benefit from including trial-averaged data during training, something we now explore and document (see detailed responses below). R2 noted two weaknesses: 1) The need to better support (with expanded analysis) the statement that neural and behavioral trajectories are non-isometric, and 2) The need to more rigorously define the ‘mesh’. We agree entirely with both suggestions, and the revision has been strengthened by following them (see detailed responses below).

      R3 also saw strengths to the work, stating that:

      “This paper is well-structured and its main idea is clear.” 

      “The fact that performance on stereotyped tasks is high is interesting and informative, showing that these stereotyped tasks create stereotyped neural trajectories.” 

      “The task-specific comparisons include various measures and a variety of common decoding approaches, which is a strength.”

      However, R3 also expressed two sizable concerns. The first is that MINT might have onerous memory requirements. The manuscript now clarifies that MINT has modest memory requirements. These do not scale unfavorably as the reviewer was concerned they might. The second concern is that MINT is: 

      “essentially a table-lookup rather than a model.”

      Although we don’t agree, the concern makes sense and may be shared by many readers, especially those who take a particular scientific perspective. Pondering this concern thus gave us the opportunity to modify the manuscript in ways that support its broader impact. Our revisions had two goals: 1) clarify the ways in which MINT is far more flexible than a lookup-table, and 2) better describe the dominant scientific perspectives and their decoding implications.

      The heart of R3’s concern is the opinion that MINT is an effective but unprincipled hack suitable for situations where movements are reasonably stereotyped. Of course, many tasks involve stereotyped movements (e.g. handwriting characters), so MINT would still be useful. Nevertheless, if MINT is not principled, other decode methods would often be preferable because they could (unlike MINT in R3’s opinion) gain flexibility by leveraging an accurate model. Most of R3’s comments flow from this fundamental concern: 

      “This is again due to MINT being a lookup table with a library of stereotyped trajectories rather than a model.”

      “MINT models task-dependent neural trajectories, so the trained decoder is very task-dependent and cannot generalize to other tasks.”

      “Unlike MINT, these works can achieve generalization because they model the neural subspace and its association to movement.”

      “given that MINT tabulates task-specific trajectories, it will not generalize to tasks that are not seen in the training data even when these tasks cover the exact same space (e.g., the same 2D computer screen and associated neural space).”

      “For proper training, the training data should explore the whole movement space and the associated neural space, but this does not mean all kinds of tasks performed in that space must be included in the training set (something MINT likely needs while modeling-based approaches do not).”

      The manuscript has been revised to clarify that MINT is considerably more flexible than a lookup table, even though a lookup table is used as a first step. Yet, on its own, this does not fully address R3’s concern. The quotes above highlight that R3 is making a standard assumption in our field: that there exists a “movement space and associated neural space”. Under this perspective, one should, as R3 argues, fully explore the movement space. This would perforce fully explore the associated neural subspace. One can then “model the neural subspace and its association to movement”. MINT does not use a model of this type, and thus (from R3’s perspective) does not appear to use a model at all. A major goal of our study is to question this traditional perspective. We have thus added a new figure to highlight the contrast between the traditional (Figure 1a) and new (Figure 1b) scientific perspectives, and to clarify their decoding implications.

      While we favor the new perspective (Figure 1b), we concede that R3 may not share our view. This is fine. Part of the reason we believe this study is timely, and will be broadly read, is that it raises a topic of emerging interest where there is definitely room for debate. If we are misguided – i.e. if Figure 1a is the correct perspective – then many of R3’s concerns would be on target: MINT could still be useful, but traditional methods that make the traditional assumptions in Figure 1a would often be preferable. However, if the emerging perspective in Figure 1b is more accurate, then MINT’s assumptions would be better aligned with the data than those of traditional methods, making it a more (not less) principled choice.

      Our study provides new evidence in support of Figure 1b, while also synthesizing existing evidence from other recent studies. In addition to Figure 2, the new analysis of generalization further supports Figure 1b. Also supporting Figure 1b is the analysis in which MINT’s decoding advantage, over a traditional decoder, disappears when simulated data approximate the traditional perspective in Figure 1a.

      That said, we agree that the present study cannot fully resolve whether Figure 1a or 1b is more accurate. Doing so will take multiple studies with different approaches (indeed we are currently preparing other manuscripts on this topic). Yet we still have an informed scientific opinion, derived from past, present and yet-to-be-published observations. Our opinion is that Figure 1b is the more accurate perspective. This possibility makes it reasonable to explore the potential virtues of a decoding method whose assumptions are well-aligned with that perspective. MINT is such a method. As expected under Figure 1b, MINT outperforms traditional interpretable decoders in every single case we studied. 

      As noted above, we have added a new generalization-focused analysis (Figure 6) based on a newly collected dataset. We did so because R3’s comments highlight a deep point: which scientific perspective one takes has strong implications regarding decoder generalization. These implications are now illustrated in the new Figure 6a and 6b. Under Figure 6a, it is possible, as R3 suggests, to explore “the whole movement space and associated neural space” during training. However, under Figure 6b, expectations are very different. Generalization will be ‘easy’ when new trajectories are near the training-set trajectories. In this case, MINT should generalize well as should other methods. In contrast, generalization will be ‘hard’ when new neural trajectories have novel shapes and occupy previously unseen regions / dimensions. In this case, all current methods, including MINT, are likely to fail. R3 points out that traditional decoders have sometimes generalized well to new tasks (e.g. from center-out to ‘pinball’) when cursor movements occur in the same physical workspace. These findings could be taken to support Figure 6a, but are equally consistent with ‘easy’ generalization in Figure 6b. To explore this topic, the new analysis in Figure 6c-g considers conditions that are intended to span the range from easy to hard. Results are consistent with the predictions of Figure 6b. 

      We believe the manuscript has been significantly improved by these additions. The revisions help the manuscript achieve its twin goals: 1) introduce a novel class of decoder that performs very well despite being very simple, and 2) describe properties of motor-cortex activity that will matter for decoders of all varieties.

      Reviewer #1: 

      Summary: 

      This paper presents an innovative decoding approach for brain-computer interfaces (BCIs), introducing a new method named MINT. The authors develop a trajectory-centric approach to decode behaviors across several different datasets, including eight empirical datasets from the Neural Latents Benchmark. Overall, the paper is well written and their method shows impressive performance compared to more traditional decoding approaches that use a simpler approach. While there are some concerns (see below), the paper's strengths, particularly its emphasis on a trajectory-centric approach and the simplicity of MINT, provide a compelling contribution to the field. 

      We thank the reviewer for these comments. We share their enthusiasm for the trajectory-centric approach, and we are in complete agreement that this perspective has both scientific and decoding implications. The revision expands upon these strengths.

      Strengths: 

      The adoption of a trajectory-centric approach that utilizes statistical constraints presents a substantial shift in methodology, potentially revolutionizing the way BCIs interpret and predict neural behaviour. This is one of the strongest aspects of the paper. 

      Again, thank you. We also expect the trajectory-centric perspective to have a broad impact, given its relevance to both decoding and to thinking about manifolds.

      The thorough evaluation of the method across various datasets serves as an assurance that the superior performance of MINT is not a result of overfitting. The comparative simplicity of the method in contrast to many neural network approaches is refreshing and should facilitate broader applicability. 

      Thank you. We were similarly pleased to see such a simple method perform so well. We also agree that, while neural-network approaches will always be important, it is desirable to also possess simple ‘interpretable’ alternatives.

      Weaknesses:  

      Comment 1) Scope: Despite the impressive performance of MINT across multiple datasets, it seems predominantly applicable to M1/S1 data. Only one of the eight empirical datasets comes from an area outside the motor/somatosensory cortex. It would be beneficial if the authors could expand further on how the method might perform with other brain regions that do not exhibit low tangling or do not have a clear trial structure (e.g. decoding of position or head direction from hippocampus) 

      We agree entirely. Population activity in many brain areas (especially outside the motor system) presumably will often not have the properties upon which MINT’s assumptions are built. This doesn’t necessarily mean that MINT would perform badly. Using simulated data, we have found that MINT can perform surprisingly well even when some of its assumptions are violated. Yet at the same time, when MINT’s assumptions don’t apply, one would likely prefer to use other methods. This is, after all, one of the broader themes of the present study: it is beneficial to match decoding assumptions to empirical properties. We have thus added a section on this topic early in the Discussion: 

      “In contrast, MINT and the Kalman filter performed comparably on simulated data that better approximated the assumptions in Figure 1a. Thus, MINT is not a ‘better’ algorithm – simply better aligned with the empirical properties of motor cortex data. This highlights an important caveat. Although MINT performs well when decoding from motor areas, its assumptions may be a poor match in other areas (e.g. the hippocampus). MINT performed well on two non-motor-cortex datasets – Area2_Bump (S1) and DMFC_RSG (dorsomedial frontal cortex) – yet there will presumably be other brain areas and/or contexts where one would prefer a different method that makes assumptions appropriate for that area.”

      Comment 2) When comparing methods, the neural trajectories of MINT are based on averaged trials, while the comparison methods are trained on single trials. An additional analysis might help in disentangling the effect of the trial averaging. For this, the authors could average the input across trials for all decoders, establishing a baseline for averaged trials. Note that inference should still be done on single trials. Performance can then be visualized across different values of N, which denotes the number of averaged trials used for training. 

      We explored this question and found that the non-MINT decoders are harmed, not helped, by the inclusion of trial-averaged responses in the training set. This is presumably because the statistics of trial-averaged responses don’t resemble what will be observed during decoding. This statistical mismatch, between training and decoding, hurts most methods. It doesn’t hurt MINT, because MINT doesn’t ‘train’ in the normal way. It simply needs to know rates, and trial-averaging is a natural way to obtain them. To describe the new analysis, we have added the following to the text.

      “We also investigated the possibility that MINT gained its performance advantage simply by having access to trial-averaged neural trajectories during training, while all other methods were trained on single-trial data. This difference arises from the fundamental requirements of the decoder architectures: MINT needs to estimate typical trajectories while other methods don’t. Yet it might still be the case that other methods would benefit from including trial-averaged data in the training set, in addition to single-trial data. Alternatively, this might harm performance by creating a mismatch, between training and decoding, in the statistics of decoder inputs. We found that the latter was indeed the case: all non-MINT methods performed better when trained purely on single-trial data.”

      Reviewer #2:

      Summary: 

      The goal of this paper is to present a new method, termed MINT, for decoding behavioral states from neural spiking data. MINT is a statistical method which, in addition to outputting a decoded behavioral state, also provides soft information regarding the likelihood of that behavioral state based on the neural data. The innovation in this approach is neural states are assumed to come from sparsely distributed neural trajectories with low tangling, meaning that neural trajectories (time sequences of neural states) are sparse in the high-dimensional space of neural spiking activity and that two dissimilar neural trajectories tend to correspond to dissimilar behavioral trajectories. The authors support these assumptions through analysis of previously collected data, and then validate the performance of their method by comparing it to a suite of alternative approaches. The authors attribute the typically improved decoding performance by MINT to its assumptions being more faithfully aligned to the properties of neural spiking data relative to assumptions made by the alternatives. 

      We thank the reviewer for this accurate summary, and for highlighting the subtle but important fact that MINT provides information regarding likelihoods. The revision includes a new analysis (Figure 6e) illustrating one potential way to leverage knowledge of likelihoods.

      Strengths:  

      The paper did an excellent job critically evaluating common assumptions made by neural analytical methods, such as neural state being low-dimensional relative to the number of recorded neurons. The authors made strong arguments, supported by evidence and literature, for potentially high-dimensional neural states and thus the need for approaches that do not rely on an assumption of low dimensionality. 

      Thank you. We also hope that the shift in perspective is the most important contribution of the study. This shift matters both scientifically and for decoder design. The revision expands on this strength. The scientific alternatives are now more clearly and concretely illustrated (especially see Figure 1a,b and Figure 6a,b). We also further explore their decoding implications with new data (Figure 6c-g).

      The paper was thorough in considering multiple datasets across a variety of behaviors, as well as existing decoding methods, to benchmark the MINT approach. This provided a valuable comparison to validate the method. The authors also provided nice intuition regarding why MINT may offer performance improvement in some cases and in which instances MINT may not perform as well. 

      Thank you. We were pleased to be able to provide comparisons across so many datasets (we are grateful to the Neural Latents Benchmark for making this possible).

      In addition to providing a philosophical discussion as to the advantages of MINT and benchmarking against alternatives, the authors also provided a detailed description of practical considerations. This included training time, amount of training data, robustness to data loss or changes in the data, and interpretability. These considerations not only provided objective evaluation of practical aspects but also provided insights to the flexibility and robustness of the method as they relate back to the underlying assumptions and construction of the approach. 

      Thank you. We are glad that these sections were appreciated. MINT’s simplicity and interpretability are indeed helpful in multiple ways, and afford opportunities for interesting future extensions. One potential benefit of interpretability is now explored in the newly added Figure 6e. 

      Impact: 

      This work is motivated by brain-computer interfaces applications, which it will surely impact in terms of neural decoder design. However, this work is also broadly impactful for neuroscientific analysis to relate neural spiking activity to observable behavioral features. Thus, MINT will likely impact neuroscience research generally. The methods are made publicly available, and the datasets used are all in public repositories, which facilitates adoption and validation of this method within the greater scientific community. 

      Again, thank you. We have similar hopes for this study.

      Weaknesses (1 & 2 are related, and we have switched their order in addressing them): 

      Comment 2) With regards to the idea of neural and behavioral trajectories having different geometries, this is dependent on what behavioral variables are selected. In the example for Fig 2a, the behavior is reach position. The geometry of the behavioral trajectory of interest would look different if instead the behavior of interest was reach velocity. The paper would be strengthened by acknowledgement that geometries of trajectories are shaped by extrinsic choices rather than (or as much as they are) intrinsic properties of the data. 

      We agree. Indeed, we almost added a section to the original manuscript on this exact topic. We have now done so:

      “A potential concern regarding the analyses in Figure 2c,d is that they require explicit choices of behavioral variables: muscle population activity in Figure 2c and angular phase and velocity in Figure 2d. Perhaps these choices were misguided. Might neural and behavioral geometries become similar if one chooses ‘the right’ set of behavioral variables? This concern relates to the venerable search for movement parameters that are reliably encoded by motor cortex activity [69, 92–95]. If one chooses the wrong set of parameters (e.g. chooses muscle activity when one should have chosen joint angles) then of course neural and behavioral geometries will appear non-isometric. There are two reasons why this ‘wrong parameter choice’ explanation is unlikely to account for the results in Figure 2c,d. First, consider the implications of the left-hand side of Figure 2d. A small kinematic distance implies that angular position and velocity are nearly identical for the two moments being compared. Yet the corresponding pair of neural states can be quite distant. Under the concern above, this distance would be due to other encoded behavioral variables – perhaps joint angle and joint velocity – differing between those two moments. However, there are not enough degrees of freedom in this task to make this plausible. The shoulder remains at a fixed position (because the head is fixed) and the wrist has limited mobility due to the pedal design [60]. Thus, shoulder and elbow angles are almost completely determined by cycle phase. More generally, ‘external variables’ (positions, angles, and their derivatives) are unlikely to differ more than slightly when phase and angular velocity are matched. Muscle activity could be different because many muscles act on each joint, creating redundancy. However, as illustrated in Figure 2c, the key effect is just as clear when analyzing muscle activity. Thus, the above concern seems unlikely even if it can’t be ruled out entirely. 
A broader reason to doubt the ‘wrong parameter choice’ proposition is that it provides a vague explanation for a phenomenon that already has a straightforward explanation. A lack of isometry between the neural population response and behavior is expected when neural-trajectory tangling is low and output-null factors are plentiful [55, 60]. For example, in networks that generate muscle activity, neural and muscle-activity trajectories are far from isometric [52, 58, 60]. Given this straightforward explanation, and given repeated failures over decades to find the ‘correct’ parameters (muscle activity, movement direction, etc.) that create neural-behavior isometry, it seems reasonable to conclude that no such isometry exists.”

      Comment 1) The authors posit that neural and behavioral trajectories are non-isometric. To support this point, they look at distances between neural states and distances between the corresponding behavioral states, in order to demonstrate that there are differences in these distances in each respective space. This supports the idea that neural states and behavioral states are non-isometric but does not directly address their point. In order to say the trajectories are non-isometric, it would be better to look at pairs of distances between corresponding trajectories in each space. 

      We like this idea and have added such an analysis. To be clear, we also stand by the original analysis: isometry predicts that neural and behavioral distances (for corresponding pairs of points) should be strongly correlated, and that small behavioral distances should not be associated with large neural distances. These predictions do not hold, providing a strong argument against isometry. The analysis suggested by the reviewer makes the same larger point, and also reveals some additional facts (e.g. it reveals that muscle-geometry is more related to neural-geometry than is kinematic-geometry). The new analysis is described in the following section:

      “We further explored the topic of isometry by considering pairs of distances. To do so, we chose two random neural states and computed their distance, yielding dneural1. We repeated this process, yielding dneural2. We then computed the corresponding pair of distances in muscle space (dmuscle1 and dmuscle2) and kinematic space (dkin1 and dkin2). We considered cases where dneural1 was meaningfully larger than (or smaller than) dneural2, and asked whether the behavioral variables had the same relationship; e.g. was dmuscle1 also larger than dmuscle2? For kinematics, this relationship was weak: across 100,000 comparisons, the sign of dkin1 − dkin2 agreed with dneural1 − dneural2 only 67.3% of the time (with 50% being chance). The relationship was much stronger for muscles: the sign of dmuscle1 − dmuscle2 agreed with dneural1 − dneural2 79.2% of the time, which is far more than expected by chance yet also far from what is expected given isometry (e.g. the sign agrees 99.7% of the time for the truly isometric control data in Figure 2e). Indeed there were multiple moments during this task when dneural1 was much larger than dneural2, yet dmuscle1 was smaller than dmuscle2. These observations are consistent with the proposal that neural trajectories resemble muscle trajectories in some dimensions, but with additional output-null dimensions that break the isometry [60].”
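
      The pairs-of-distances comparison quoted above can be sketched in a few lines. This is a minimal illustration rather than the analysis code used in the study: the function name `sign_agreement`, the use of Euclidean distance, and the `min_rel_diff` threshold (a stand-in for "meaningfully larger") are all assumptions, since the response does not specify them.

```python
import numpy as np

def sign_agreement(neural, behavior, n_comparisons=100_000,
                   min_rel_diff=0.1, seed=0):
    """Fraction of comparisons where sign(d_beh1 - d_beh2) matches
    sign(d_neural1 - d_neural2).

    neural:   (n_states, n_neurons) array of neural states
    behavior: (n_states, n_vars) array of time-matched behavioral states
    min_rel_diff: hypothetical threshold defining a 'meaningful'
                  difference between the two neural distances
    """
    rng = np.random.default_rng(seed)
    n_states = neural.shape[0]
    # Each comparison uses two random pairs of states: (i, j) and (k, l).
    i, j, k, l = rng.integers(0, n_states, size=(4, n_comparisons))
    d_neural1 = np.linalg.norm(neural[i] - neural[j], axis=1)
    d_neural2 = np.linalg.norm(neural[k] - neural[l], axis=1)
    d_beh1 = np.linalg.norm(behavior[i] - behavior[j], axis=1)
    d_beh2 = np.linalg.norm(behavior[k] - behavior[l], axis=1)
    diff_neural = d_neural1 - d_neural2
    diff_beh = d_beh1 - d_beh2
    # Keep only comparisons where the neural distances clearly differ.
    scale = np.maximum(d_neural1, d_neural2)
    keep = np.abs(diff_neural) > min_rel_diff * scale
    if not keep.any():
        return float("nan")
    return float(np.mean(np.sign(diff_neural[keep]) == np.sign(diff_beh[keep])))
```

      Under these assumptions, a behavioral space that is a rigid rotation of the neural space should give agreement near 100%, while unrelated data should give agreement near the 50% chance level.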

      Comment 3) The approach is built up on the idea of creating a "mesh" structure of possible states. In the body of the paper the definition of the mesh was not entirely clear and I could not find in the methods a more rigorous explicit definition. Since the mesh is integral to the approach, the paper would be improved with more description of this component. 

      This is a fair criticism. Although MINT’s actual operations were well-documented, how those operations mapped onto the term ‘mesh’ was, we agree, a bit vague. The definition of the mesh is a bit subtle because it only emerges during decoding rather than being precomputed. This is part of what gives MINT much more flexibility than a lookup table. We have added the following to the manuscript.

      “We use the term ‘mesh’ to describe the scaffolding created by the training-set trajectories and the interpolated states that arise at runtime. The term mesh is apt because, if MINT’s assumptions are correct, interpolation will almost always be local. If so, the set of decodable states will resemble a mesh, created by line segments connecting nearby training-set trajectories. However, this mesh-like structure is not enforced by MINT’s operations.

      Interpolation could, in principle, create state-distributions that depart from the assumption of a sparse manifold. For example, interpolation could fill in the center of the green tube in Figure 1b, resulting in a solid manifold rather than a mesh around its outer surface. However, this would occur only if spiking observations argued for it. As will be documented below, we find that essentially all interpolation is local.”

      We have also added Figure 4d. This new analysis documents the fact that decoded states are near training-set trajectories, which is why the term ‘mesh’ is appropriate.

      Reviewer #3:

      Summary:  

      This manuscript develops a new method termed MINT for decoding of behavior. The method is essentially a table-lookup rather than a model. Within a given stereotyped task, MINT tabulates averaged firing rate trajectories of neurons (neural states) and corresponding averaged behavioral trajectories as stereotypes to construct a library. For a test trial with a realized neural trajectory, it then finds the closest neural trajectory to it in the table and declares the associated behavior trajectory in the table as the decoded behavior. The method can also interpolate between these tabulated trajectories. The authors mention that the method is based on three key assumptions: (1) Neural states may not be embedded in a low-dimensional subspace, but rather in a high-dimensional space. (2) Neural trajectories are sparsely distributed under different behavioral conditions. (3) These neural states traverse trajectories in a stereotyped order.

      The authors conducted multiple analyses to validate MINT, demonstrating its decoding of behavioral trajectories in simulations and datasets (Figures 3, 4). The main behavior decoding comparison is shown in Figure 4. In stereotyped tasks, decoding performance is comparable (MC_Cycle, MC_Maze) or better (Area2_Bump) than other linear/nonlinear algorithms (Figure 4). However, MINT underperforms for the MC_RTT task, which is less stereotyped (Figure 4).

      This paper is well-structured and its main idea is clear. The fact that performance on stereotyped tasks is high is interesting and informative, showing that these stereotyped tasks create stereotyped neural trajectories. The task-specific comparisons include various measures and a variety of common decoding approaches, which is a strength. However, I have several major concerns. I believe several of the conclusions in the paper, which are also emphasized in the abstract, are not accurate or supported, especially about generalization, computational scalability, and utility for BCIs. MINT is essentially a table-lookup algorithm based on stereotyped task-dependent trajectories and involves the tabulation of extensive data to build a vast library without modeling. These aspects will limit MINT's utility for real-world BCIs and tasks. These properties will also limit MINT's generalizability from task to task, which is important for BCIs and thus is commonly demonstrated in BCI experiments with other decoders without any retraining. Furthermore, MINT's computational and memory requirements can be prohibitive it seems. Finally, as MINT is based on tabulating data without learning models of data, I am unclear how it will be useful in basic investigations of neural computations. I expand on these concerns below.  

      We thank the reviewer for pointing out weaknesses in our framing and presentation. The comments above made us realize that we needed to 1) better document the ways in which MINT is far more flexible than a lookup-table, and 2) better explain the competing scientific perspectives at play. R3’s comments also motivated us to add an additional analysis of generalization. In our view the manuscript is greatly improved by these additions. Specifically, these additions directly support the broader impact that we hope the study will have.

      For simplicity and readability, we first group and summarize R3’s main concerns in order to better address them. (These main concerns are all raised above, in addition to recurring in the specific comments below. Responses to each individual specific comment are provided after these summaries.)

      (1) R3 raises concerns about ‘computational scalability.’ The concern is that “MINT's computational and memory requirements can be prohibitive.” This point was expanded upon in a specific comment, reproduced below:

      I also find the statement in the abstract and paper that "computations are simple, scalable" to be inaccurate. The authors state that MINT's computational cost is O(NC) only, but it seems this is achieved at a high memory cost as well as computational cost in training. The process is described in section "Lookup table of log-likelihoods" on line [978-990]. The idea is to precompute the log-likelihoods for any combination of all neurons with discretization x all delay/history segments x all conditions and to build a large lookup table for decoding. Basically, the computational cost of precomputing this table is O(V^{Nτ} x TC) and the table requires a memory of O(V^{Nτ}), where V is the number of discretization points for the neural firing rates, N is the number of neurons, τ is the history length, T is the trial length, and C is the number of conditions. This is a very large burden, especially the V^{Nτ} term. This cost is currently not mentioned in the manuscript and should be clarified in the main text. Accordingly, computation claims should be modified including in the abstract.

      The revised manuscript clarifies that our statement (that computations are simple and scalable) is absolutely accurate. There is no need to compute, or store, a massive lookup table. There are three tables: two of modest size and one that is tiny. This is now better explained:

      “Thus, the log-likelihood of , for a particular current neural state, is simply the sum of many individual log-likelihoods (one per neuron and time-bin). Each individual log-likelihood depends on only two numbers: the firing rate at that moment and the spike count in that bin. To simplify online computation, one can precompute the log-likelihood, under a Poisson model, for every plausible combination of rate and spike-count. For example, a lookup table of size 2001 × 21 is sufficient when considering rates that span 0-200 spikes/s in increments of 0.1 spikes/s, and considering 20 ms bins that contain at most 20 spikes (only one lookup table is ever needed, so long as its firing-rate range exceeds that of the most-active neuron at the most active moment in Ω). Now suppose we are observing a population of 200 neurons, with a 200 ms history divided into ten 20 ms bins. For each library state, the log-likelihood of the observed spike-counts is simply the sum of 200 × 10 = 2000 individual log-likelihoods, each retrieved from the lookup table. In practice, computation is even simpler because many terms can be reused from the last time bin using a recursive solution (Methods). This procedure is lightweight and amenable to real-time applications.”
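
      To make the size and use of this table concrete, the following is a minimal sketch of a Poisson log-likelihood lookup table with the dimensions quoted above (2001 rates × 21 spike counts). It is not the released MINT code. The function names, the `min_lambda` floor used to avoid log(0) at a rate of 0 spikes/s, and the clipping of spike counts to the table range are assumptions made for illustration, and the recursive reuse of terms mentioned in the quote is not shown.

```python
import math
import numpy as np

BIN_S = 0.020                    # 20 ms bins
RATES_HZ = np.arange(2001) * 0.1  # 0 to 200 spikes/s in 0.1 spikes/s steps
MAX_COUNT = 20                   # at most 20 spikes per bin

def build_loglik_table(rates_hz=RATES_HZ, max_count=MAX_COUNT,
                       bin_s=BIN_S, min_lambda=1e-6):
    """Return table[r, k] = log P(k spikes | rate r) under a Poisson model."""
    # Expected spike count per bin; floored (assumption) so log() is finite.
    lam = np.maximum(rates_hz * bin_s, min_lambda)[:, None]
    k = np.arange(max_count + 1)[None, :]
    log_fact = np.array([math.lgamma(c + 1) for c in range(max_count + 1)])
    return k * np.log(lam) - lam - log_fact[None, :]

def state_loglik(table, rate_idx, counts):
    """Log-likelihood of observed spike counts for one library state.

    rate_idx: (n_neurons, n_bins) integer indices into the rate axis
    counts:   (n_neurons, n_bins) observed spike counts
    """
    counts = np.minimum(counts, table.shape[1] - 1)  # clip (assumption)
    # One table lookup per neuron and bin, then a single sum.
    return float(table[rate_idx, counts].sum())

table = build_loglik_table()
```

      With 200 neurons and ten bins, `state_loglik` performs 2000 lookups and one sum per library state, which matches the count given in the quoted passage.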

      In summary, the first table simply needs to contain the firing rate of each neuron, for each condition, and each time in that condition. This table consumes relatively little memory. Assuming 100 one-second-long conditions (rates sampled every 20 ms) and 200 neurons, the table would contain 100 x 50 x 200 = 1,000,000 numbers. These numbers are typically stored as 16-bit integers (because rates are quantized), which amounts to about 2 MB. This is modest, given that most computers have (at least) tens of GB of RAM. A second table would contain the values for each behavioral variable, for each condition, and each time in that condition. This table might contain behavioral variables at a finer resolution (e.g. every millisecond) to enable decoding to update in between 20 ms bins (1 ms granularity is not needed for most BCI applications, but is the resolution used in this study). The number of behavioral variables of interest for a particular BCI application is likely to be small, often 1-2, but let’s assume for this example it is 10 (e.g. x-, y-, and z-position, velocity, and acceleration of a limb, plus one other variable). This table would thus contain 100 x 1000 x 10 = 1,000,000 floating point numbers, i.e. an 8 MB table. The third table is used to store the probability of s spikes being observed given a particular quantized firing rate (e.g. it may contain probabilities associated with firing rates ranging from 0 – 200 spikes/s in 0.1 spikes/s increments). This table is not necessary, but saves some computation time by precomputing numbers that will be used repeatedly. This is a very small table (typically ~2000 x 20, i.e. 320 KB). It does not need to be repeated for different neurons or conditions, because Poisson probabilities depend on only rate and count.
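
      The memory figures above can be checked with a few lines of arithmetic. The sketch below simply multiplies out the table dimensions given in this response. The 8-byte entry size for the behavioral and Poisson tables is an assumption, chosen because it reproduces the stated 8 MB and 320 KB figures, and the helper name `table_bytes` is hypothetical.

```python
import math

def table_bytes(shape, bytes_per_entry):
    """Memory footprint of a dense table with the given shape."""
    return math.prod(shape) * bytes_per_entry

# Rates: 100 conditions x 50 time bins (1 s at 20 ms) x 200 neurons, 16-bit ints.
rates_bytes = table_bytes((100, 50, 200), 2)
# Behavior: 100 conditions x 1000 samples (1 s at 1 ms) x 10 variables,
# assuming 8-byte floats.
behavior_bytes = table_bytes((100, 1000, 10), 8)
# Poisson probabilities: ~2000 rates x 20 counts, assuming 8-byte floats.
poisson_bytes = table_bytes((2000, 20), 8)

for name, n in [("rates", rates_bytes),
                ("behavior", behavior_bytes),
                ("poisson", poisson_bytes)]:
    print(f"{name}: {n / 1e6:.2f} MB")
```

      Together the three tables come to roughly 10 MB in this example.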

      (2) R3 raises a concern that MINT “is essentially a table-lookup rather than a model.” R3 states that MINT

      “is essentially a table-lookup algorithm based on stereotyped task-dependent trajectories and involves the tabulation of extensive data to build a vast library without modeling.”

      and that,

      “as MINT is based on tabulating data without learning models of data, I am unclear how it will be useful in basic investigations of neural computations.”

      This concern is central to most subsequent concerns. The manuscript has been heavily revised to address it. The revisions clarify that MINT is much more flexible than a lookup table, even though MINT uses a lookup table as its first step. Because R3’s concern is intertwined with one’s scientific assumptions, we have also added the new Figure 1 to explicitly illustrate the two key scientific perspectives and their decoding implications. 

      Under the perspective in Figure 1a, R3 would be correct in saying that there exist traditional interpretable decoders (e.g. a Kalman filter) whose assumptions better model the data. Under this perspective, MINT might still be an excellent choice in many cases, but other methods would be expected to gain the advantage when situations demand more flexibility. This is R3’s central concern, and essentially all other concerns flow from it. It makes sense that R3 has this concern, because their comments repeatedly stress a foundational assumption of the perspective in Figure 1a: the assumption of a fixed low-dimensional neural subspace where activity has a reliable relationship to behavior that can be modeled and leveraged during decoding. The phrases below accord with that view:

      “Unlike MINT, these works can achieve generalization because they model the neural subspace and its association to movement.”

      “it will not generalize… even when these tasks cover the exact same space (e.g., the same 2D computer screen and associated neural space).”

      “For proper training, the training data should explore the whole movement space and the associated neural space”

      “I also believe the authors should clarify the logic behind developing MINT better. From a scientific standpoint, we seek to gain insights into neural computations by making various assumptions and building models that parsimoniously describe the vast amount of neural data rather than simply tabulating the data. For instance, low-dimensional assumptions have led to the development of numerous dimensionality reduction algorithms and these models have led to important interpretations about the underlying dynamics”

      Thus, R3 prefers a model that 1) assumes a low-dimensional subspace that is fixed across tasks and 2) assumes a consistent ‘association’ between neural activity and kinematics. Because R3 believes this is the correct model of the data, they believe that decoders should leverage it. Traditional interpretable methods do, and MINT doesn’t, which is why they find MINT to be unprincipled. This is a reasonable view, but it is not our view. We have heavily revised the manuscript to clarify that a major goal of our study is to explore the implications of a different, less-traditional scientific perspective.

      The new Figure 1a illustrates the traditional perspective. Under this perspective, one would agree with R3’s claim that other methods have the opportunity to model the data better. For example, suppose there exists a consistent neural subspace – conserved across tasks – where three neural dimensions encode 3D hand position and three additional neural dimensions encode 3D hand velocity. A traditional method such as a Kalman filter would be a very appropriate choice to model these aspects of the data.

      Figure 1b illustrates the alternative scientific perspective. This perspective arises from recent, present, and to-be-published observations. MINT’s assumptions are well-aligned with this perspective. In contrast, the assumptions of traditional methods (e.g. the Kalman filter) are not well-aligned with the properties of the data under this perspective. This does not mean traditional methods are not useful. Yet under Figure 1b, it is traditional methods, such as the Kalman filter, that lack an accurate model of the data. Of course, the reviewer may disagree with our scientific perspective. We would certainly concede that there is room for debate. However, we find the evidence for Figure 1b to be sufficiently strong that it is worth exploring the utility of methods that align with this scientific perspective. MINT is such a method. As we document, it performs very well.

      Thus, in our view, MINT is quite principled because its assumptions are well aligned with the data. It is true that the features of the data that MINT models are a bit different from those that are traditionally modeled. For example, R3 is quite correct that MINT does not attempt to use a biomimetic model of the true transformation from neural activity, to muscle activity, and thence to kinematics. We see this as a strength, and the manuscript has been revised accordingly (see paragraph beginning with “We leveraged this simulated data to compare MINT with a biomimetic decoder”).

      (3) R3 raises concerns that MINT cannot generalize. This was a major concern of R3 and is intimately related to concern #2 above. The concern is that, if MINT is “essentially a lookup table” that simply selects pre-defined trajectories, then MINT will not be able to generalize. R3 is quite correct that MINT generalizes rather differently than existing methods. Whether this is good or bad depends on one’s scientific perspective. Under Figure 1a, MINT’s generalization would indeed be limiting because other methods could achieve greater flexibility. Under Figure 1b, all methods will have serious limits regarding generalization. Thus, MINT’s method for generalizing may approximate the best one can presently do. To address this concern, we have made three major changes, numbered i-iii below:

      i) Large sections of the manuscript have been restructured to underscore the ways in which MINT can generalize. A major goal was to counter the impression, stated by R3 above, that: 

      “for a test trial with a realized neural trajectory, [MINT] then finds the closest neural trajectory to it in the table and declares the associated behavior trajectory in the table as the decoded behavior”.

      This description is a reasonable way to initially understand how MINT works, and we concede that we may have over-used this intuition. Unfortunately, it can leave the misimpression that MINT decodes by selecting whole trajectories, each corresponding to ‘a behavior’. This can happen, but it needn’t and typically doesn’t. As an example, consider the cycling task. Suppose that the library consists of stereotyped trajectories, each four cycles long, at five fixed speeds from 0.5-2.5 Hz. If the spiking observations argued for it, MINT could decode something close to one of these five stereotyped trajectories. Yet it needn’t. Decoded trajectories will typically resemble library trajectories locally, but may be very different globally. For example, a decoded trajectory could be thirty cycles long (or two, or five hundred) perhaps speeding up and slowing down multiple times across those cycles.

      Thus, the library of trajectories shouldn’t be thought of as specifying a limited set of whole movements that can be ‘selected from’. Rather, trajectories define a scaffolding that outlines where the neural state is likely to live and how it is likely to be changing over time. When we introduce the idea of library trajectories, we are now careful to stress that they don’t function as a set from which one trajectory is ‘declared’ to be the right one:

      “We thus designed MINT to approximate that manifold using the trajectories themselves, rather than their covariance matrix or corresponding subspace. Unlike a covariance matrix, neural trajectories indicate not only which states are likely, but also which state-derivatives are likely. If a neural state is near previously observed states, it should be moving in a similar direction. MINT leverages this directionality.

      Training-set trajectories can take various forms, depending on what is convenient to collect. Most simply, training data might include one trajectory per condition, with each condition corresponding to a discrete movement. Alternatively, one might instead employ one long trajectory spanning many movements. Another option is to employ many sub-trajectories, each briefer than a whole movement. The goal is simply for training-set trajectories to act as a scaffolding, outlining the manifold that might be occupied during decoding and the directions in which decoded trajectories are likely to be traveling.”

      Later in that same section we stress that decoded trajectories can move along the ‘mesh’ in non-stereotyped ways:

      “Although the mesh is formed of stereotyped trajectories, decoded trajectories can move along the mesh in non-stereotyped ways as long as they generally obey the flow-field implied by the training data. This flexibility supports many types of generalization, including generalization that is compositional in nature. Other types of generalization – e.g. from the green trajectories to the orange trajectories in Figure 1b – are unavailable when using MINT and are expected to be challenging for any method (as will be documented in a later section).”

      The section “Training and decoding using MINT” has been revised to clarify the ways in which interpolation is flexible, allowing decoded movements to be globally very different from any library trajectory.

      “To decode stereotyped trajectories, one could simply obtain the maximum-likelihood neural state from the library, then render a behavioral decode based on the behavioral state with the same values of c and k. This would be appropriate for applications in which conditions are categorical, such as typing or handwriting. Yet in most cases we wish for the trajectory library to serve not as an exhaustive set of possible states, but as a scaffolding for the mesh of possible states. MINT’s operations are thus designed to estimate any neural trajectory – and any corresponding behavioral trajectory – that moves along the mesh in a manner generally consistent with the trajectories in Ω.”

      “…interpolation allows considerable flexibility. Not only is one not ‘stuck’ on a trajectory from Φ, one is also not stuck on trajectories created by weighted averaging of trajectories in Φ. For example, if cycling speed increases, the decoded neural state could move steadily up a scaffolding like that illustrated in Figure 1b (green). In such cases, the decoded trajectory might be very different in duration from any of the library trajectories. Thus, one should not think of the library as a set of possible trajectories that are selected from, but rather as providing a mesh-like scaffolding that defines where future neural states are likely to live and the likely direction of their local motion. The decoded trajectory may differ considerably from any trajectory within Ω.”

      This flexibility is indeed used during movement. One empirical example is described in detail:

      “During movement… angular phase was decoded with effectively no net drift over time. This is noteworthy because angular velocity on test trials never perfectly matched any of the trajectories in Φ. Thus, if decoding were restricted to a library trajectory, one would expect growing phase discrepancies. Yet decoded trajectories only need to locally (and approximately) follow the flow-field defined by the library trajectories. Based on incoming spiking observations, decoded trajectories speed up or slow down (within limits).

      This decoding flexibility presumably relates to the fact that the decoded neural state is allowed to differ from the nearest state in Ω. To explore… [the text goes on to describe the new analysis in Figure 4d, which shows that the decoded state is typically not on any trajectory, though it is typically close to a trajectory].”

      Thus, MINT’s operations allow considerable flexibility, including generalization that is compositional in nature. Yet R3 is still correct that there are other forms of generalization that are unavailable to MINT. This is now stressed at multiple points in the revision. However, under the perspective in Figure 1b, these forms of generalization are unavailable to any current method. Hence we made a second major change in response to this concern…

      ii) We explicitly illustrate how the structure of the data determines when generalization is or isn’t possible. The new Figure 1a,b introduces the two perspectives, and the new Figure 6a,b lays out their implications for generalization. Under the perspective in Figure 6a, the reviewer is quite right: other methods can generalize in ways that MINT cannot. Under the perspective in Figure 6b, expectations are very different. Those expectations make testable predictions. Hence the third major change…

      iii) We have added an analysis of generalization, using a newly collected dataset. This dataset was collected using Neuropixels Probes during our Pac-Man force-tracking task. This dataset was chosen because it is unusually well-suited to distinguishing the predictions in Figure 6a versus Figure 6b. Finding a dataset that can do so is not simple. Consider R3’s point that training data should “explore the whole movement space and the associated neural space”. The physical simplicity of the Pac-Man task makes it unusually easy to confirm that the behavioral workspace has been fully explored. Importantly, under Figure 6b, this does not mean that the neural workspace has been fully explored, which is exactly what we wish to test when testing generalization. We do so, and compare MINT with a Wiener filter. A Wiener filter is an ideal comparison because it is simple, performs very well on this task, and should be able to generalize well under Figure 1a. Additionally, the Wiener filter (unlike the Kalman Filter) doesn’t leverage the assumption that neural activity reflects the derivative of force. This matters because we find that neural activity does not reflect dforce/dt in this task. The Wiener filter is thus the most natural choice of the interpretable methods whose assumptions match Figure 1a.

      The new analysis is described in Figure 6c-g and accompanying text. Results are consistent with the predictions of Figure 6b. We are pleased to have been motivated to add this analysis for two reasons. First, it provides an additional way of evaluating the predictions of the two competing scientific perspectives that are at the heart of our study. Second, this analysis illustrates an underappreciated way in which generalization is likely to be challenging for any decode method. It can be tempting to think that the main challenge regarding generalization is to fully explore the relevant behavioral space. This makes sense if a behavioral space has “an associated neural space”. However, we are increasingly of the opinion that it doesn’t. Different tasks often involve different neural subspaces, even when behavioral subspaces overlap. We have even seen situations where motor output is identical but neural subspaces are quite different. These facts are relevant to any decoder, something highlighted in the revised Introduction:

      “MINT’s performance confirms that there are gains to be made by building decoders whose assumptions match a different, possibly more accurate view of population activity. At the same time, our results suggest fundamental limits on decoder generalization. Under the assumptions in Figure 1b, it will sometimes be difficult or impossible for decoders to generalize to not-yet-seen tasks. We found that this was true regardless of whether one uses MINT or a more traditional method. This finding has implications regarding when and how generalization should be attempted.”

      We have also added an analysis (Figure 6e) illustrating how MINT’s ability to compute likelihoods can be useful in detecting situations that may strain generalization (for any method). MINT is unusual in being able to compute and use likelihoods in this way.

      Detailed responses to R3: we reproduce each of R3’s specific concerns below, but concentrate our responses on issues not already covered above.

      Main comments: 

      Comment 1. MINT does not generalize to different tasks, which is a main limitation for BCI utility compared with prior BCI decoders that have shown this generalizability as I review below. Specifically, given that MINT tabulates task-specific trajectories, it will not generalize to tasks that are not seen in the training data even when these tasks cover the exact same space (e.g., the same 2D computer screen and associated neural space). 

      First, the authors provide a section on generalization, which is inaccurate because it mixes up two fundamentally different concepts: 1) collecting informative training data and 2) generalizing from task to task. The former is critical for any algorithm, but it does not imply the latter. For example, removing one direction of cycling from the training set as the authors do here is an example of generating poor training data because the two behavioral (and neural) directions are non-overlapping and/or orthogonal while being in the same space. As such, it is fully expected that all methods will fail. For proper training, the training data should explore the whole movement space and the associated neural space, but this does not mean all kinds of tasks performed in that space must be included in the training set (something MINT likely needs while modeling-based approaches do not). Many BCI studies have indeed shown this generalization ability using a model. For example, in Weiss et al. 2019, center-out reaching tasks are used for training and then the same trained decoder is used for typing on a keyboard or drawing on the 2D screen. In Gilja et al. 2012, training is on a center-out task but the same trained decoder generalizes to a completely different pinball task (hit four consecutive targets) and tasks requiring the avoidance of obstacles and curved movements. There are many more BCI studies, such as Jarosiewicz et al. 2015, that also show generalization to complex real-world tasks not included in the training set. Unlike MINT, these works can achieve generalization because they model the neural subspace and its association to movement. On the contrary, MINT models task-dependent neural trajectories, so the trained decoder is very task-dependent and cannot generalize to other tasks. So, unlike these prior BCI methods, MINT will likely actually need to include every task in its library, which is not practical.

      I suggest the authors remove claims of generalization and modify their arguments throughout the text and abstract. The generalization section needs to be substantially edited to clarify the above points. Please also provide the BCI citations and discuss the above limitation of MINT for BCIs. 

      As discussed above, R3’s concerns are accurate under the view in Figure 1a (and the corresponding Figure 6a). Under this view, a method such as that in Gilja et al. or Jarosiewicz et al. can find the correct subspace, model the correct neuron-behavior correlations, and generalize to any task that uses “the same 2D computer screen and associated neural space”, just as the reviewer argues. Under Figure 1b things are quite different.

      This topic – and the changes we have made to address it – is covered at length above. Here we simply want to highlight an empirical finding: sometimes two tasks use the same neural subspace and sometimes they don’t. We have seen both in recent data, and it can be very non-obvious which will occur based just on behavior. It does not simply relate to whether one is using the same physical workspace. We have even seen situations where the patterns of muscle activity in two tasks are nearly identical, but the neural subspaces are fairly different. When a new task uses a new subspace, neither of the methods noted above (Gilja nor Jarosiewicz) will generalize (nor will MINT). Generalizing to a new subspace is basically impossible without some yet-to-be-invented approach. On the other hand, there are many other pairs of tasks (center-out-reaching versus some other 2D cursor control) where subspaces are likely to be similar, especially if the frequency content of the behavior is similar (in our recent experience this is often critical). When subspaces are shared, most methods will generalize, and that is presumably why generalization worked well in the studies noted above.

      Although MINT can also generalize in such circumstances, R3 is correct that, under the perspective in Figure 1a, MINT will be more limited than other methods. This is now carefully illustrated in Figure 6a. In this traditional perspective, MINT will fail to generalize in cases where new trajectories are near previously observed states, yet move in very different ways from library trajectories. The reason we don’t view this as a shortcoming is that we expect it to occur rarely (else tangling would be high). We thus anticipate the scenario in Figure 6b.

      This is worth stressing because R3 states that our discussion of generalization “is inaccurate because it mixes up two fundamentally different concepts: 1) collecting informative training data and 2) generalizing from task to task.” We have heavily revised this section and improved it. However, it was never inaccurate. Under Figure 6b, these two concepts absolutely are mixed up. If different tasks use different neural subspaces, then this requires collecting different “informative training data” for each. One cannot simply count on having explored the physical workspace.

      Comment 2. MINT is shown to achieve competitive/high performance in highly stereotyped datasets with structured trials, but worse performance on MC_RTT, which is not based on repeated trials and is less stereotyped. This shows that MINT is valuable for decoding in repetitive stereotyped use-cases. However, it also highlights a limitation of MINT for BCIs, which is that MINT may not work well for real-world and/or less-constrained setups such as typing, moving a robotic arm in 3D space, etc. This is again due to MINT being a lookup table with a library of stereotyped trajectories rather than a model. Indeed, the authors acknowledge that the lower performance on MC_RTT (Figure 4) may be caused by the lack of repeated trials of the same type. However, real-world BCI decoding scenarios will also not have such stereotyped trial structure and will be less/un-constrained, in which MINT underperforms. Thus, the claim in the abstract or lines 480-481 that MINT is an "excellent" candidate for clinical BCI applications is not accurate and needs to be qualified. The authors should revise their statements accordingly and discuss this issue. They should also make the use-case of MINT on BCI decoding clearer and more convincing.

      We discussed, above, multiple changes and additions to the revision that were made to address these concerns. Here we briefly expand on the comment that MINT achieves “worse performance on MC_RTT, which is not based on repeated trials and is less stereotyped”. All decoders performed poorly on this task. MINT still outperformed the two traditional methods, but this was the only dataset where MINT did not also perform better (overall) than the expressive GRU and feedforward network. There are probably multiple reasons why. We agree with R3 that one likely reason is that this dataset is straining generalization, and MINT may have felt this strain more than the two machine-learning-based methods. Another potential reason is the structure of the training data, which made it more challenging to obtain library trajectories in the first place. Importantly, these observations do not support the view in Figure 1a. MINT still outperformed the Kalman and Wiener filters (whose assumptions align with Fig. 1a). To make these points we have added the following:

      “Decoding was acceptable, but noticeably worse, for the MC_RTT dataset… As will be discussed below, every decode method achieved its worst estimates of velocity for the MC_RTT dataset. In addition to the impact of slower reaches, MINT was likely impacted by training data that made it challenging to accurately estimate library trajectories. Due to the lack of repeated trials, MINT used AutoLFADS to estimate the neural state during training. In principle this should work well. In practice AutoLFADS may have been limited by having only 10 minutes of training data. Because the random-target task involved more variable reaches, it may also have stressed the ability of all methods to generalize, perhaps for the reasons illustrated in Figure 1b.

      The only dataset where MINT did not perform the best overall was the MC_RTT dataset, where it was outperformed by the feedforward network and GRU. As noted above, this may relate to the need for MINT to learn neural trajectories from training data that lacked repeated trials of the same movement (a design choice one might wish to avoid). Alternatively, the less-structured MC_RTT dataset may strain the capacity to generalize; all methods experienced a drop in velocity-decoding R² for this dataset compared to the others. MINT generalizes somewhat differently than other methods, and may have been at a modest disadvantage for this dataset. A strong version of this possibility is that perhaps the perspective in Figure 1a is correct, in which case MINT might struggle because it cannot use forms of generalization that are available to other methods (e.g. generalization based on neuron-velocity correlations). This strong version seems unlikely; MINT continued to significantly outperform the Wiener and Kalman filters, which make assumptions aligned with Figure 1a.”

      Comment 3. Related to 2, it may also be that MINT achieves competitive performance in offline and trial-based stereotyped decoding by overfitting to the trial structure in a given task, and thus may not generalize well to online performance due to overfitting. For example, a recent work showed that offline decoding performance may be overfitted to the task structure and may not represent online performance (Deo et al. 2023). Please discuss. 

      We agree that a limitation of our study is that we do not test online performance. There are sensible reasons for this decision:

      “By necessity and desire, all comparisons were made offline, enabling benchmarked performance across a variety of tasks and decoded variables, where each decoder had access to the exact same data and recording conditions.”

      We recently reported excellent online performance in the cycling task with a different algorithm (Schroeder et al. 2022). In the course of that study, we consistently found that improvements in our offline decoding translated to improvements in our online decoding. We thus believe that MINT (which improves on the offline performance of our older algorithm) is a good candidate to work very well online. Yet we agree this still remains to be seen. We have added the following to the Discussion:

      “With that goal in mind, there exist three important practical considerations. First, some decode algorithms experience a performance drop when used online. One presumed reason is that, when decoding is imperfect, the participant alters their strategy which in turn alters the neural responses upon which decoding is based. Because MINT produces particularly accurate decoding, this effect may be minimized, but this cannot be known in advance. If a performance drop does indeed occur, one could adapt the known solution of retraining using data collected during online decoding [13]. Another presumed reason (for a gap between offline and online decoding) is that offline decoders can overfit the temporal structure in training data [107]. This concern is somewhat mitigated by MINT’s use of a short spike-count history, but MINT may nevertheless benefit from data augmentation strategies such as including time-dilated versions of learned trajectories in the libraries.”
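
      One way such an augmentation could look is sketched below. The manuscript does not specify how time-dilated trajectories would be generated; linear interpolation, the function name, and the (time, neurons) array layout are all assumptions made for illustration.

```python
import numpy as np


def time_dilate(trajectory, factor):
    """Resample a (time, neurons) rate trajectory so it unfolds `factor` times slower.

    factor > 1 stretches the trajectory; factor < 1 compresses it.
    Rates are linearly interpolated between the original samples, and the
    first and last samples are preserved.
    """
    n_old = trajectory.shape[0]
    n_new = max(2, int(round((n_old - 1) * factor)) + 1)
    old_t = np.arange(n_old)
    new_t = np.linspace(0, n_old - 1, n_new)
    # np.interp is one-dimensional, so interpolate each neuron separately.
    return np.column_stack([
        np.interp(new_t, old_t, trajectory[:, j])
        for j in range(trajectory.shape[1])
    ])
```

      The paired behavioral trajectory would need the same resampling, and time-derivative variables such as velocity would additionally need rescaling by the dilation factor. Whether neural rates at a different speed are well approximated by a simple dilation is an empirical question that this sketch does not address.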

      Comment 4. Related to 2, since MINT requires firing rates to generate the library and simple averaging does not work for this purpose in the MC_RTT dataset (that does not have repeated trials), the authors needed to use AutoLFADS to infer the underlying firing rates. The fact that MINT requires the usage of another model to be constructed first and that this model can be computationally complex, will also be a limiting factor and should be clarified. 

      This concern relates to the computational complexity of computing firing-rate trajectories during training. Usually, rates are estimated via trial-averaging, which makes MINT very fast to train. This was quite noticeable during the Neural Latents Benchmark competition. As one example, for the “MC_Scaling 5 ms Phase”, MINT took 28 seconds to train while GPFA took 30 minutes, the transformer baseline (NDT) took 3.5 hours, and the switching nonlinear dynamical system took 4.5 hours.

      However, the reviewer is quite correct that MINT’s efficiency depends on the method used to construct the library of trajectories. As we note, “MINT is a method for leveraging a trajectory library, not a method for constructing it”. One can use trial-averaging, which is very fast. One can also use fancier, slower methods to compute the trajectories. We don’t view this as a negative – it simply provides options. Usually one would choose trial-averaging, but one does not have to. In the case of MC_RTT, one has a choice between LFADS and grouping into pseudo-conditions and averaging (which is fast). LFADS produces higher performance at the cost of being slower. The operator can choose which they prefer. This is discussed in the following section:

      “For MINT, ‘training’ simply means computation of standard quantities (e.g. firing rates) rather than parameter optimization. MINT is thus typically very fast to train (Table 1), on the order of seconds using generic hardware (no GPUs). This speed reflects the simple operations involved in constructing the library of neural-state trajectories: filtering of spikes and averaging across trials. At the same time we stress that MINT is a method for leveraging a trajectory library, not a method for constructing it. One may sometimes wish to use alternatives to trial-averaging, either of necessity or because they improve trajectory estimates. For example, for the MC_RTT task we used AutoLFADS to infer the library. Training was consequently much slower (hours rather than seconds) because of the time taken to estimate rates. Training time could be reduced back to seconds using a different approach – grouping into pseudo-conditions and averaging – but performance was reduced. Thus, training will typically be very fast, but one may choose time-consuming methods when appropriate.”
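
      The two operations named in the quoted passage, filtering of spikes and averaging across trials, can be sketched in a few lines. The Gaussian kernel, its 25 ms width, the 1 ms bin size, and the function name are assumptions for this example rather than the settings used in the manuscript.

```python
import numpy as np


def trial_averaged_rates(spikes, bin_s=0.001, sigma_s=0.025):
    """Estimate a firing-rate trajectory for one condition.

    spikes : (trials, time, neurons) array of spike counts per bin
    Returns a (time, neurons) array of rates in spikes/s.
    """
    sigma_bins = sigma_s / bin_s
    half = int(np.ceil(4 * sigma_bins))           # truncate kernel at 4 sigma
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma_bins) ** 2)
    kernel /= kernel.sum()                        # unit area preserves spike counts

    # Averaging and filtering are both linear, so their order does not matter.
    mean_counts = spikes.mean(axis=0)
    smoothed = np.apply_along_axis(
        lambda x: np.convolve(x, kernel, mode="same"), 0, mean_counts
    )
    return smoothed / bin_s                       # counts per bin -> spikes/s
```

      No parameters are optimized here, which is consistent with the training times of seconds reported above. Rates near the start and end of the window are underestimated because `mode="same"` pads with zeros.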

      Comment 5. I also find the statement in the abstract and paper that "computations are simple, scalable" to be inaccurate. The authors state that MINT's computational cost is O(NC) only, but it seems this is achieved at a high memory cost as well as computational cost in training. The process is described in section "Lookup table of log-likelihoods" on line [978-990]. The idea is to precompute the log-likelihoods for any combination of all neurons with discretization x all delay/history segments x all conditions and to build a large lookup table for decoding. Basically, the computational cost of precomputing this table is O(V^{Nτ} x TC) and the table requires a memory of O(V^{Nτ}), where V is the number of discretization points for the neural firing rates, N is the number of neurons, τ is the history length, T is the trial length, and C is the number of conditions. This is a very large burden, especially the V^{Nτ} term. This cost is currently not mentioned in the manuscript and should be clarified in the main text. Accordingly, computation claims should be modified including in the abstract. 

      As discussed above, the manuscript has been revised to clarify that our statement was accurate.

      Comment 6. In addition to the above technical concerns, I also believe the authors should clarify the logic behind developing MINT better. From a scientific standpoint, we seek to gain insights into neural computations by making various assumptions and building models that parsimoniously describe the vast amount of neural data rather than simply tabulating the data. For instance, low-dimensional assumptions have led to the development of numerous dimensionality reduction algorithms and these models have led to important interpretations about the underlying dynamics (e.g., fixed points/limit cycles). While it is of course valid and even insightful to propose different assumptions from existing models as the authors do here, they do not actually translate these assumptions into a new model. Without a model and by just tabulating the data, I don't believe we can provide interpretation or advance the understanding of the fundamentals behind neural computations. As such, I am not clear as to how this library building approach can advance neuroscience or how these assumptions are useful. I think the authors should clarify and discuss this point. 

      As requested, a major goal of the revision has been to clarify the scientific motivations underlying MINT’s design. In addition to many textual changes, we have added figures (Figures 1a,b and 6a,b) to outline the two competing scientific perspectives that presently exist. This topic is also addressed by extensions of existing analyses and by new analyses (e.g. Figure 6c-g). 

      In our view these additions have dramatically improved the manuscript. This is especially true because we think R3’s concerns, expressed above, are reasonable. If the perspective in Figure 1a is correct, then R3 is right and MINT is essentially a hack that fails to model the data. MINT would still be effective in many circumstances (as we show), but it would be unprincipled. This would create limitations, just as the reviewer argues. On the other hand, if the perspective in Figure 1b is correct, then MINT is quite principled relative to traditional approaches. Traditional approaches make assumptions (a fixed subspace, consistent neuron-kinematic correlations) that are not correct under Figure 1b.

      We don’t expect R3 to agree with our scientific perspective at this time (though we hope to eventually convince them). To us, the key is that we agree with R3 that the manuscript needs to lay out the different perspectives and their implications, so that readers have a good sense of the possibilities they should be considering. The revised manuscript is greatly improved in this regard.

      Comment 7. Related to 6, there seems to be a logical inconsistency between the operations of MINT and one of its three assumptions, namely, sparsity. The authors state that neural states are sparsely distributed in some neural dimensions (Figure 1a, bottom). If this is the case, then why does MINT extend its decoding scope by interpolating known neural states (and behavior) in the training library? This interpolation suggests that the neural states are dense on the manifold rather than sparse, thus being contradictory to the assumption made. If interpolation-based dense meshes/manifolds underlie the data, then why not model the neural states through the subspace or manifold representations? I think the authors should address this logical inconsistency in MINT, especially since this sparsity assumption also questions the low-dimensional subspace/manifold assumption that is commonly made. 

      We agree this is an important issue, and have added an analysis on this topic (Figure 4d). The key question is simple and empirical: during decoding, does interpolation cause MINT to violate the assumption of sparsity? R3 is quite right that in principle it could. If spiking observations argue for it, MINT’s interpolation could create a dense manifold during decoding rather than a sparse one. The short answer is that empirically this does not happen, in agreement with expectations under Figure 1b. Rather than interpolating between distant states and filling in large ‘voids’, interpolation is consistently local. This is a feature of the data, not of the decoder (MINT doesn’t insist upon sparsity, even though it is designed to work best in situations where the manifold is sparse).

      In addition to adding Figure 4d, we added the following (in an earlier section):

      “The term mesh is apt because, if MINT’s assumptions are correct, interpolation will almost always be local. If so, the set of decodable states will resemble a mesh, created by line segments connecting nearby training-set trajectories. However, this mesh-like structure is not enforced by MINT’s operations. Interpolation could, in principle, create state-distributions that depart from the assumption of a sparse manifold. For example, interpolation could fill in the center of the green tube in Figure 1b, resulting in a solid manifold rather than a mesh around its outer surface. However, this would occur only if spiking observations argued for it. As will be documented below, we find that essentially all interpolation is local.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      I appreciate the detailed methods section, however, more specifics should be integrated into the main text. For example on Line 238, it should additionally be stated how many minutes were used for training and metrics like the MAE which is used later should be reported here.

      Thank you for this suggestion. We now report the duration of training data in the main text:

      “Decoding R^2 was .968 over ~7.1 minutes of test trials based on ~4.4 minutes of training data.”

      We have also added similar specifics throughout the manuscript, e.g. in the Fig. 5 legend:

      “Results are based on the following numbers of training / test trials: MC\_Cycle (174 train, 99 test), MC\_Maze (1721 train, 574 test), Area2\_Bump (272 train, 92 test), MC\_RTT (810 train, 268 test).”

      Similar additions were made to the legends for Fig. 6 and 8. Regarding the request to add MAE for the multitask network, we did not do so for the simple reason that the decoded variable (muscle activity) has arbitrary units. The raw MAE is thus not meaningful. We could of course have normalized, but at this point the MAE is largely redundant with the correlation. In contrast, the MAE is useful when comparing across the MC_Maze, Area2_Bump, and MC_RTT datasets, because they all involve the same scale (cm/s).

      Regarding the MC_RTT task, AutoLFADS was used to obtain robust spike rates, as reported in the methods. However, the rationale for splitting the neural trajectories after AutoLFADS is unclear. If the trajectories were split based on random recording gaps, this might lead to suboptimal performance? It might be advantageous to split them based on a common behavioural state? 

      When learning neural trajectories via AutoLFADS, spiking data is broken into short (but overlapping) segments, rates are estimated for each segment via AutoLFADS, and these rates are then stitched together across segments into long neural trajectories. If there had been no recording gaps, these rates could have been stitched into a single neural trajectory for this dataset. However, the presence of recording gaps left us no choice but to stitch together these rates into more than one trajectory. Fortunately, recording gaps were rare: for the decoding analysis of MC_RTT there were only two recording gaps and therefore three neural trajectories, each ~2.7 minutes in duration.

      We agree that in general it is desirable to learn neural trajectories that begin and end at behaviorally relevant moments (e.g. in between movements). However, having these trajectories potentially end mid-movement is not an issue in and of itself. During decoding, MINT is never stuck on a trajectory. Thus, if MINT were decoding states near the end of a trajectory that was cut short due to a training gap, it would simply begin decoding states from other trajectories or elsewhere along the same trajectory in subsequent moments. We could have further trimmed the three neural trajectories to begin and end at behaviorally relevant moments, but chose not to as this would have only removed a handful of potentially useful states from the library.

      We now describe this in the Methods:

      “Although one might prefer trajectory boundaries to begin and end at behaviorally relevant moments (e.g. a stationary state), rather than at recording gaps, the exact boundary points are unlikely to be consequential for trajectories of this length that span multiple movements. If MINT estimates a state near the end of a long trajectory, its estimate will simply jump to another likely state on a different trajectory (or earlier along the same trajectory) in subsequent moments. Clipping the end of each trajectory to an earlier behaviorally-relevant moment would only remove potentially useful states from the libraries.”

      Are the training and execution times in Table 1 based on pure Matlab functions or Mex files? If it's Mex files as suggested by the code, it would be good to mention this in the Table caption.

      They are based on a combination of MATLAB and MEX files. This is now clarified in the table caption:

      “Timing measurements taken on a Macbook Pro (on CPU) with 32GB RAM and a 2.3 GHz 8-Core Intel Core i9 processor. Training and execution code used for measurements was written in MATLAB (with the core recursion implemented as a MEX file).”

      As the method most closely resembles a Bayesian decoder it would be good to compare performance against a Naive Bayes decoder. 

      We agree and have now done so. The following has been added to the text:

      “A natural question is thus whether a simpler Bayesian decoder would have yielded similar results. We explored this possibility by testing a Naïve Bayes regression decoder [85] using the MC_Maze dataset. This decoder performed poorly, especially when decoding velocity (R^2 = .688 and .093 for hand position and velocity, respectively), indicating that the specific modeling assumptions that differentiate MINT from a naive Bayesian decoder are important drivers of MINT’s performance.”

      Line 199 Typo: The assumption of stereotypy trajectory also enables neural states (and decoded behaviors) to be updated in between time bins. 

      Fixed

      Table 3: It's unclear why the Gaussian binning varies significantly across different datasets. Could the authors explain why this is the case and what its implications might be? 

      We have added the following description in the “Filtering, extracting, and warping data on each trial” subsection of the Methods to discuss how 𝜎 may vary due to the number of trials available for training and how noisy the neural data for those trials is:

      “First, spiking activity for each neuron on each trial was temporally filtered with a Gaussian to yield single-trial rates. Table 3 reports the Gaussian standard deviations σ (in milliseconds) used for each dataset. Larger values of σ utilize broader windows of spiking activity when estimating rates and therefore reduce variability in those rate estimates. However, large σ values also yield neural trajectories with less fine-grained temporal structure. Thus, the optimal σ for a dataset depends on how variable the rate estimates otherwise are.”
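
      The trade-off described above can be illustrated with a minimal Python sketch. This is not the MATLAB/MEX code used for the reported results; the function names, the 1 ms bin width, and the truncation of the kernel at four standard deviations are assumptions made only for illustration. The sketch filters binned spikes with a normalized Gaussian and then averages single-trial rates across trials.

```python
import math

def gaussian_kernel(sigma_ms, bin_ms=1.0, width=4.0):
    """Discrete Gaussian kernel, truncated at +/- width*sigma and normalized to sum to 1."""
    half = int(math.ceil(width * sigma_ms / bin_ms))
    weights = [math.exp(-0.5 * ((i * bin_ms) / sigma_ms) ** 2)
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(spikes, kernel):
    """Filter a binned spike train with the kernel (zero-padded at the edges).

    The output is in spikes per bin; divide by the bin width to obtain a rate.
    """
    half = len(kernel) // 2
    n = len(spikes)
    out = []
    for t in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            s = t + j - half
            if 0 <= s < n:
                acc += w * spikes[s]
        out.append(acc)
    return out

def trial_average(trials):
    """Average equal-length single-trial rate estimates, time bin by time bin."""
    n_trials = len(trials)
    return [sum(values) / n_trials for values in zip(*trials)]
```

      In this sketch a larger σ spreads each spike over a broader window, which lowers the variability of single-trial rates at the cost of temporal detail, in line with the description above.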

      An implementation of the method in an open-source programming language could further enhance the widespread use of the tool. 

      We agree this would be useful, but have not yet implemented the method in any other programming language. Implementation in Python is still a future goal.

      Reviewer #2 (Recommendations For The Authors): 

      - Figures 4 and 5 should show the error bars on the horizontal axis rather than portraying them vertically. 

      [Note that these are now Figures 5 and 6]

      The figure legend of Figure 5 now clarifies that the vertical ticks are simply to aid visibility when symbols have very similar means and thus overlap visually. We don’t include error bars (for this analysis) because they are very small and would mostly be smaller than the symbol sizes. Instead, to indicate certainty regarding MINT’s performance measurements, the revised text now gives error ranges for the correlations and MAE values in the context of Figure 4c. These error ranges were computed as the standard deviation of the sampling distribution (computed via resampling of trials) and are thus equivalent to SEMs. The error ranges are all very small; e.g. for the MC_Maze dataset the MAE for x-velocity is 4.5 +/- 0.1 cm/s (error bars on the correlations are smaller still).
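
      For readers unfamiliar with this kind of error range, the following Python sketch shows one common way to obtain the standard deviation of a sampling distribution by resampling trials with replacement (a bootstrap). The exact resampling procedure used for the manuscript is not specified in this response, so the function name, the number of resamples, and the use of per-trial values are illustrative assumptions.

```python
import random
import statistics

def bootstrap_se(per_trial_values, statistic=statistics.mean,
                 n_resamples=1000, seed=0):
    """Standard deviation of `statistic` across bootstrap resamples of trials.

    per_trial_values: one number per test trial (e.g. a per-trial absolute error).
    Trials are drawn with replacement, keeping the number of trials fixed.
    """
    rng = random.Random(seed)
    n = len(per_trial_values)
    estimates = []
    for _ in range(n_resamples):
        resample = [per_trial_values[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)
```

      When the statistic is a mean, the result should be close to the familiar standard error (sample standard deviation divided by the square root of the number of trials), which is why such error ranges can be read as SEMs.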

      Thus, for a given dataset, we can be quite certain of how well MINT performs (within ~2% in the above case). This is reassuring, but we also don’t want to overemphasize this accuracy. The main sources of variability one should be concerned about are: 1) different methods can perform differentially well for different brain areas and tasks, 2) methods can decode some behavioral variables better than others, and 3) performance depends on factors like neuron-count and the number of training trials, in ways that can differ across decode methods. For this reason, the study examines multiple datasets, across tasks and brain areas, and measures performance for a range of decoded variables. We also examine the impact of training-set-size (Figure 8a) and population size (solid traces in Fig. 8b, see R2’s next comment below). 

      There is one other source of variance one might be concerned about, but it is specific to the neural-network approaches: different weight initializations might result in different performance. For this reason, each neural-network approach was trained ten times, with the average performance computed. The variability around this average was very small, and this is now stated in the Methods.

      “For the neural networks, the training/testing procedure was repeated 10 times with different random seeds. For most behavioral variables, there was very little variability in performance across repetitions. However, there were a few outliers for which variability was larger. Reported performance for each behavioral group is the average performance across the 10 repetitions to ensure results were not sensitive to any specific random initialization of each network.”

      - For Figure 6, it is unclear whether the neuron-dropping process was repeated multiple times. If not, it should be since the results will be sensitive to which particular subsets of neurons were "dropped". In this case, the results presented in Figure 6 should include error bars to describe the variability in the model performance for each decoder considered. 

      A good point. The results in Figure 8 (previously Figure 6) were computed by averaging over the removal of different random subsets of neurons (50 subsets per neuron count), just as the reviewer requests. The figure has been modified to include the standard deviation of performance across these 50 subsets. The legend clarifies how this was done.
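
      The averaging procedure can be sketched as follows. This is a hedged illustration only: `evaluate` is a hypothetical callable standing in for "train and test a decoder using only the kept neurons", and the function name and seed handling are assumptions rather than details of the actual analysis code.

```python
import random
import statistics

def neuron_dropping_curve(n_neurons, keep_counts, evaluate,
                          n_subsets=50, seed=0):
    """Mean and standard deviation of performance across random neuron subsets.

    evaluate: hypothetical callable mapping a list of kept neuron indices
              to a performance score (e.g. R^2).
    Returns {neuron_count: (mean_score, sd_score)}.
    """
    rng = random.Random(seed)
    results = {}
    for n_keep in keep_counts:
        scores = []
        for _ in range(n_subsets):
            kept = rng.sample(range(n_neurons), n_keep)  # random subset, no repeats
            scores.append(evaluate(kept))
        results[n_keep] = (statistics.mean(scores), statistics.stdev(scores))
    return results
```

      The standard deviation returned for each neuron count plays the role of the error bars described above: it reflects how much performance depends on which particular neurons were retained.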

      Reviewer #3 (Recommendations For The Authors): 

      Other comments: 

      (1) [Line 185-188] The authors argue that in a 100-dimensional space with 10 possible discretized values, 10^100 potential neural states need to be computed. But I am not clear on this. This argument seems to hold only in the absence of a model (as in MINT). For a model, e.g., Kalman filter or AutoLFADS, information is encoded in the latent state. For example, a simple Kalman filter for a linear model can be used for efficient inference. This 10^100 computation isn't a general problem but seems MINT-specific, please clarify. 

      We agree this section was potentially confusing. It has been rewritten. We were simply attempting to illustrate why maximum likelihood computations are challenging without constraints. MINT simplifies this problem by adding constraints, which is why it can readily provide data likelihoods (and can do so using a Poisson model). The rewritten section is below:

      “Even with 1000 samples for each of the neural trajectories in Figure 3, there are only 4000 possible neural states for which log-likelihoods must be computed (in practice it is fewer still, see Methods). This is far fewer than if one were to naively consider all possible neural states in a typical rate- or factor-based subspace. It thus becomes tractable to compute log-likelihoods using a Poisson observation model. A Poisson observation model is usually considered desirable, yet can pose tractability challenges for methods that utilize a continuous model of neural states. For example, when using a Kalman filter, one is often restricted to assuming a Gaussian observation model to maintain computational tractability.”
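
      To make the tractability argument concrete, the sketch below evaluates a Poisson log-likelihood for every state in a small, finite library and selects the most likely one. It is a simplified illustration under stated assumptions: it scores a single time bin of spike counts, whereas MINT uses a window of spiking history and additional machinery described in the Methods; the function names are hypothetical, and rates are assumed to be strictly positive expected counts per bin.

```python
import math

def poisson_log_likelihood(spike_counts, rates):
    """log P(counts | rates) for independent Poisson neurons.

    rates: expected spike count per bin for each neuron (must be > 0).
    """
    total = 0.0
    for k, lam in zip(spike_counts, rates):
        # log of lam**k * exp(-lam) / k!
        total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total

def most_likely_state(spike_counts, library):
    """Score every candidate state in a finite library; return (best index, scores)."""
    scores = [poisson_log_likelihood(spike_counts, state) for state in library]
    best = max(range(len(library)), key=scores.__getitem__)
    return best, scores
```

      Because the candidate set is finite (a few thousand states in the example above), exhaustive scoring of this kind scales linearly with the number of candidate states and neurons.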

      (2) [Figure 6b] Why do the authors set the dropped neurons to zero in the "zeroed" results of the robustness analysis? Why not disregard the dropped neurons during the decoding process? 

      We agree the terminology we had used in this section was confusing. We have altered the figure and rewritten the text. The following, now at the beginning of that section, addresses the reviewer’s query: 

      “It is desirable for a decoder to be robust to the unexpected loss of the ability to detect spikes from some neurons. Such loss might occur while decoding, without being immediately detected. Additionally, one desires robustness to a known loss of neurons / recording channels. For example, there may have been channels that were active one morning but are no longer active that afternoon. At least in principle, MINT makes it very easy to handle this second situation: there is no need to retrain the decoder, one simply ignores the lost neurons when computing likelihoods. This is in contrast to nearly all other methods, which require retraining because the loss of one neuron alters the optimal parameters associated with every other neuron.”
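
      The difference between undetected and known neuron loss can be illustrated with a small Python sketch. It assumes independent Poisson neurons and a single time bin, and the names (`masked_log_likelihood`, `active`) and the toy rates are hypothetical; it is meant only to show why dropping a neuron's term from the likelihood requires no retraining.

```python
import math

def masked_log_likelihood(spike_counts, rates, active):
    """Poisson log-likelihood summed only over neurons flagged as active.

    active: one boolean per neuron; False means the neuron is known to be lost,
            so its term is simply omitted from the sum.
    """
    return sum(
        k * math.log(lam) - lam - math.lgamma(k + 1)
        for k, lam, keep in zip(spike_counts, rates, active)
        if keep
    )

# Toy library of two candidate states (expected counts per bin for 3 neurons).
state_a = [1.0, 1.0, 6.0]
state_b = [1.5, 1.5, 0.2]

# The true state is A, but neuron 3 has been lost and reports zero spikes.
observed = [1, 1, 0]

all_on = [True, True, True]       # loss is undetected: the silent neuron is trusted
known_loss = [True, True, False]  # loss is known: the silent neuron is ignored
```

      In this toy case, scoring with `all_on` favors state B, because the silent third neuron looks like strong evidence against state A, whereas scoring with `known_loss` favors the correct state A.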

      The figure has been relabeled accordingly; instead of the label ‘zeroed’, we use the label ‘undetected neuron loss’.

      (3) Authors should provide statistical significance on their results, which they already did for Fig. S3a,b,c but missing on some other figures/places. 

      We have added error bars in some key places, including in the text when quantifying MINT’s performance in the context of Figure 4. Importantly, error bars are only as meaningful as the source of error they assess, and there are reasons to be careful given this. The standard method for putting error bars on performance is to resample trials, which is indeed what we now report. These error bars are very small. For example, when decoding horizontal velocity for the MC_Maze dataset, the correlation between MINT’s decode and the true velocity had a mean and SD of the sampling distribution of 0.963 +/- 0.001. This means that, for a given dataset and target variable, we have enough trials/data that we can be quite certain of how well MINT performs. However, we want to be careful not to overstate this certainty. What one really wants to know is how well MINT performs across a variety of datasets, brain areas, target variables, neuron counts, etc. It is for this reason that we make multiple such comparisons, which provides a more valuable view of performance variability.

      For Figure 7, error bars are unavailable. Because this was a benchmark, there was exactly one test-set that was never seen before. This is thus not something that could be resampled many times (that would have revealed the test data and thus invalidated the benchmark, not to mention that some of these methods take days to train). We could, in principle, have added resampling to Figure 5. In our view it would not be helpful and could be misleading for the reasons noted above. If we computed standard errors using different train/test partitions, they would be very tight (mostly smaller than the symbol sizes), which would give the impression that one can be quite certain of a given R^2 value. Yet variability in the train/test partition is not the variability one is concerned about in practice. In practice, one is concerned about whether one would get a similar R^2 for a different dataset, or brain area, or task, or choice of decoded variable. Our analysis thus concentrated on showing results across a broad range of situations. In our view this is a far more relevant way of illustrating the degree of meaningful variability (which is quite large) than resampling, which produces reassuringly small but (mostly) irrelevant standard errors.

      Error bars are supplied in Figure 8b. These error bars give a sense of variability across re-samplings of the neural population. While this is not typically the source of variability one is most concerned about, for this analysis it becomes appropriate to show resampling-based standard errors because a natural concern is that results may depend on which neurons were dropped. So here it is both straightforward, and desirable, to compute standard errors. (The fact that MINT and the Wiener filter can be retrained many times swiftly was also key – this isn’t true of the more expressive methods). Figure S1 also uses resampling-based confidence intervals for similar reasons.

      (4) [Line 431-437] Authors state that MINT outperforms other methods with the PSTH R^2 metric (trial-averaged smoothed spikes for each condition). However, I think this measure may not provide a fair comparison and is confounded because MINT's library is built using PSTH (i.e., averaged firing rate) but other methods do not use the PSTH. The author should clarify this. 

      The PSTH R^2 metric was not created by us; it was part of the Neural Latents Benchmark. They chose it because it ensures that a method cannot ‘cheat’ (on the Bits/Spike measure) by reproducing fine features of spiking while estimating rates badly. We agree with the reviewer’s point: MINT’s design does give it a potential advantage in this particular performance metric. This isn’t a confound though, just a feature. Importantly, MINT will score well on this metric only if MINT’s neural state estimate is accurate (including accuracy in time). Without accurate estimation of the neural state at each time, it wouldn’t matter that the library trajectory is based on PSTHs. This is now explicitly stated:

      “This is in some ways unsurprising: MINT estimates neural states that tend to resemble (at least locally) trajectories ‘built’ from training-set-derived rates, which presumably resemble test-set rates. Yet strong performance is not a trivial consequence of MINT’s design. MINT does not ‘select’ whole library trajectories; PSTH R^2 will be high only if condition (c), index (k), and the interpolation parameter (α) are accurately estimated for most moments.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      In the presented manuscript, the authors investigate how neural networks can learn to replay presented sequences of activity. Their focus lies on the stochastic replay according to learned transition probabilities. They show that based on error-based excitatory and balance-based inhibitory plasticity networks can self-organize towards this goal. Finally, they demonstrate that these learning rules can recover experimental observations from song-bird song learning experiments.

      Overall, the study appears well-executed and coherent, and the presentation is very clear and helpful. However, it remains somewhat vague regarding the novelty. The authors could elaborate on the experimental and theoretical impact of the study, and also discuss how their results relate to those of Kappel et al, and others (e.g., Kappel et al. (doi.org/10.1371/journal.pcbi.1003511)).

      We agree with the reviewer that our previous manuscript lacked comparison with previously published similar works. While Kappel et al. demonstrated that STDP in winner-take-all circuits can approximate online learning of hidden Markov models (HMMs), a key distinction from our model is that their neural representations acquire deterministic sequential activations, rather than exhibiting stochastic transitions governing Markovian dynamics. Specifically, in their model, the neural representation of state B would be different in the sequences ABC and CBA, resulting in distinct deterministic representations like ABC and C'B'A', where ‘A’ and ‘A'’ are represented by different neural states (e.g., activations of different cell assemblies). In contrast, our network learns to generate stochastically transitioning cell assemblies which replay Markovian trajectories of spontaneous activity obeying the learned transition probabilities between neural representations of states. For example, starting from reactivation of assembly ‘A’, there may be an 80% probability to transition to assembly ‘B’ and 20% to ‘C’. Although Kappel et al.'s model successfully solves HMMs, their neural representations do not themselves stochastically transition between states according to the learned model. Similarly, while the models proposed in Barber (2002) and Barber and Agakov (2002) learn Markovian statistics, they learn only static spatiotemporal input patterns, and how assemblies of neurons transition stochastically during spontaneous activity has remained unclear. In contrast with these models, our model captures the probabilistic neural state trajectories, allowing spontaneous replay of experienced sequences with stochastic dynamics matching the learned environmental statistics.
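
      The notion of replay "obeying the learned transition probabilities" can be made concrete with a short Python sketch of a first-order Markov chain. It is purely illustrative and is not a model of the spiking network: the states, the transition table (including the return transitions from ‘B’ and ‘C’ to ‘A’), and the function names are assumptions chosen to mirror the 80%/20% example above.

```python
import random

def simulate_chain(transitions, start, n_steps, seed=0):
    """Sample a state sequence in which the next state depends only on the current one."""
    rng = random.Random(seed)
    state = start
    path = [state]
    for _ in range(n_steps):
        next_states = list(transitions[state])
        weights = [transitions[state][s] for s in next_states]
        state = rng.choices(next_states, weights=weights, k=1)[0]
        path.append(state)
    return path

def empirical_transitions(path):
    """Estimate transition probabilities from consecutive pairs in a state sequence."""
    counts = {}
    for a, b in zip(path, path[1:]):
        row = counts.setdefault(a, {})
        row[b] = row.get(b, 0) + 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}

# Hypothetical transition table: from 'A', go to 'B' with 80% and 'C' with 20%.
transitions = {
    "A": {"B": 0.8, "C": 0.2},
    "B": {"A": 1.0},
    "C": {"A": 1.0},
}
```

      Comparing `empirical_transitions(simulate_chain(transitions, "A", 20000))` with the table is analogous, in spirit, to comparing the transition statistics of replayed assemblies with the true transition probabilities.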

      We have included new sentences to explain these points in ll. 509-533 in the revised manuscript.

      Overall, the work could benefit if there was either (A) a formal analysis or derivation of the plasticity rules involved and a formal justification of the usefulness of the resulting (learned) neural dynamics; 

      We have included a derivation of our plasticity rules in ll. 630-670 in the revised manuscript. Consistent with our claim that excitatory plasticity updates the excitatory synapse to predict output firing rates, we have shown that the corresponding cost function measures the discrepancy between the recurrent prediction and the output firing rate. Similarly, for inhibitory plasticity, we defined the cost function that evaluates the difference between the excitatory and inhibitory potential within each neuron. We showed that the resulting inhibitory plasticity rule updates the inhibitory synapses to maintain the excitation-inhibition balance.

      and/or (B) a clear connection of the employed plasticity rules to biological plasticity and clear testable experimental predictions. Thus, overall, this is a good work with some room for improvement. 

      Our proposed plasticity mechanism could be implemented through somatodendritic interactions. Analogous to previous computational works (Urbanczik and Senn, 2014; Asabuki and Fukai, 2020; Asabuki et al., 2022), our model suggests that somatic responses may encode the stimulus-evoked neural activity states, while dendrites encode predictions based on recurrent dynamics that aim to minimize the discrepancy between somatic and dendritic activity. To directly test this hypothesis, future experimental studies could simultaneously record from both somatic and dendritic compartments to investigate how they encode evoked responses and predictive signals during learning (Francioni et al., 2022).

      We have included new sentences to explain these points in ll. 476-484 in the revised manuscript.

      Reviewer #2 (Public Review): 

      Summary: 

      This work proposes a synaptic plasticity rule that explains the generation of learned stochastic dynamics during spontaneous activity. The proposed plasticity rule assumes that excitatory synapses seek to minimize the difference between the internal predicted activity and stimulus-evoked activity, and inhibitory synapses try to maintain the E-I balance by matching the excitatory activity. By implementing this plasticity rule in a spiking recurrent neural network, the authors show that the state-transition statistics of spontaneous excitatory activity agree with that of the learned stimulus patterns, which are reflected in the learned excitatory synaptic weights. The authors further demonstrate that inhibitory connections contribute to well-defined state transitions matching the transition patterns evoked by the stimulus. Finally, they show that this mechanism can be expanded to more complex state-transition structures including songbird neural data. 

      Strengths: 

      This study makes an important contribution to computational neuroscience, by proposing a possible synaptic plasticity mechanism underlying spontaneous generations of learned stochastic state-switching dynamics that are experimentally observed in the visual cortex and hippocampus. This work is also very clearly presented and well-written, and the authors conducted comprehensive simulations testing multiple hypotheses. Overall, I believe this is a well-conducted study providing interesting and novel aspects of the capacity of recurrent spiking neural networks with local synaptic plasticity. 

      Weaknesses: 

      This study is very well-thought-out and theoretically valuable to the neuroscience community, and I think the main weaknesses are in regard to how much biological realism is taken into account. For example, the proposed model assumes that only synapses targeting excitatory neurons are plastic, and uses an equal number of excitatory and inhibitory neurons. 

      We agree with the reviewer. The network shown in the previous manuscript consists of an equal number of excitatory and inhibitory neurons, which seems to lack biological plausibility. Therefore, we first tested whether a biologically plausible scenario would affect learning performance by setting the ratio of excitatory to inhibitory neurons to 80% and 20% (Supplementary Figure 7a; left). Even in such a scenario, the network still showed structured spontaneous activity (Supplementary Figure 7a; center), with transition statistics of replayed events matching the true transition probabilities (Supplementary Figure 7a; right). We then asked whether the model with our plasticity rule applied to all synapses would reproduce the corresponding stochastic transitions. We found that the network can learn transition statistics but only under certain conditions. The network showed only weak replay and failed to reproduce the appropriate transition (Supplementary Fig. 7b) if the inhibitory neurons were no longer driven by the synaptic currents reflecting the stimulus, due to a tight balance of excitatory and inhibitory currents on the inhibitory neurons. We then tested whether the network with all synapses plastic can learn transition statistics if the external inputs project to the inhibitory neurons as well. We found that, when each stimulus pattern activates a non-overlapping subset of neurons, the network does not exhibit the correct stochastic transition of assembly reactivation (Supplementary Fig. 7c). Interestingly, when each neuron's activity is triggered by multiple stimuli and has mixed selectivity, the reactivation reproduced the appropriate stochastic transitions (Supplementary Fig. 7d).

      We have included these new results as new Supplementary Figure 7 and they are explained in ll.215-230 in the revised manuscript.

      The model also assumes Markovian state dynamics while biological systems can depend more on history. This limitation, however, is acknowledged in the Discussion. 

      We have included the following sentence to provide a possible solution to this limitation: “Therefore, to learn higher-order stochastic transitions, recurrent neural networks like ours may need to integrate higher-order inputs with longer time scales.” in ll.557-559 in the revised manuscript. 

      Finally, to simulate spontaneous activity, the authors use a constant input of 0.3 throughout the study. Different amplitudes of constant input may correspond to different internal states, so it will be more convincing if the authors test the model with varying amplitudes of constant inputs. 

      We thank the reviewer for pointing this out. In the revised manuscript, we have tested constant input at three different strengths. With moderate strength, the network accurately encoded the transition statistics in its spontaneous activity, as in Fig. 2. We have additionally shown that weaker background input produces spontaneous activity with a lower replay rate, which in turn leads to high variance in the encoded transitions, while stronger input makes assembly replay transitions more uniform. We have included these new results as new Supplementary Figure 6, and they are explained in ll.211-214 in the revised manuscript.

      Reviewer #3 (Public Review): 

      Summary: 

      Asabuki and Clopath study stochastic sequence learning in recurrent networks of Poisson spiking neurons that obey Dale's law. Inspired by previous modeling studies, they introduce two distinct learning rules, to adapt excitatory-to-excitatory and inhibitory-to-excitatory synaptic connections. Through a series of computer experiments, the authors demonstrate that their networks can learn to generate stochastic sequential patterns, where states correspond to non-overlapping sets of neurons (cell assemblies) and the state-transition conditional probabilities are first-order Markov, i.e., the transition to a given next state only depends on the current state. Finally, the authors use their model to reproduce certain experimental songbird data involving highly-predictable and highly-uncertain transitions between song syllables. 

      Strengths: 

      This is an easy-to-follow, well-written paper, whose results are likely easy to reproduce. The experiments are clear and well-explained. The study of songbird experimental data is a good feature of this paper; finches are classical model animals for understanding sequence learning in the brain. I also liked the study of rapid task-switching, it's a good-to-know type of result that is not very common in sequence learning papers. 

      Weaknesses: 

      While the general subject of this paper is very interesting, I missed a clear main result. The paper focuses on a simple family of sequence learning problems that are well-understood, namely first-order Markov sequences and fully visible (no-hidden-neuron) networks, studied extensively in prior work, including with spiking neurons. Thus, because the main results can be roughly summarized as examples of success, it is not entirely clear what the main point of the authors is. 

      We apologize to the reviewer that our main claim was not clear. While various computational studies have suggested possible plasticity mechanisms for embedding evoked activity patterns or their probability structures into spontaneous activity (Litwin-Kumar et al., Nat. Commun. 2014, Asabuki and Fukai., Biorxiv 2023), how transition statistics of the environment are learned in spontaneous activity is still poorly understood. Furthermore, while several network models have been proposed to learn Markovian dynamics via synaptic plasticity (Brea, et al. (2013); Pfister et al. (2004); Kappel et al. (2014)), they have been limited in the sense that the learned network does not show stochastic transitions in a neural state space. For instance, while Kappel et al. demonstrated that STDP in winner-take-all circuits can approximate online learning of hidden Markov models (HMMs), a key distinction from our model is that their neural representations acquire deterministic sequential activations, rather than exhibiting stochastic transitions governing Markovian dynamics. Specifically, in their model, the neural representation of state B would be different in the sequences ABC and CBA, resulting in distinct deterministic representations like ABC and C'B'A', where ‘A’ and ‘A'’ are represented by different neural states (e.g., activations of different cell assemblies). In contrast, our network learns to generate stochastically transitioning cell assemblies that replay Markovian trajectories of spontaneous activity obeying the learned transition probabilities between neural representations of states. For example, starting from a reactivation of assembly ‘A’, there may be an 80% probability of transitioning to assembly ‘B’ and 20% to ‘C’. Although Kappel et al.'s model successfully solves HMMs, their neural representations do not themselves stochastically transition between states according to the learned model. 
As with Kappel et al.'s model, while the models proposed in Barber (2002) and Barber and Agakov (2002) learn the Markovian statistics, they learned only static spatiotemporal input patterns, and how assemblies of neurons show stochastic transitions in spontaneous activity has remained unclear. In contrast with these models, our model captures the probabilistic neural state trajectories, allowing spontaneous replay of experienced sequences with stochastic dynamics matching the learned environmental statistics.

      We have explained this point in ll.509-533 in the revised manuscript.
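
      To make the distinction above concrete, the following is a minimal sketch of what "stochastic transitions obeying learned probabilities" means at the level of abstract states. It is not the spiking network model or the authors' simulation code: the three states stand in for assemblies, and the transition probabilities (including the 80%/20% split out of ‘A’) and all function names are illustrative assumptions.

```python
import random

# Hypothetical transition probabilities between three assemblies (illustrative values only).
TRANSITIONS = {
    "A": {"B": 0.8, "C": 0.2},
    "B": {"A": 0.5, "C": 0.5},
    "C": {"A": 1.0},
}

def sample_trajectory(start, n_steps, rng):
    """Sample a first-order Markov trajectory: the next state depends only on the current one."""
    state = start
    trajectory = [state]
    for _ in range(n_steps):
        targets = list(TRANSITIONS[state])
        weights = [TRANSITIONS[state][t] for t in targets]
        state = rng.choices(targets, weights=weights, k=1)[0]
        trajectory.append(state)
    return trajectory

def empirical_transitions(trajectory):
    """Estimate transition probabilities by counting consecutive state pairs."""
    counts = {}
    for src, dst in zip(trajectory, trajectory[1:]):
        counts.setdefault(src, {}).setdefault(dst, 0)
        counts[src][dst] += 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }

rng = random.Random(0)  # fixed seed so the sketch is repeatable
trajectory = sample_trajectory("A", 20000, rng)
estimated = empirical_transitions(trajectory)
```

      In this toy setting, comparing `estimated` with `TRANSITIONS` plays the role of comparing replayed transition statistics with the true transition probabilities; a deterministic sequence generator would instead yield rows concentrated on a single successor.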

      Going into more detail, the first major weakness I see in this paper is the heuristic choice of learning rules. The paper studies Poisson spiking neurons (I return to this point below), for which learning rules can be derived from a statistical objective, typically maximum likelihood. For fully-visible networks, these rules take a simple form, similar in many ways to the E-to-E rule introduced by the authors. This more principled route provides quite a lot of additional understanding on what is to be expected from the learning process. 

      We thank the reviewer for pointing this out. To better demonstrate the function of our plasticity rules, we have included the derivation of the rules of synaptic plasticity in ll. 630-670 in the revised manuscript. Consistent with our claim that excitatory plasticity updates the excitatory synapse to predict output firing rates, we have shown that the corresponding cost function measures the discrepancy between the recurrent prediction and the output firing rate. Similarly, for inhibitory plasticity, we defined the cost function that evaluates the difference between the excitatory and inhibitory potential within each neuron. We showed that the resulting inhibitory plasticity rule updates the inhibitory synapses to maintain the excitation-inhibition balance.

      For instance, should maximum likelihood learning succeed, it is not surprising that the statistics of the training sequence distribution are reproduced. Moreover, given that the networks are fully visible, I think that the maximum likelihood objective is a convex function of the weights, which then gives hope that the learning rule does succeed. And so on. This sort of learning rule has been studied in a series of papers by David Barber and colleagues [refs. 1, 2 below], who applied them to essentially the same problem of reproducing sequence statistics in recurrent fully-visible nets. It seems to me that one key difference is that the authors consider separate E and I populations, and find the need to introduce a balancing I-to-E learning rule. 

      The reviewer’s understanding that inhibitory plasticity to maintain EI balance is one of the critical differences from previous works is correct. However, we believe that the most striking point of our study is that we have shown numerically that predictive plasticity rules enable recurrent networks to learn and replay the assembly activations whose transition statistics match those of the evoked activity. Please see our reply above.

      Because the rules here are heuristic, a number of questions come to mind. Why these rules and not others - especially, as the authors do not discuss in detail how they could be implemented through biophysical mechanisms? When does learning succeed or fail? What is the main point being conveyed, and what is the contribution on top of the work of e.g. Barber, Brea, et al. (2013), or Pfister et al. (2004)? 

      Our proposed plasticity mechanism could be implemented through somatodendritic interactions. Analogous to previous computational works (Senn, Asabuki), our model suggests that somatic responses may encode the stimulus-evoked neural activity states, while dendrites encode predictions based on recurrent dynamics that aim to minimize the discrepancy between somatic and dendritic activity. To directly test this hypothesis, future experimental studies could simultaneously record from both somatic and dendritic compartments to investigate how they encode evoked responses and predictive signals during learning.

      To address the reviewer's point, we conducted additional simulations to test where the model fails. We found that the model with our plasticity rule applied to all synapses showed only faint replays and failed to replay the appropriate transitions (Supplementary Fig. 7b). This result is reasonable because the inhibitory neurons were no longer driven by the synaptic currents reflecting the stimulus, due to a tight balance of excitatory and inhibitory currents on the inhibitory neurons. Our model predicts that mixed selectivity in the inhibitory population is crucial for learning appropriate transition statistics (Supplementary Fig. 7d). Future work should clarify the role of synaptic plasticity on inhibitory neurons, especially plasticity at I-to-I synapses. We have explained this result as new Supplementary Figure 7 in the revised manuscript.

      The use of a Poisson spiking neuron model is the second major weakness of the study. A chief challenge in much of the cited work is to generate stochastic transitions from recurrent networks of deterministic neurons. The task the authors set out to do is much easier with stochastic neurons; it is reasonable that the network succeeds in reproducing Markovian sequences, given an appropriate learning rule. I believe that the main point comes from mapping abstract Markov states to assemblies of neurons. If I am right, I missed more analyses on this point, for instance on the impact that varying cell assembly size would have on the findings reported by the authors.

      The reviewer’s understanding is correct. Our main point comes from mapping Markov statistics to replays of cell assemblies. In the revised manuscript, we performed additional simulations to ask whether varying the size of the cell assemblies would affect learning. We ran simulations with two different configurations in the task shown in Figure 2. The first configuration used three assemblies with a size ratio of 1:1.5:2. After training, these assemblies exhibited transition statistics that closely matched those of the evoked activity (Supplementary Fig.4a,b). In contrast, the second configuration, which used a size ratio of 1:2:3, showed worse performance compared to the 1:1.5:2 case (Supplementary Fig.4c,d). These results suggest that the model can learn appropriate transition statistics as long as the size ratio of the assemblies is not drastically varied.

      Finally, it was not entirely clear to me what the main fundamental point in the HVC data section was. Can the findings be roughly explained as follows: if we map syllables to cell assemblies, for high-uncertainty syllable-to-syllable transitions, it becomes harder to predict future neural activity? In other words, is the main point that the HVC encodes syllables by cell assemblies? 

      The reviewer's understanding is correct. We wanted to show that if the HVC learns transition statistics as a replay of cell assemblies, a high-uncertainty syllable-to-syllable transition would make predicting future reactivations more difficult, since trial-averaged activities (i.e., poststimulus activities; PSAs) marginalized all possible transitions in the transition diagram.

      (1) Learning in Spiking Neural Assemblies, David Barber, 2002. URL: https://proceedings.neurips.cc/paper/2002/file/619205da514e83f869515c782a328d3c-Paper.pdf  

      (2) Correlated sequence learning in a network of spiking neurons using maximum likelihood, David Barber, Felix Agakov, 2002. URL: http://web4.cs.ucl.ac.uk/staff/D.Barber/publications/barber-agakovTR0149.pdf  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      In more detail: 

      A) Theoretical analysis 

      The plasticity rules in the study are introduced with a vague reference to previous theoretical studies of others. Doing this, one does not provide any formal insight as to why these plasticity rules should enable one to learn to solve the intended task, and whether they are optimal in some respect. This becomes noticeable, especially in the discussion of the importance of inhibitory balance, which does not go into any detail, but rather only states that it is required, both in the results and discussion sections. Another unclarity appears when error-based learning is discussed and compared to Hebbian plasticity, which, as you state, "alone is insufficient to learn transition probabilities". It is not evident how this claim is warranted, nor why error-based plasticity in comparison should be able to perform this (other than referring to the simulation results). Please either clarify formally (or at least intuitively) how plasticity rules result in the mentioned behavior, or alternatively acknowledge explicitly the (current) lack of intuition. 

      The lack of formal discussion is a relevant shortcoming compared to previous research that showed very similar results with formally more rigorous and principled approaches. In particular, Kappel et al derived explicitly how neural networks can learn to sample from HMMs using STDP and winner-take-all dynamics. Even though this study has limitations, the relation with respect to that work should be made very clear; potentially the claims of novelty of some results (sampling) should be adjusted accordingly. See also Yanping Huang, Rajesh PN Rao (NIPS 2014), and possibly other publications. While it might be difficult to formally justify the learning rules post-hoc, it would be very helpful to the field if you very clearly related your work to that of others, where learning rules have been formally justified, and elaborate on the intuition of how the employed rules operate and interact (especially for inhibition). 

      Lastly, while the importance of sampling learned transition probabilities is discussed, the discussion again remains on a vague level, characterized by the lack of references in the relevant paragraphs. Ideally, there should be a proof of concept or a formal understanding of how the learned behaviour makes it possible to solve a problem that is not solved by deterministic networks. Please also incorporate the relation to the literature on neural sampling/planning/RL etc. and substantiate the claims with citations. 

      We have included sentences in ll. 691-696 in the revised manuscript to explain that for Poisson spiking neurons, the derived learning rule is equivalent to the one that minimizes the Kullback-Leibler divergence between the distributions of output firing and the dendritic prediction, in our case, the recurrent prediction (Asabuki and Fukai; 2020). Thus, the rule suggests that the recurrent prediction learns the statistical model of the evoked activity, which in turn allows the network to reproduce the learned transition statistics.
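
      As a hedged illustration of why minimizing such a divergence pushes the prediction toward the output statistics, consider a single time bin with Poisson spike counts. The symbols below are assumptions introduced for this sketch and are not taken from the manuscript: \lambda denotes the output firing rate and \hat{\lambda} the recurrent (dendritic) prediction.

```latex
% KL divergence between two Poisson distributions with rates \lambda (output)
% and \hat{\lambda} (recurrent prediction); symbols are illustrative assumptions.
\begin{align}
D_{\mathrm{KL}}\!\left(\mathrm{Pois}(\lambda)\,\|\,\mathrm{Pois}(\hat{\lambda})\right)
  &= \lambda \log\frac{\lambda}{\hat{\lambda}} - \lambda + \hat{\lambda}, \\
\frac{\partial D_{\mathrm{KL}}}{\partial \hat{\lambda}}
  &= 1 - \frac{\lambda}{\hat{\lambda}}
   = \frac{\hat{\lambda} - \lambda}{\hat{\lambda}} .
\end{align}
```

      The gradient vanishes only when \hat{\lambda} = \lambda, so gradient descent on this quantity is driven by the error between the prediction and the output rate. How this error is mapped onto individual synaptic weights depends on the specific neuron model and is given by the derivation in the manuscript, not by this sketch.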

      We have also added a paragraph to discuss the differences between previously published similar models (e.g., Kappel et al.). Please see our response above.

      B) Connection to biology 

      The plasticity rules in the study are introduced with a vague reference to previous theoretical studies of others. Please discuss in more detail if these rules (especially the error-based learning rule) could be implemented biologically and how this could be achieved. Are there connections to biologically observed plasticity? E.g., error-based plasticity has been discussed in the original publication by Urbanczik and Senn, or more recently by Mikulasch et al (TINS 2023). The biological plausibility of inhibitory balance has been discussed many times before, e.g. by Vogels and others, and a citation would acknowledge that earlier work. This also leaves the question of how neurons in the songbird experiment could adapt and if the model does capture this well (i.e., do they exhibit E-I balance? etc), which might be discussed as well. 

      Last, please provide some testable experimental predictions. By proposing an interesting experimental prediction, the model could become considerably more relevant to experimentalists. Also, are there potentially alternative models of stochastic sequence learning (e.g., Kappel et al)? How could they be distinguished? (especially, again, why not Hebbian/STDP learning?) 

      We have cited the Vogels paper to acknowledge the earlier work. We have also included additional paragraphs to discuss a possible biologically plausible implementation of our model and how our model differs from similar models proposed previously (e.g., Kappel et al.). Please see our response above.

      Other comments 

      As mentioned, a derivation of recurrent plasticity rules is missing, and parameters are chosen ad-hoc. This leaves the question of how much the results rely on the specific choice of parameters, and how robust they are to perturbations. As a robustness check, please clarify how the duration of the Markov states influences performance. It can be expected that this interacts with the timescale of recurrent connections, so having longer or shorter Markov states, as it would be in reality, should make a difference in learning that should be tested and discussed.

      We thank the reviewer for pointing this out. To address this point, we performed new simulations and asked to what extent the duration of Markov states affects performance. Interestingly, even when the network was trained with input states of half the duration, the distributions of the durations of assembly reactivations remained almost identical to those in the original case (Supplementary Figure 3a). Furthermore, the transition probabilities in the replay were still consistent with the true transition probabilities (Supplementary Figure 3b). We have also included the derivation of our plasticity rule in ll. 630-670 in the revised manuscript. 

      Similarly, inhibitory plasticity operates with the same plasticity timescale parameter as excitatory plasticity, but, as the authors discuss, lags behind excitatory plasticity in simulation as in experiment. Is this required or was the parameter chosen such that this behaviour emerges? Please clarify this in the methods section; moreover, it would be good to test if the same results appear with fast inhibitory plasticity. 

      We have performed a new simulation and showed that even when the learning rate of inhibitory plasticity was larger than that of excitatory plasticity, inhibitory plasticity still occurred on a slower timescale than excitatory plasticity. We have included this result in a new Supplementary Figure 2 in the revised manuscript.

      What is the justification (biologically and theoretically) for the memory trace h and its impact on neural spiking? Is it required for the results or can it be left away? Since this seems to be an important and unconventional component of the model, please discuss it in more detail. 

      In the model, it is assumed that each stimulus presentation drives a specific subset of network neurons with a fixed input strength, which avoids convergence to trivial solutions. Nevertheless, we choose to add this dynamic sigmoid function to facilitate stable replay by regulating neuron activity to prevent saturation. We have explained this point in ll.605-611 in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      I noticed a couple of minor typos: 

      Page 3 "underly"->"underlie" 

      Page 7 "assemblies decreased settled"->"assemblies decreased and settled"

      We have modified the text. We thank the reviewer for their careful review.

      I think Figure 1C is rather confusing and not intuitive. 

      We apologize that Figure 1C was confusing. In the revised figure, we have emphasized the flow of excitatory and inhibitory error for updating synapses.

      Reviewer #3 (Recommendations For The Authors): 

      One possible path to improve the paper would be to establish a relationship between the proposed learning rules and e.g. the ones derived by Barber. 

      When reading the paper, I was left with a number of more detailed questions I omitted from the public review: 

      (1) The authors introduce a dynamic sigmoidal function for excitatory neurons, Eq. 3. This point requires more discussion and analysis. How does this impact the results? 

      In the model, it is assumed that each stimulus presentation drives a specific subset of network neurons with a fixed input strength, which avoids convergence to trivial solutions. Nevertheless, we choose to add this dynamic sigmoid function to facilitate stable replay by regulating neuron activity to prevent saturation. We have explained this point in ll.605-611 in the revised manuscript.

      (2) For Poisson spiking neurons, it would be great to understand what cell assemblies bring (apart from biological realism, i.e., reproducing data where assemblies can be found), compared to self-connected single neurons. For example, how do the results shown in Figure 2 depend on assembly size? 

      We have varied the cell assembly size ratio and shown how it affects learning performance in a new Supplementary Figure 4. Please see our reply above.

      (3) The authors focus on modeling spontaneous transitions, corresponding to a highly stochastic generative model (with most transition probabilities far from 1). A complementary question is that of learning to produce a set of stereotypical sequences, with probabilities close to 1. I wondered whether the learning rules and architecture of the model (in particular under the I-to-E rule) would also work in such a scenario. 

      We thank the reviewer for pointing this out. In fact, we had the same question, so we considered a situation in which the setting in Figure 2 includes both cases where the transition matrix is very stochastic (prob=0.5) and near deterministic (prob=0.9).

      (4) An analysis of what controls the time so that the network stays in a certain state would be welcome. 

      We trained the network model in two cases, one with a fast speed of plasticity and one with a slow speed of plasticity. As a result, we found that the duration of assembly becomes longer in the slow learning case than in the fast case. We have included these results as Supplementary Figure 5 in the revised manuscript.

      Regarding the presentation, given that this is a computational modeling paper, I wonder whether *all* the formulas belong in the Methods section. I found myself skipping back and forth to understand what the main text meant, mainly because I missed a few key equations. I understand that this is a style issue that is very much community-dependent, but I think readability would improve drastically if the main model and learning rule equations could be introduced in the main text, as they start being discussed. 

      We thank the reviewer for the suggestion. To cater to a wider audience, we try to explain the principle of the paper without using mathematical formulas as much as possible in the main text.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to quantify feral pig interactions in eastern Australia to inform disease transmission networks. They used GPS tracking data from 146 feral pigs across multiple locations to construct proximity-based social networks and analyze contact rates within and between pig social units.

      Strengths:

      (1) Addresses a critical knowledge gap in feral pig social dynamics in Australia.

      (2) Uses robust methodology combining GPS tracking and network analysis.

      (3) Provides valuable insights into sex-based and seasonal variations in contact rates.

      (4) Effectively contextualizes findings for disease transmission modeling and management.

      (5) Includes comprehensive ethical approval for animal research.

      (6) Utilizes data from multiple locations across eastern Australia, enhancing generalizability.

      Weaknesses:

      (1) Limited discussion of potential biases from varying sample sizes across populations

      This is a really good comment, and we will address this in the discussion as one of the limitations of the study.

      (2) Some key figures are in supplementary materials rather than the main text.

      We will move some of our supplementary material to the main text as suggested.

      (3) Economic impact figures are from the US rather than Australia-specific data.

      We included the impact figures that are available for Australia (for FDM), and we will include the estimated impact of ASF in Australia in the introduction.

      (4) Rationale for spatial and temporal thresholds for defining contacts could be clearer.

      We will improve the explanation of why we chose the spatial and temporal thresholds based on literature, the size of animals and GPS errors.
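
      For readers unfamiliar with proximity-based contact definitions, the idea of combining a spatial and a temporal threshold can be sketched as follows. This is only an illustration: the threshold values, the data layout, and the function name are hypothetical and are not those used in the study.

```python
import math
from itertools import combinations

# Hypothetical thresholds, chosen only for illustration; not the values used in the study.
MAX_DISTANCE_M = 5.0
MAX_TIME_GAP_S = 300.0

def count_contacts(fixes, max_distance_m=MAX_DISTANCE_M, max_time_gap_s=MAX_TIME_GAP_S):
    """Count proximity contacts per pair of animals.

    fixes: list of (animal_id, timestamp_s, x_m, y_m) with projected coordinates in metres.
    Two fixes from different animals count as a contact when they are close in both
    space and time.
    """
    contacts = {}
    for (id_a, t_a, x_a, y_a), (id_b, t_b, x_b, y_b) in combinations(fixes, 2):
        if id_a == id_b:
            continue  # fixes from the same animal are never a contact
        close_in_time = abs(t_a - t_b) <= max_time_gap_s
        close_in_space = math.hypot(x_a - x_b, y_a - y_b) <= max_distance_m
        if close_in_time and close_in_space:
            pair = tuple(sorted((id_a, id_b)))
            contacts[pair] = contacts.get(pair, 0) + 1
    return contacts

fixes = [
    ("pig1", 0.0, 0.0, 0.0),
    ("pig2", 60.0, 3.0, 4.0),     # 5 m from pig1's first fix, 60 s apart: contact
    ("pig3", 0.0, 100.0, 100.0),  # far from both animals: no contact
    ("pig1", 1800.0, 3.0, 4.0),   # same place as pig2 but 1740 s later: no contact
]
contact_counts = count_contacts(fixes)
```

      The pairwise loop is quadratic in the number of fixes and is meant only to show the definition; the resulting pair counts could then serve as edge weights in a contact network.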

      (5) Limited discussion of ethical considerations beyond basic animal ethics approval.

      This research was conducted under an ethics committee's approval for collaring the feral pigs. This research is part of an ongoing pest management activity, and all the ethics approvals have been highlighted in the main manuscript.

      The authors largely achieved their aims, with the results supporting their conclusions about the importance of sex and seasonality in feral pig contact networks. This work is likely to have a significant impact on feral pig management and disease control strategies in Australia, providing crucial data for refining disease transmission models.

      Reviewer #2 (Public review):

      Summary:

      The paper attempts to elucidate how feral (wild) pigs cause distortion of the environment in over 54 countries of the world, particularly Australia.

      The paper displays proof that over $120 billion worth of facilities were destroyed annually in the United States of America.

      The authors have tried to infer that the findings of their work were important and possess a convincing strength of evidence.

      Strengths:

      (1) Clearly stating feral (wild) pigs as a problem in the environment.

      (2) Stating how 54 countries were affected by the feral pigs.

      (3) Mentioning how $120 billion was lost in the US, annually, as a result of the activities of the feral pigs.

      (4) Amplifying the fact that 14 species of animals were being driven into extinction by the feral pigs.

      (5) Feral pigs possessing zoonotic abilities.

      (6) Feral pigs acting as reservoirs for endemic diseases like brucellosis and leptospirosis.

      (7) Understanding disease patterns by the social dynamics of feral pig interactions.

      (8) The use of 146 GPS-monitored feral pigs to establish their social interaction among themselves.

      Weaknesses:

      (1) Unclear explanation of the association of either the female or male feral pigs with each other, seasonally.

      This will be better explained in the methods.

      (2) The "abstract paragraph" was not justified.

      We have justified the abstract paragraph as requested by the reviewer.

      (3) Typographical errors in the abstract.

      Typographical errors have been corrected in the Abstract.

      Reviewer #3 (Public review):

      Summary:

      The authors sought to understand social interactions both within and between groups of feral pigs, with the intent of applying their findings to models of disease transmission. The authors analyzed GPS tracking data from across various populations to determine patterns of contact that could support the transmission of a range of zoonotic and livestock diseases. The analysis then focused on the effects of sex, group dynamics, and seasonal changes on contact rates that could be used to base targeted disease control strategies that would prioritize the removal of adult males for reducing intergroup disease transmission.

      Strengths:

      It utilized GPS tracking data from 146 feral pigs over several years, effectively capturing seasonal and spatial variation in the social behaviors of interest. Using proximity-based social network analysis, this work provides a highly resolved snapshot of contact rates and interactions both within and between groups, substantially improving research in wildlife disease transmission. Results were highly useful and provided practical guidance for disease management, showing that control targeted at adult males could reduce intergroup disease transmission, hence providing an approach for the control of zoonotic and livestock diseases.

      Weaknesses:

      Despite their reliability, populations can be skewed by small sample sizes and limited generalizability due to specific environmental and demographic characteristics. Further validation is needed to account for additional environmental factors influencing social dynamics and contact rates.

      This is a good point, and we thank the reviewer for pointing out this issue. We will discuss the potential biases due to sample size in our discussion. We agree that environmental factors need to be incorporated and tested for their influence on social dynamics. This will be added to the discussion, as we plan to expand this research and conduct the analysis to determine whether environmental factors influence social dynamics.

    1. Author response:

      Reviewer #1:

      (1) This concern is addressed in ESM6, and partly in ESM1. Indeed, many of the concerns raised later by the reviewer are already addressed in the multiple supplementary materials provided, so we kindly ask the reviewer to read them before moving on to the discussion.

      (2) This concern is reasonable, but its solution is not "extremely easy", as the reviewer states. The reviewer points to the use of captive versus non-captive sources, singling out maximum lifespan, the main variable clearly expected to be systematically biased by the source of the data. Nevertheless, except for the ZIMS database, which includes only captive individuals, and some sources such as the CNRS databases and EURING, which exclusively include wild populations, the remaining databases, from which the vast majority of the data was collected (i.e. Amniotes database, Birds of the World and AnAge), do not make any distinction. This means that they simply report the maximum lifespan of the species as known by the authors of the database entries, regardless of provenance, which the database usually does not make explicit either. Therefore, correcting for this would imply checking all the primary sources. These databases sometimes cite a secondary rather than the primary source, on several occasions that source is a specialized book that is not easily accessible, and the referenced datasets may still not indicate the origin of the data. Tracing all of this information therefore becomes an arduous task that would render the use of the databases themselves pointless. We will include some details about the concerns of database usage in the discussion to address this.

      Furthermore, it remains relevant to indicate that what we discuss later about the possible effects of captivity is about our usage of animals that come from both sources, not about the provenance of the literature-extracted data used (i.e. captive or wild maximum lifespan, for example), which is an independent matter. We can test for the first for next submission, but very difficultly could we test for the second (as the reviewer seems to be pointing to). In any case, as we do not have in any case the same species from both a captive and a wild source, it would be difficult to determine if the effect tested comes from captivity or from species-specific differences.

      (3) We will add data on the replicability of the glycation measurement in the next manuscript version. The CV for several individuals of different species measured repeatedly is quite low (always below 2%).
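      As a minimal sketch of how such a replicability figure is usually obtained, the coefficient of variation (CV) of repeated measurements is the sample standard deviation divided by the mean, expressed as a percentage. The function name and the example readings below are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample SD / mean."""
    if len(values) < 2:
        raise ValueError("need at least two repeated measurements")
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


# Hypothetical repeated glycation readings (%) for one individual.
repeats = [20.0, 20.2, 19.8]
cv = coefficient_of_variation(repeats)
print(f"CV = {cv:.2f}%")
```

      A CV computed this way for each individual can then be compared against the 2% bound quoted above.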

      (4) The reviewer's remarks reported here are already addressed in the supplementary material (ESM6), given the lack of space in the main manuscript. We therefore kindly ask the reviewer to read the supplementary material added to the submission. If the editors agree, all or a considerable part of this could be transferred to the main text for clarity, but this would considerably extend the length of a text that the reviewer already considered very long.

      Reviewer #2:

      Thanks for spotting this issue with the coefficient, which is indeed a drafting mistake. It is a remnant of a previous version of the manuscript in which a log-log relation was fitted instead. Previous reviewers raised concerns about log-transforming glycation, since this variable is (theoretically) a proportion (although we argue that it does not behave as such) and, in their view, should not be log-transformed. We therefore finally decided not to transform this variable. More generally, variable transformations were decided by preliminary data exploration. In this particular case, both approaches lead to the same conclusion of higher glycation resistance in the species with higher glucose. Nevertheless, we will consider comparing the different versions for the resubmission.

      Regarding handling time, this variable is not available, for the reasons already explained in our answer to the other reviewer. Moreover, the Kruskal-Wallis test, by its nature, does not determine differences in medians between groups per se, as the reviewer claims, but only differences in rank sums. It can be used equivalently for that purpose when the groups' distributions are similar, but not when they differ, as we see here with a difference in variance. What a significant outcome in a Kruskal-Wallis test tells us, thus, is just that the groups differ (in their rank sums), which here is plausibly caused by the higher variance in the stressed individuals. Even if we conclude that the average is higher in those groups, comparisons of averages for groups with very different variances lead to different interpretations than when homoscedasticity is met, particularly when the distributions of the groups overlap. For example, in a case like this, where the data is left censored (glucose levels cannot be lower than 0), most of this higher variance is related to many values in the stressed groups lying above all the baseline values. This, of course, would increase the average, but such a parameter would not mean the same as if the distributions did not overlap.
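      The point about rank sums can be illustrated with a small, self-contained sketch. The two groups below are made-up numbers (not the study's glucose data) chosen to have identical medians but different spreads. The statistic is the textbook Kruskal-Wallis H without the tie correction, written out by hand rather than taken from any statistics library.

```python
from statistics import median


def rank_sums(groups):
    """Return the rank sum of each group, using average ranks for ties."""
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    sums = [0.0] * len(groups)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        for k in range(i, j):
            sums[pooled[k][1]] += avg_rank
        i = j
    return sums


def kruskal_h(groups):
    """Kruskal-Wallis H statistic (no tie correction applied)."""
    n_total = sum(len(g) for g in groups)
    term = sum(r * r / len(g) for r, g in zip(rank_sums(groups), groups))
    return 12.0 / (n_total * (n_total + 1)) * term - 3.0 * (n_total + 1)


# Hypothetical values: same median, wider spread in the "stressed" group.
baseline = [1.0, 2.0, 4.0, 5.0]
stressed = [2.5, 2.75, 3.25, 9.0]
print(median(baseline), median(stressed))   # identical medians
print(rank_sums([baseline, stressed]))      # yet the rank sums differ
print(kruskal_h([baseline, stressed]))
```

      The statistic is built entirely from the rank sums, so it responds to any shift in ranks between groups, not to the medians as such.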

      Regarding the GVIFs, why the values are above 1.6 is not well known, but we do not consider this a major concern, as the values are never above 2.2, a level usually considered more worrying. We will include a brief explanation of this in the results section. Also, we explicitly calculated life history variables adjusted for body mass, which should eliminate their otherwise strong correlation. There are other biological and interpretational reasons, justified in ESM6, for using the residuals in the models instead of the raw values, despite previously raised concerns.

      Given the reviewer's assertion that credible intervals should not be used for the post hoc comparisons, and since credible intervals are what the whiskers in Figure 4B represent, the claim that this graph suggests any difference between groups remains doubtful. New comparisons have now been made with the function HPDinterval() applied to the differences between diet categories, calculated from the posterior values of each group, confirming that no significant differences exist.
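      The comparison described here can be sketched as follows. This is a plain-Python illustration of an empirical highest posterior density (HPD) interval applied to the difference between two sets of posterior draws. It is not the R function used in the analysis, and the simulated draws and variable names are assumptions made purely for illustration.

```python
import random


def hpd_interval(samples, prob=0.95):
    """Shortest interval containing roughly `prob` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    # Distance, in sorted positions, between the two interval endpoints.
    gap = max(1, min(n - 1, round(prob * n)))
    widths = [xs[i + gap] - xs[i] for i in range(n - gap)]
    start = widths.index(min(widths))
    return xs[start], xs[start + gap]


random.seed(1)
# Simulated posterior draws for two diet categories (not the study's data).
diet_a = [random.gauss(0.0, 1.0) for _ in range(4000)]
diet_b = [random.gauss(0.0, 1.0) for _ in range(4000)]
differences = [a - b for a, b in zip(diet_a, diet_b)]

low, high = hpd_interval(differences)
includes_zero = low <= 0.0 <= high
print(low, high, includes_zero)
```

      Under this convention, two categories are treated as not credibly different when the interval for their difference includes zero.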

      We do not understand the suggestion made in relation to the model shown in Table 2. Removing glucose from the model could have two results, as the reviewer indicates: 1. Maximum lifespan (ML) relates to glycation, potentially spuriously through the effect of glucose (in this case not included) on both; 2. ML does not relate to glycation, and therefore "high glycation levels do not preclude the evolution of long lifespans", which is what we are already showing with the current model, which also controls for glucose, in an attempt to determine whether not just raw glycation values but glycation resistance relates to longevity. This is intended to assess whether long-lived species may show mechanisms that avoid glycation, by showing levels lower than expected for a non-enzymatic reaction.

    1. Author response:

      In this manuscript, we have addressed one of the possible modes of recruitment of Swi6 to the putative heterochromatin loci.

      Our investigation was guided by earlier work showing the ability of HP1α to bind to a class of RNAs and the role of this binding in the recruitment of HP1α to heterochromatin loci in mouse cells (Muchardt et al). While there has been no clarity about the mechanism of Swi6 recruitment, given the multiple pathways involved, the issue is compounded by the overall lack of understanding of how Swi6 recruitment occurs only at the repeat regions. At the same time, various observations suggested a causal role of RNAi in Swi6 recruitment.

      Thus, guided by the work of Muchardt et al, we developed a heuristic approach to explore a possibly direct link between Swi6 and heterochromatin through the RNAi pathway. Interestingly, we found that the lysine triplet in the hinge domain of HP1, which influences its recruitment to heterochromatin in mouse cells, is also present in the hinge domain of Swi6, although we were cautious, keeping in mind the findings of Keller et al showing another role of Swi6 in binding to RNAs and channeling them to the exosome pathway.

      Accordingly, we envisaged that recruitment of Swi6 through binding to siRNAs at cognate sites in the dg-dh repeats shared among the mating type, centromere and telomere loci could explain specific recruitment as well as inheritance following DNA replication. We therefore framed the main questions as follows: i) whether Swi6 binds specifically and with high affinity to the siRNAs and the cognate siRNA-DNA hybrids and whether the Swi63K-3A mutant is defective in this binding, ii) whether this lack of binding of Swi63K-3A affects its localization to heterochromatin, iii) whether this specificity is validated by binding of Swi6 but not Swi63K-3A to siRNAs and siRNA-DNA hybrids in vivo and iv) whether the binding mode is qualitatively and quantitatively different from that of Cen100 RNA or random RNAs, like GFP RNA.

      We think that our data provides answers to these lines of inquiry to support a model wherein the Swi6-siRNA mediated recruitment can explain a cis-controlled nucleation of heterochromatin at the cognate sites in the genome. We have also partially addressed the points raised by the study by Keller et al by invoking a dynamic balance between different modes of binding of Swi6 to different classes of RNA to exercise heterochromatin formation by Swi6 under normal conditions and RNA degradation under other conditions.

      While we stand by our hypothesis, we do acknowledge the need for more detailed investigation, both to buttress it and to address the dynamics of siRNA binding and recruitment of Swi6, and how Swi6 functions fit in the context of other components of heterochromatin assembly, like the HDACs and Clr4 on one hand and the exosome pathway on the other. Our future studies will attempt to address these issues.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript explores the RNA binding activities of the fission yeast Swi6 (HP1) protein and proposes a new role for Swi6 in RNAi-mediated heterochromatin establishment. The authors claim that Swi6 has a specific and high affinity for short interfering RNAs (siRNAs) and recruits the Clr4 (Suv39h) H3K9 methyltransferases to siRNA-DNA hybrids to initiate heterochromatin formation. These claims are not in any way supported by the incomplete and preliminary RNA binding or the in vivo experiments that the authors present. The proposed model also lacks any mechanistic basis as it remains unclear (and unexplored) how Swi6 might bind to specific small RNA sequences or RNA-DNA hybrids. Work by several other groups in the field has led to a model in which siRNAs produced by the RNAi pathway load onto the Ago1-containing RITS complex, which then binds to nascent transcripts at pericentromeric DNA repeats and recruits Clr4 to initiate heterochromatin formation. Swi6 facilitates this process by promoting the recruitment of the RNA-dependent RNA polymerase leading to siRNA amplification.

      Weaknesses:

      (1) a) The claims that Swi6 binds to specific small RNAs or to RNA-DNA hybrids are not supported by the evidence that the authors present. Their experiments do not rule out non-specific charged-based interactions.

      We disagree. We used synthetic siRNAs of 20-22 nt length for the EMSA, as mentioned in the manuscript. Further, we have sequenced the small RNAs obtained after RIP experiments to validate the enrichment of siRNA in the Swi6-bound fraction as compared to the mutant Swi6-bound fraction. These results are internally consistent regardless of the mode of binding. In any case, the binding occurs primarily through the chromodomain, although it is influenced by the hinge domain (see below).

      Furthermore, we have carried out EMSA experiments using Swi6 mutants carrying all three possible double mutations of the K residues in the KKK triplet and found that there was no difference in the binding pattern as compared to the wt Swi6: only the triple mutant “3K-3A” showed the effect. These results suggest that the binding is not completely dependent on the basic residues. These results will be included in the revised version.

      We also have some preliminary data from a SAXS study showing that the CD of wt Swi6 changes its structure upon binding to the siRNA, while the “3K-3A” mutant of Swi6 has a compact, folded structure that occludes the binding site of Swi6 in the chromodomain. We propose to mention this preliminary finding in the revised version as unpublished data.

      b) Claims about different affinities of Swi6 for RNAs of different sizes are based on a comparison of KD values derived by the authors for a handful of S. pombe siRNAs with previous studies from the Buhler lab on Swi6 RNA binding. The authors need to compare binding affinities under identical conditions in their assays.

      Thus, the EMSA data do suggest sequence specificity in the binding of Swi6 to specific siRNA sequences (Figure S5) and imply that specific residues in Swi6 are responsible for it. Identifying the residues in the CD of Swi6 involved in siRNA binding would therefore definitely be interesting, as would experimental confirmation of the consensus siRNA sequence. It may, however, be noted that whereas the binding of Swi6 to siRNAs occurs through the CD, that of Cen100 or GFP RNA was shown by Keller et al to occur through the hinge domain.

      The estimation of Kd by the Buhler group was based on an NMR study, which we are not in a position to perform in the near future. Nonetheless, we did carry out an EMSA study using the ‘Cen100’ RNA, the same as the one used in the Keller et al study. Surprisingly, in contrast with the EMSA in agarose gel showing binding of Swi6 to “Cen100” RNA reported by Keller et al, we failed to observe any binding in EMSA done in acrylamide gel. (The same is true of the RevCen 100.) While this raises the question of why Keller et al chose to do EMSA in agarose gel instead of the conventional acrylamide gel, it does lend support to our claim of stronger binding of Swi6 to siRNAs. Another relevant observation is the lack of binding of Swi6 to the “RevCen” precursor RNA but detectable binding to the siRNAs denoted VI-IX (as measured by competition experiments; Figure S4 and S7), which are derived by Dcr1 cleavage of the “RevCen” RNA.

      We also disagree that we carried out EMSA with only a handful of siRNAs. As indicated in Figure 1 and S1, we synthesized nearly 12 siRNAs representing the dg-dh repeats at the Cen, mat and tel loci and measured their specificity of binding to Swi6 by EMSA: the siRNAs “D”, “E” and “V” were labelled and assayed directly, and the remaining ones were assessed by their ability to compete against this binding (Figure 1, S4). These results point to the presence of a consensus sequence in siRNAs that shows highly specific and strong binding to Swi6 in the low micromolar range.

      Further, our claim of binding of Swi6 and not Swi63K>3A to siRNA in vivo is validated by RIP experiments, as shown in Fig 2 and S9.

      c) The regions of Swi6 that bind to siRNAs need to be identified and evidence must be provided that Swi6 binds to RNAs of a specific length, 20-22 mers, to support the claim that Swi6 binds to siRNAs. This is critical for all the subsequent experiments and claims in the study.

      We have provided in vitro data, which is validated in vivo by RIP experiments, as mentioned above. We agree that it would be very interesting to identify the residues in the Swi6 chromodomain responsible for binding to siRNA. However, such an investigation is beyond the scope of the present study.

      (2) a) The in vivo results do not validate Swi6 binding to specific RNAs, as stated by the authors. Swi6 pulldowns have been shown to be enriched for all heterochromatic proteins including the RITS complex. The sRNA binding observed by the authors is therefore likely to be mediated by Ago1/RITS.

      We disagree with the first comment. Our RIP experiments do validate the in vitro results (Fig 1, 2, S4 and S9), as argued above. The observation alluded to by the reviewer “Swi6 pulldowns have been shown to be enriched for all heterochromatic proteins including the RITS complex” is not inconsistent with our observation; it is possible that the siRNA may be released from the RITS complex and transferred to Swi6, possibly due to its higher affinity.

      Thus, we would like to suggest that the role of Swi6 is likely to be coincidental or subsequent to that of Ago1/RITS (see below). We think that the binding of Swi6 to the siRNA and the siRNA-DNA hybrid could also be carried out in cis at the level of siRNA-DNA hybrids.

      This point needs to be addressed in future studies.

      b) Most of the binding in Figure S8C seems to be non-specific.

      We would like to point out that the result in Figure S8C needs to be examined together with the Figure S8B, which shows RNA bound by Swi6 but not Swi63K-3A to hybridize with dg, dh and dh-k probes.

      c) In Figure S8D, the authors' data shows that Swi6 deletion does not derepress the rev dh transcript while dcr1 delete cells do, which is consistent with previous reports but does not relate to the authors' conclusions.

      The purpose of results shown in Figure S8D is just to compare the results of Swi6 with that of Swi63K-3A.

      d) Previous results have shown that swi6 delete cells have 20-fold fewer dg and dh siRNAs than swi6+ cells due to decreased RNA-dependent RNA polymerase complex recruitment and reduced siRNA amplification.

      This result is consistent with our results invoking a role of Swi6 in binding to, protecting and recruiting siRNAs to homologous sites.

      To find out whether the overall production of siRNA is compromised in the swi6 3K->3A mutant, we i) calculated the RIP-Seq read counts for swi6 3K->3A, swi6+ and vector control in 200 bp genomic bins, ii) divided the Swi6 3K->3A and swi6+ signals by that of the control, iii) removed the background using the criterion of signal value < 25% of max signal, and iv) counted the total reads (in excess of control) in all peak regions in both samples. This revealed total counts of 10878 and 8994 for the Swi6 3K->3A and swi6+ samples, respectively, possibly implying that the overall siRNA production is not compromised in the Swi6 3K->3A mutant.
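      The four steps listed above can be sketched roughly as follows. This is only an illustrative reading of the procedure, not the pipeline actually used: the function names, the pseudocount that avoids division by zero, the rule that "excess" means sample minus control within a bin, and the toy counts are all assumptions.

```python
from collections import Counter


def bin_counts(read_starts, bin_size=200):
    """Step i: count reads per genomic bin (bin index = start // bin_size)."""
    return Counter(pos // bin_size for pos in read_starts)


def excess_reads_in_peaks(sample, control, background_fraction=0.25, pseudocount=1.0):
    """Steps ii-iv on per-bin counts given as equal-length lists.

    ii)  ratio of sample to control per bin (pseudocount is an assumption);
    iii) keep bins whose ratio is at least 25% of the maximum ratio;
    iv)  sum the reads in excess of control over the retained bins.
    """
    ratios = [(s + pseudocount) / (c + pseudocount) for s, c in zip(sample, control)]
    threshold = background_fraction * max(ratios)
    peaks = [i for i, r in enumerate(ratios) if r >= threshold]
    total = sum(max(sample[i] - control[i], 0) for i in peaks)
    return total, peaks


# Toy per-bin counts (hypothetical, not the study's data).
sample_bins = [5, 120, 300, 8, 90]
control_bins = [4, 10, 12, 6, 40]
total, peaks = excess_reads_in_peaks(sample_bins, control_bins)
print(total, peaks)
```

      Running the same function on the mutant and wild-type samples against the same control would give two totals that can be compared in the way described above.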

      (3) a) The RIP-seq data are difficult to interpret as presented. The size distribution of bound small RNAs, and where they map along the genome should be shown as for example presented in previous Ago1 sRNA-seq experiments.

      Please see the response to 2(d).

      b) It is also unclear whether the defects in sRNA binding observed by the authors represent direct sRNA binding to Swi6 or co-precipitation of Ago1-bound sRNAs.

      The correspondence between our in vivo and in vitro results suggests that the binding to Swi6 would be direct. We do not observe a complete correspondence between the Swi6- and Ago-bound siRNAs. We think Swi6 binding may be coincident with or following RITS complex formation.

      This point will be discussed in the Revision.

      The authors should also sequence total sRNAs to test whether Swi6-3A affects sRNA synthesis, as is the case in swi6 delete cells.

      Please see response to 2(d) above.

      (4) The authors examine the effects of Swi6-3A mutant by overexpression from the strong nmt1 promoter. Heterochromatin formation is sensitive to the dosage of Swi6. These experiments should be performed by introducing the 3A mutations at the endogenous Swi6 locus and effects on Swi6 protein levels should be tested.

      Although we agree, we think that heterochromatin formation is occurring in the presence of nmt1-driven Swi6 but not Swi63K>3A, as indicated by the phenotype and Swi6 enrichment at otr1R::ade6, imr1::ura4 and his3-telo (Figure 3) and mating type (Fig. S10). Furthermore, both GFP-Swi6 and GFP-Swi63K>3A are expressed at similar levels (Fig. S8A).

      (5) The authors' data indicate an impairment of silencing in Swi6-3A mutant cells but whether this is due to a general lower affinity for nucleosomes, DNA, RNA, or as claimed by the authors, siRNAs is unclear. These experiments are consistent with previous findings suggesting an important role for basic residues in the HP1 hinge region in gene silencing but do not reveal how the hinge region enhances silencing.

      Our study aims to correlate the binding of Swi6 but not Swi63K-3A to siRNA with its localization to heterochromatin. A similar difference in binding of Swi6 but not Swi63K-3A to siRNA-DNA hybrid, together with sensitivity of silencing and Swi6 localization to heterochromatin to RNaseH support the above correlations as being causally connected.

      In terms of mechanism of binding, we need to clarify that the primary mode of binding is through the CD and not the hinge domain, although the hinge domain does influence this binding. This result is different from those of Keller et al.

      We have some structural data from preliminary SAXS experiments supporting binding of siRNA to the CD and the influence of the hinge domain on this binding. However, this line of investigation needs to be extended and will be the subject of future investigations.

      (6) RNase H1 overexpression may affect Swi6 localization and silencing indirectly as it would lead to a general reduction in R loops and RNA-DNA hybrids across the genome. RNaseH1 OE may also release chromatin-bound RNAs that act as scaffolds for siRNA-Ag1/RITS complexes that recruit Clr4 and ultimately Swi6.

      These are formal possibilities. However, the correlation between Swi6 binding to the siRNA-DNA hybrid and its delocalization upon RNase H1 treatment argues for a more direct link.

      (7) Examples of inaccurate presentation of the literature.

      a) The authors state that "RNA binding by the murine HP1 through its hinge domains is required for heterochromatin assembly (Muchardt et al, 2002)." The cited reference provides no evidence that HP1 RNA binding is required for heterochromatin assembly. Only the hinge region of bacterially produced HP1 contributes to its localization to DAPI-stained heterochromatic regions in fixed NIH 3T3 cells.

      Noted. Statement will be corrected.

      b) "... This scenario is consistent with the loss of heterochromatin recruitment of Swi6 as well as siRNA generation in rnai mutants (Volpe et al, 2002)." Volpe et al. did not examine changes in siRNA levels in swi6 mutant cells. In fact, no siRNA analysis of any kind was reported in Volpe et al., 2002.

      Correct. We only say that Swi6 recruitment is reduced in rnai mutants and correlate it with the ability of Swi6 to bind to siRNA generated by RNAi and subsequently to the siRNA-DNA hybrid.

      Reviewer #2 (Public review):

      The aim of this study is to investigate the role of Swi6 binding to RNA in heterochromatin assembly in fission yeast. Using in vitro protein-RNA binding assays (EMSA) they showed that Swi6/HP1 binds centromere-derived siRNA (identified by Reinhardt and Bartel in 2002) via the chromodomain and hinge domains. They demonstrate that this binding is regulated by a lysine triplet in the conserved region of the Swi6 hinge domain and that wild-type Swi6 favours binding to DNA-RNA hybrids and siRNA, which then facilitates, rather than competes with, binding to H3K9me2 and to a lesser extent H3K9me3.

      However, the majority of the experiments are carried out in swi6 null cells overexpressing wild-type Swi6 or Swi63K-3A mutant from a very strong promoter (nmt1). Both swi6 null cells and overexpression of Swi6 are well known to exhibit phenotypes, some of which interfere with heterochromatin assembly. This is not made clear in the text.

      We think that the argument is not valid, as we show that Swi6 but not Swi63K-3A could restore silencing at imr1::ura4, otr1::ade6 and his3-telo (Fig 3) and mating type (Fig. S10) when transformed into a swi6Δ strain.

      Whilst the RNA binding experiments show that Swi6 can indeed bind RNA and that binding is decreased by Swi63K-3A mutation in vitro (confusingly, they only much later in the text explained that these 3 bands represent differential binding and that II is likely an isotherm). The gels showing these data are of poor quality and it is unclear which bands are used to calculate the Kd.

      We disagree with the comment about the quality of EMSA data. We think it is of similar quality or better than that of Keller et al, except in some cases, like Fig 1D, a shorter exposure shown to distinguish the slowest shifted band has caused the remaining bands to look fainter.

      RNA-seq data shows that overall fewer siRNAs are produced from regions of heterochromatin in the Swi63K-3A mutant so it is unsurprising that analysis of siRNA-associated motifs also shows lower enrichment (or indeed that they share some similarities, given that they originate from repeat regions).

      Please see response to comment 2(d) of the first reviewer above.

      It is not clear which bands are being alluded to. However, we'll rectify any gaps in information in the revision.

      The experiments are seemingly linked yet fail to substantiate their overall conclusions. For instance, the authors show that the Swi63K-3A mutant displays reduced siRNA binding in vitro (Figure 1D) and that H3K9me2 levels at heterochromatin loci are reduced in vivo (Figure 3C-D). They conclude that Swi6 siRNA binding is important for Swi6 heterochromatin localization, whilst it remains entirely possible that heterochromatin integrity is impaired by the Swi63K-3A mutation and hence fewer siRNAs are produced and available to bind. Their interpretation of the data is really confusing.

      Our argument is that the lack of binding by Swi63K>3A to siRNA can explain the loss of recruitment to heterochromatin loci and thus affect the integrity of heterochromatin; the recruitment of Swi6 can possibly occur by binding initially to siRNA and thereafter as a siRNA-DNA hybrid. However, the overall level of siRNAs is not affected, as in 2(d) above. This interpretation is supported by the results of the ChIP assay and confocal experiments, as well as by the effect of RNaseH1 on the recruitment of Swi6.

      The authors go on to show that Swi63K-3A cells have impaired silencing at all regions tested and the mutant protein itself has less association with regions of heterochromatin. They perform DNA-RNA hybrid IPs and show that Swi63K-3A cells which also overexpress RNAseH/rnh1 have reduced levels of dh DNA-RNA hybrids than wild-type Swi6 cells. They interpret this to mean that Swi6 binds and protects DNA-RNA hybrids, presumably to facilitate binding to H3K9me2. The final piece of data is an EMSA assay showing that "high-affinity binding of Swi6 to a dg-dh specific RNA/DNA hybrid facilitates the binding to Me2-K9-H3 rather than competing against it." This EMSA gel shown is of very poor quality, and this casts doubt on their overall conclusion.

      We do agree with the reviewer about the quality of the EMSA (Fig. 5B). However, as may be noticed in the EMSA for siRNA-DNA hybrid binding (Fig 4A), the bands of Swi6-bound siRNA-DNA hybrid are extremely retarded. Hence the EMSA for subsequent binding by H3-K9-Me peptides required a longer electrophoretic run, which reduced the sharpness of the bands. Nevertheless, the data do indicate binding efficiency in the order H3-K9-Me2 > H3-K9-Me3 > H3-K9-Me0. Having said that, we plan to repeat the EMSA or address the question by other methods, like SPR.

      Unfortunately, the manuscript is generally poorly written and difficult to comprehend. The experimental setups and interpretations of the data are not fully explained, or, are explained in the wrong order leading to a lack of clarity. An example of this is the reasoning behind the use of the cid14 mutant which is not explained until the discussion of Figure 5C, but it is utilised at the outset in Figure 5A.

      We tend to agree somewhat and will attempt to submit a revised version with greater clarity, including an explanation of the experiment with the cid14Δ strain.

      Another example of this lack of clarity/confusion is that the abstract states "Here we provide evidence in support of RNAi-independent recruitment of Swi6". Yet it then states "We show that...Swi6/HP1 displays a hierarchy of increasing binding affinity through its chromodomain to the siRNAs corresponding to specific dg-dh repeats, and even stronger binding to the cognate siRNA-DNA hybrids than to the siRNA precursors or general RNAs." RNAi is required to produce siRNAs, so their message is very unclear. Moreover, an entire section is titled "Heterochromatin recruitment of Swi6-HP1 depends on siRNA generation" so what is the author's message?

      The reviewer has correctly pointed out the error. Indeed, our results actually indicate an RNAi-dependent rather than RNAi-independent mode of recruitment. Rather, we would like to suggest an H3-K9-Me2-independent recruitment of Swi6. We will rectify this error in our revised manuscript.

      The data presented, whilst sound in some parts is generally overinterpreted and does not fully support the author's confusing conclusions. The authors essentially characterise an overexpressed Swi6 mutant protein with a few other experiments on the side, that do not entirely support their conclusions. They make the point several times that the KD for their binding experiments is far higher than that previously reported (Keller et al Mol Cell 2012) but unfortunately the data provided here are of an inferior quality and thus their conclusions are neither fully supported nor convincing.

      We have used the method of Heffler et al (2012) to compute the Kd from EMSA data.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      (1) This work investigates numerically the propagation of subthreshold waves in a model neural network that is derived from the C. elegans connectome. Using a scattering formalism and tight-binding description of the network -- approximations which are commonplace in condensed matter physics -- this work attempts to show the relevance of interference phenomena, such as wavenumber-dependent propagation, for the dynamics of subthreshold waves propagating in a network of electrical synapses.

      (2) The primary strength of the work is in trying to use theoretical tools from a far-away corner of fundamental physics to shed light on the properties of a real neural system. While a system composed of neurons and synapses is classical in nature, there are occasions in which interference or localization effects are useful for understanding wave propagation in complex media [review, van Rossum & Nieuwenhuizen, 1999]. However, it is expected that localization effects only have an impact in some parameter regimes and with low phase dissipation. The authors should have addressed the existence of this validity regime in detail prior to assuming that interference effects are important.

      The theoretical concept and tool used in this study are not situated in a far-away corner of fundamental physics but hold one of the central positions in condensed matter physics and statistical physics. In fact, the non-scientific statement about where the theoretical concept and tool employed by the researchers are positioned within the realm of fundamental physics is irrelevant. Fundamental physics governs the foundations of all natural phenomena, and thus it provides indispensable principles for interpreting not only neural systems but also all life phenomena. One such principle explored in our study is the interference and localization of waves.

      Specifically, in the third paragraph of the Introduction, we introduced that the interference effect of subthreshold oscillating waves, beyond being a theoretical possibility, is a phenomenon actually observed in neural tissue (Chiang and Durand, 2023; Gupta et al., 2016). Moreover, according to Devor and Yarom (2002), the propagation of subthreshold oscillations observed in the inferior olivary nucleus extended beyond a distance of 0.2 mm. Therefore, considering the propagation of subthreshold waves and the resulting interference in the connectome of C. elegans, which has a total body length of less than 1 mm, a diameter of about 0.08 mm, and most neurons distributed in the ring structure near its neck, provides sufficient validity for the initiation of theoretical and computational studies.

      The primary objective of our study is to investigate which regimes of signal transmission/localization and interference phenomena are valid within the network of electrical synapses in C. elegans, the only system for which the neural connectome structure is perfectly known. As the Reviewer rightly pointed out in the question, this is exactly the issue that the Reviewer is curious about. Therefore, the existence of this validity regime cannot be addressed prior to conducting the study but can only be identified as a result of performing the research. And we have conducted such a study.

      (3) An additional approximation that was made without adequate justification is the use of a tight-binding Hamiltonian. This can be a reasonable approximation, even for classical waves, in particular in the presence of high-quality-factor resonators, where most of the wave amplitude is concentrated on the nodes of the network, and nodes are coupled evanescently with each other. Neither of these conditions were verified for this study.

      The tight-binding Anderson Hamiltonian we used in this study originally consisted of the on-site energy at each node and the hopping matrix between nodes. When the on-site energy is relatively much more stable (i.e., has a large negative value) compared to the hopping matrix, most of the wave amplitude becomes concentrated on the nodes as the Reviewer mentioned. However, as is well-known from reference papers (Anderson, 1958; Chang et al., 1995; Meir et al., 1989; Shapir et al., 1982; Thomas and Nakanishi, 2016), in this study, we also removed the on-site energy to prevent the waves from being concentrated on the nodes. Therefore, the tight-binding Hamiltonian we used in this study ensures that waves propagate through edges in the network where the values of the hopping matrix exist.

      To assist the Reviewer in better understanding the model used in this study, we provide additional explanations as follows. In the manuscript, we have already provided detailed descriptions of the setup using the tight-binding Anderson Hamiltonian in the Method section under “Construction of our circuit model” and the explanation of Figure 1. In the model we used, the edges represented by solid lines are perfect conductors, while the dotted lines representing gap junctions act as potential barriers (Fig. 1B). Therefore, when electric signals propagate, we are dealing with the phenomenon where signals transmitted through the edges encounter potential barriers, causing scattering or attenuation. The model described by the Reviewer is indeed a commonly used model in condensed matter physics, but we did not use the exact model mentioned by the Reviewer. Instead, as is common in well-known reference papers, we modified it to suit our purposes. We hope this explanation helps the Reviewer gain a better understanding.
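
      The scattering picture described above can be illustrated with a minimal sketch. It is not the circuit model of the manuscript: it assumes an infinite one-dimensional tight-binding chain with zero on-site energy and uniform hopping, in which a single bond (standing in for a gap junction) has a weaker hopping. Modeling the barrier as a weakened bond, and the names `transmission`, `k`, and `ratio`, are assumptions made for illustration only.

```python
import cmath
import math


def transmission(k, ratio):
    """Transmitted and reflected fractions for a wave of wavenumber k.

    Toy model (an assumption, not the manuscript's network): an infinite
    1D tight-binding chain with zero on-site energy and hopping t on every
    bond, except one bond between sites 0 and 1 with hopping t_gap.
    `ratio` is t_gap / t. Requires 0 < k < pi.
    """
    phase = cmath.exp(1j * k)
    # Ansatz: psi_n = e^{ikn} + r e^{-ikn} for n <= 0, psi_n = tau e^{ikn} for n >= 1.
    # Matching the Schroedinger equation at sites 0 and 1 gives
    #   e^{ik} + r e^{-ik} = ratio * tau * e^{ik}   and   tau = ratio * (1 + r).
    r = (ratio**2 - 1) * phase / (phase.conjugate() - ratio**2 * phase)
    tau = ratio * (1 + r)
    return abs(tau) ** 2, abs(r) ** 2


if __name__ == "__main__":
    for k in (0.3, math.pi / 2, 2.8):
        t_frac, r_frac = transmission(k, ratio=0.5)
        print(f"k={k:.2f}  T={t_frac:.3f}  R={r_frac:.3f}")
```

      In this toy chain the transmitted fraction depends on the wavenumber `k` whenever `ratio` differs from 1, and the transmitted and reflected fractions sum to one.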

      (4) The motivation for this work is to understand the basic mechanisms underlying subthreshold intrinsic oscillations in the inferior olive, but detailed connectivity patterns in this brain area are not available. The connectome is known for C elegans, but sub-threshold oscillations have not been observed there, and the implications of this work for C elegans neuroscience remain unclear. The authors should also give more evidence for the claim that their study may give a mechanism for synchronized rhythmic activity in the mammalian inferior olive nucleus, or refrain from making this conclusion.

      We agree with the Reviewer's point. In this study, we do not provide additional analysis on the mammalian inferior olive nucleus beyond what is already known from previous research. What we intended to discuss in the Discussion section was to suggest that within our model, there is a “possibility” that a group of cells exchanging wave signals of a specific wavenumber with high transmittance may show synchronized rhythmic activity. Therefore, to avoid any misunderstanding for the reader, we have revised the corresponding sentence in the Discussion as follows.

      In the Discussion, “The plausible possibility according to our model study is that the constructive interference of subthreshold membrane potential waves with a specific wavenumber may generate the synchronized rhythmic activation.”

      (5) In the same vein, since the work emphasizes the dependence on the wavenumber for the propagation of subthreshold oscillations, they should make an attempt at estimating the wavenumber of subthreshold oscillations in C elegans if they were to exist and be observed. Next, the presence of two "mobility edges" in the transmission coefficient calculated in this work is unmistakably due to the discrete nature of the system, coming from the tight-binding approximation, and it is unclear if this approximation is justified in the current system.

      In this study, we modeled the propagation of subthreshold waves on the electrical synapse network of C. elegans, but we did not explain the generation of subthreshold oscillations themselves. Here, we simply injected wave signals with various wavenumber values into the network using a hypothetical device called an "Injector." As the Reviewer pointed out, estimating the wavenumbers of subthreshold oscillations that may exist or be observed in C. elegans would require a comprehensive investigation of the membrane potential dynamics occurring in the membranes of individual neurons. However, this is beyond the scope of this study and would require considerable effort to accomplish.

      As for the use of the tight-binding Hamiltonian, we have addressed that in our response to the third paragraph in the Joint Public Review above.

      (6) Similarly, it is possible that the wavenumber-dependent transmission observed depends strongly on the addition of a large number of virtual nodes (VNs) in the network, which the authors give little to no motivation for. As these nodes are not present in the C elegans connectome, the authors should explain the motivation for their inclusion in the model and should discuss their consequences on the transmission properties of the network.

      As mentioned in our response to the third paragraph in the Joint Public Review above, in our model, a node is simply a pathway for waves to pass through. Therefore, inserting virtual nodes between two neurons that are connected in the C. elegans connectome does not alter the actual connection structure. In other words, virtual nodes do not create new connections between cells that didn’t exist in the connectome. The virtual nodes we introduced are merely a way to divide the sections—axon, gap junction, dendrite—through which the wave passes when it is transmitted between two neurons. As we have already explained in Fig. 1B, the edge connected by two virtual nodes, represented by a dotted line, is motivated to depict the gap junction acting as a potential barrier. We hope this explanation helps the Reviewer better understand the model used in this study.

      (7) As it stands, the work would only have a very limited impact on the understanding of subthreshold oscillations in the rat or in C elegans. Indeed, the preprint falls short of relating its numerical results to any phenomena which could be observed in the lab.

      In this study, we proposed a minimalistic model built using the currently available but limited C. elegans connectome information. Specifically, our model is not a phenomenological one that adjusts parameters to accurately predict experimental measurements, but rather an attempt at a novel conceptual approach to theoretically possible scenarios. While the model may not be satisfactory enough to explain experimental phenomena at present, it is a theoretical/computational study that someone needs to undertake. We believe this is the path of scientific progress. Therefore, as the Reviewer has expressed concern, it is entirely understandable that reproducing the numerical results measured in actual experiments is difficult in this study. Nevertheless, we believe that this study makes a basic contribution to the conceptual understanding of subthreshold signal propagation in C. elegans’ electric synapses.

      Rather than offering a stretched opinion, we maintain a positive hope that future researchers in this field will improve the model by incorporating more detailed and extensive biological data through follow-up studies, allowing us to get closer to describing real phenomena.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The word "Sensory" was misspelled in Figures 2, 4 and 5.

      We appreciate the feedback from Reviewer #1. We have corrected the mentioned typos in Figures 2, 4, and 5 of the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      What neurophysiological changes support the learning of new sensorimotor transformations is a key question in neuroscience. Many studies have attempted to answer this question at the neuronal population level - with varying degrees of success - but few, if any, have studied the change in activity of the apical dendrites of layer 5 cortical neurons. Neurons in layer 5 of the sensory cortex appear to play a key role in sensorimotor transformations, showing important decision and reward-related signals, and being the main source of cortical and subcortical projections from the cortex. In particular, pyramidal tract (PT) neurons project directly to subcortical regions related to motor activity, such as the striatum and brainstem, and could initiate rapid motor action in response to given sensory inputs. Additionally, layer 5 cortical neurons have large apical dendrites that extend to layer 1 where different neuromodulatory and long-range inputs converge, providing motor and contextual information that could be used to modulate layer 5 neuron output and/or to establish the synaptic plasticity required for learning a new association.

      In this study, the authors aimed to test whether the learning of a new sensorimotor transformation could be supported by a change in the evoked response of the apical dendrites of layer 5 neurons in the mouse whisker primary somatosensory cortex. To do this, they performed longitudinal functional calcium imaging of the apical dendrites of layer 5 neurons while mice learned to discriminate between two multi-whisker stimuli. The authors used a simple conditioning task in which one whisker stimulus (upward or backward air puff, CS+) is associated with a reward after a short delay, while the other whisker stimulus (CS-) is not. They found that task learning (measured by the probability of anticipatory licking just after the CS+) was not associated with a significant change in the average population response evoked by the CS+ or the CS-, nor a change in the average population selectivity. However, when considering individual dendritic tufts, they found interesting changes in selectivity, with approximately equal numbers of dendrites becoming more selective for CS+ and dendrites becoming more selective for CS-.

      One of the major challenges when assessing changes in neural representation during the learning of such Go/NoGo tasks is that the movements and rewards themselves may elicit strong neural responses that may be a confounding factor, that is, inexperienced mice do not lick in response to the CS+, while trained mice do. In this study, the authors addressed this issue in three ways: first, they carefully monitored the orofacial movements of mice and showed that task learning is not associated with changes in evoked whisker movements. Second, they show that whisking or licking evokes very little activity in the dendritic tufts compared to whisker stimuli (CS+ and CS-). Finally, the authors introduced into the design of their task a post-conditioning session after the last conditioning session during which the CS+ and the CS- are presented but no reward is delivered. During this post-session, the mice gradually stopped licking in response to the CS+. A better design might have been to perform the pre-conditioning and post-conditioning sessions in nonwater-restricted, unmotivated mice to completely exclude any lick response, but the fact that the change in selectivity persists after the mice stopped licking in the last blocks of the post-conditioning session (in mice relying only on their whiskers to perform the task) is convincing. 

      The clever task design and careful data analysis provide compelling evidence that learning this whisker discrimination task does not result in a massive change in sensory representation in the apical dendritic tufts of layer 5 neurons in the primary somatosensory cortex on average. Nevertheless, individual dendritic tufts do increase their selectivity for one or the other sensory stimulus, likely enhancing the ability of S1 neurons to accurately discriminate the two stimuli and trigger the appropriate motor response (to lick or not to lick). 

      One limitation of the present study is the lack of evidence for the necessity of the primary somatosensory cortex in the learning and execution of the task. As the authors have strongly emphasized in their previous publications, the primary somatosensory cortex may not be necessary for the learning and execution of simple whisker detection tasks, especially when the stimulus is very salient. Although this new task requires the discrimination between two whisker stimuli, the simplicity and salience of the whisker stimuli used could make this task cortex-independent. Especially when considering that some mice seem to not rely entirely on their whiskers to execute the task. 

      Nevertheless, this is an important result that shows for the first time changes in the selectivity to sensory stimuli at the level of individual apical dendritic tufts in correlation with the learning of a discrimination task. This study sheds new light on the cortical cellular substrates of reward-based learning and opens interesting perspectives for future research in this area. In future studies, it will be important to determine whether the change in selectivity of dendritic calcium spikes is causally involved in the learning of the task or whether it simply correlates with learning, as a consequence of changes in synaptic inputs caused by reward. The dendritic calcium spikes may be involved in the establishment of synaptic plasticity required for learning and impact the output of layer 5 pyramidal neurons to trigger the appropriate motor response. It would be important also to study the changes in selectivity in the apical dendrite of the identified projection neurons.  

      Reviewer #2 (Public Review):

      Summary: 

      The authors did not find an increased representation of CS+ throughout reinforcement learning in the tuft dendrites of Rbp4-positive neurons from layer 5B of the barrel cortex, as previously reported for soma from layer 2/3 of the visual cortex. 

      Alternatively, the authors observed an increased selectivity to both stimuli (CS+ and CS-) during reinforcement learning. This feature: 

      (1) was not present in repeated exposures (without reinforcement), 

      (2) was not explained by the animal's behaviour (choice, licking, and whisking), and 

      (3) was long-lasting, being present even when the mice disengaged from the task. 

      Importantly, increased selectivity was correlated with learning (% correct choices), and neural discriminability between stimuli increased with learning. 

      In conclusion, the authors show that tuft dendrites from layer 5B of the barrel cortex increase the representation of conditioned (CS+) and unconditioned stimuli (CS-) applied to the whiskers, during reinforcement learning. 

      Strengths: 

      The results presented are very consistent throughout the entire study, and therefore very convincing: 

      (1) The results observed are very similar using two different imaging techniques (2-photon planar imaging- and SCAPE-volumetric imaging). Figure 3 and Figure 4 respectively. 

      (2) The results are similar using "different groups" of tuft dendrites for the analysis (e.g. initially unresponsive and responsive pre- and post-learning). Figure 5.

      (3) The results are similar from a specific set of trials (with the same sensory input, but different choices). Figure 7.

      (4) Additionally, the selectivity of tuft dendrites from layer 5B of the barrel cortex was higher in the mice that exclusively used the whisker to respond to the stimuli (CS+ and CS-).  The results presented are controlled against a group of mice that received the same stimuli presentation, except for the reinforcement (reward). 

      Additionally, the behaviour outputs, such as choice, whisking, and licking could not account for the results observed. 

      Although there are no causal experiments, the correlation between selectivity and learning (percentage of correct choices), as well as the increased neural discriminability with learning, but not in repeated exposure, are very convincing. 

      Weaknesses: 

      The biggest weakness is the absence of causality experiments. Although inhibiting specifically tuft dendritic activity in layer 1 from layer 5 pyramidal neurons is very challenging, tuft dendritic activity in layer 1 could be silenced through optogenetic experiments as in Abs et al. 2018. By manipulating NDNF-positive neurons the authors could specifically modify tuft dendritic activity in the barrel cortex during CS presentations, and test if silencing tuft dendritic activity in layer 1 would lead to the lack of selectivity and an impairment of reinforcement learning. Additionally, this experiment will test if the selectivity observed during reinforcement learning is due to changes in the local network, namely changes in local synaptic connectivity, or solely due to changes in the long-range inputs.    

      We agree that such causal manipulations are a logical next step. Such manipulations are unfortunately not specific to layer 5 apicals, so the results would be difficult to interpret. We now discuss the challenge of such manipulations in the Discussion section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Overall, the study is solid and the article is well and clearly written. I have no suggestion for other experiments that would fall within the scope of this article. I would like only to suggest some additional analyses and clarifications in the writing. 

      Additional analyses: 

      Obviously, the main confounding factor in this type of data comes from the acquired motor response which follows - with a short latency - the sensory stimulus. This is particularly problematic for functional calcium imaging which has very low temporal resolution. The authors have addressed this question to some extent by showing that motor-evoked activity does not account for the change in selectivity acquired with learning and through the use of a post-conditioning session during which no reward was delivered. Figures 8C-D show that mice gradually stop licking in response to CS+ in this session and that the distribution of the selectivity index remains similar in these last blocks. Perhaps a more convincing analysis would be to simply select Miss and Correct rejection trials in which mice did not lick in response to the CS+ and CS-, respectively. Ideally, if the number of trials is sufficient, one could even select trials devoid of any evoked movement (no licking and no whisking).  

      We agree it would be interesting to compare Miss and Correct rejection trials to further rule out effects of a motor response, but there were never enough Miss trials to conduct such an analysis. Even in very early learning, there are few Miss trials (see Figure 1, session 2). We found that in early learning, animals would lick in most trials. Then, over the course of conditioning, they would learn to withhold licks during CS- presentation. Thus, we were able to examine Hits, Correct rejections, and False alarms (Figure 7), but not Miss trials. We have added text suggesting a future experiment in which the stimulus strengths are substantially reduced to drastically increase the error rates.

      The fact that changes in selectivity occur in both directions overall is really interesting. However, in the way the data are presented currently, one may wonder about mice/field of view vs single cell effect, i.e., do different dendritic tufts in the same field of view show opposite changes in selectivity? If we were to replot Figure 3A for a single mouse, would we obtain the same picture?

      We appreciate this very good suggestion and have added scatter plots and selectivity index histograms for individual conditioned animals in Supplementary figure 2. These data demonstrate that different dendritic tufts in the same field of view exhibit opposite changes in selectivity.

      The authors point out that they observed no change in the mean response or selectivity during learning, but did find changes in selectivity at the level of individual dendritic tufts. This suggests that, at the population level, the ability to discriminate between the two stimuli should improve. A possible complementary analysis would be to show that the ability to decode stimulus identity from dendritic tuft population activity increases with learning.  

      Given the substantial change in individual tuft selectivity and that the tuft events are not rare, the population result is guaranteed. If individual tufts increase selectivity, the population will also increase its selectivity on a trial-by-trial basis. We have nevertheless included a new supplementary figure with a population analysis using SVMs to demonstrate this.

      Clarification: 

      The authors should make it clear from the beginning that mice are still water-restricted during the post-conditioning session and actually do keep licking for many CS+ trials. Therefore, this session is not devoid of motor response. 

      We have clarified this in the text.

      Did mice in the repeated exposure condition receive any reward during the recording sessions? If so when were rewards delivered? 

      We previously described in the Methods that these mice received water in their home cage, but we now additionally clarify this in the Results section.

      Minor: 

      Figure 2Aii, the labels of the Alpha and Beta barrels should be swapped.

      Fixed

      Line 218: I believe this sentence should read "Using SCAPE microscopy, ...". 

      Corrected.

      Line 665: 'Reconstruction from 50' does that refer to the single cell reconstruction on the left panel? 

      Yes – Clarified in legend

      Reviewer #2 (Recommendations For The Authors): 

      Minor suggestions: 

      The 'summary' should mention from which brain area the results were acquired. Otherwise, it is misleading, giving the idea that the results described a generic feature, which is still unknown.  

      Added to the text.

      Please correct sentence 219: "SCAPE microscopy, we image tuft activity of additional mice..." 

      Added to the text.

      In the same sentence (219) it would be good to provide the number of additional mice imaged (2). 

      Added to the text.

      Regarding Supplementary Figure 1, it would be interesting to correlate the second peak after reward and learning rate, to provide further support to the sentences 109 to 113. 

      We agree this would be interesting to examine, but only four animals exhibited this second peak, which is too small of a sample to observe a meaningful correlation. We now clarify this in the text.

      In Figure 3, why not present the correlation between 'neural discriminability' and % of correct choices? 

      We appreciate the suggestion and have added this plot to Figure 3.

      The 'results' section will benefit tremendously if the authors consistently indicate the figures to which the results are being described, or 'data not shown' if it is the case. To give a few examples: 

      Sentence 108 - "averaged 28% ΔF/F" - From which figure is this result coming from?

      Sentence 123 - "(p = 0.62, 0.64, respectively)" - comparison not shown, but see Figures 2E and D respectively?

      Sentence 125 - "(CS+ responsive (...) across all sessions)" - From which figure is this result coming from? 

      Sentence 130 - "during pre-conditioning (p=0.66) or post-conditioning sessions (p=0.44) - From which figure? 

      Sentence 154 - "(Pre: p=0.20; last rewarded: p=0.43; Post: p=0.64, sign-rank test)" - From which figure? 

      Sentence 175 - "(-0.049, -0.001, and 0.003" - From which figure? Please show the graph that shows that the mean SI is not different. It can be supplementary. The distribution of SI will be strengthened by it.  

      We added this plot to supplementary figure 2.

      Sentence 244 - "(conditioned: 458/603; repeated exposure: 334/457) - From Figure 5E. 

      Sentence 256 - "(p=0.04, 2-sample t-test comparison mice) - From Figure 5B.

      Sentence 258 - "(p=0.03, paired t-test) - from Figure 5B

      Sentences 370 to 378 - No reference to the figure.

      The 'discussion' section (sentences 459 to 494) refers to the differences between the current and previous studies (references 1,3,5), namely soma vs. dendrites and layer 2/3 vs. layer 5. However, it should also mention the difference between the nature of the stimuli and the brain area recorded (visual cortex vs. barrel cortex).

      We have addressed these issues in the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Authors reject the substance of Reviewer 1’s feedback primarily due to a clear lack of understanding of typical parameterization practices used to avoid overfitting. To ensure the accuracy of the Spearman rank correlation, 70% of all data was withheld from the optimization process and used solely for testing to yield figure 6. Data was withheld prior to model parameterization, which avoids Reviewer 1’s charge of “artificially forcing the correlation”. Authors did appreciate the request for clarification of additional definitions and minor reorganization suggestions. Below we provide specific responses to each numbered point (note: multiple responses are provided for some of the reviewer points).

      Point 1: Clarify Metrics Definition and Evaluation

      Authors clarified the description of biodiversity metrics. The metrics associated with manual methods are detailed in the third paragraph of the Materials and Methods: Data Analysis section, while the sensor-based metric is described in the second paragraph, and summarized in its last sentence.

      Text Additions:

      Authors added clarification to the introduction’s first paragraph defining biodiversity metrics, including species richness.

      Authors added detailed definitions of community metrics and their significance in community ecology in the Materials and Methods section (3rd paragraph of “Data Analysis” section). The discussion was updated to include a reference to community ecology and the benefits of big data, specifically highlighting the potential of autonomous optical sensors in entomology.

      Methods Reorganization

      We have reorganized the Methods section for clarity. The updated section clarifies the metrics studied, the location, the dates, and the descriptions and methods for optical sensors, Malaise traps, and sweep netting.

      Text Additions:

      An overview paragraph was added to “Data analysis” (3rd paragraph) detailing key metrics used, specifying metrics such as abundance, richness, Shannon index, and Simpson index.

      Visualization methods for sensor data to deliver analogous metrics of abundance, richness, and diversity indices were added to the “Data analysis” section.

      Supplementary Table 1 and the first paragraph of the Materials and Methods section cover location, dates, and other general information.

      Detailed descriptions and methods for optical sensors, Malaise traps, and sweeping are provided.
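
      The community metrics named above (abundance, richness, Shannon index, Simpson index) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `diversity_metrics` and the input format (one group label per observed insect) are hypothetical, the Shannon index uses the natural logarithm, and the Simpson index is computed in its Gini-Simpson form, 1 - sum(p_i^2), which is only one of several conventions and is an assumption here.

```python
import math
from collections import Counter


def diversity_metrics(labels):
    """Compute basic community metrics from one label per observed individual.

    `labels` is any iterable of hashable group identifiers (species, or
    sensor-derived clusters). All names here are illustrative assumptions.
    """
    counts = Counter(labels)
    abundance = sum(counts.values())   # total number of individuals
    richness = len(counts)             # number of distinct groups
    proportions = [c / abundance for c in counts.values()]
    # Shannon index H = -sum(p_i * ln p_i)
    shannon = -sum(p * math.log(p) for p in proportions)
    # Gini-Simpson index 1 - sum(p_i^2); other Simpson conventions exist
    simpson = 1.0 - sum(p * p for p in proportions)
    return {
        "abundance": abundance,
        "richness": richness,
        "shannon": shannon,
        "simpson": simpson,
    }


if __name__ == "__main__":
    sample = ["Diptera"] * 6 + ["Hymenoptera"] * 3 + ["Coleoptera"]
    print(diversity_metrics(sample))
```

      The same function applies whether the labels come from manual identification or from clusters of sensor events, which is what makes the metrics comparable across methods.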

      Integration of Metrics

      In the 3rd paragraph of the discussion, authors integrated two paragraphs explaining the fundamental differences between conventional methods and the presented method of biodiversity measurement.

      Point 2: Body-to-Wing Ratio Calculation

      The backscattered optical cross-section is now clearly defined as the value measured at the maximum point of the event. Specifically, we have added the word ‘maximum’ to our methods section for clarity.

      Point 3: Ecosystem Services Paragraph

      We have shortened and edited this paragraph for clarity. The revised text is now more straightforward and comprehensible.

      Point 4: Results Section Structure

      We believe restructuring the results section around each metric would result in redundancy. The value of our analysis is in the comparison of different methods; therefore, instead of talking about methods in isolation, we provide an integrated discussion and comparison of all three methods across all metrics. We have maintained our current structure but ensured that the metrics are consistently described and analyzed.

      Point 5: Abundance Correlation

      We agree that the lack of a correlation between methods for abundance remains an open question. However, we maintain that fitting a linear model would be inappropriate and potentially misleading in the absence of significant correlation. We have clarified this in our manuscript.

      Point 6: Richness and Diversity Evaluations

      The authors disagree with Reviewer 1's feedback, citing a clear misunderstanding of standard parameterization practices used to prevent overfitting. Specifically, authors implemented a 30/70 Training/Testing split. Therefore only 30% of the data was used to fit the model and 70% of the dataset was reserved for testing to ensure the validity and reliability of our clustering results. By validating with a 70% testing dataset, we ensure that the clustering model can accurately group new data points and is robust against overfitting. This process helps verify that the identified clusters are meaningful and consistent across different subsets of the data.  Spearman's rho converts the data values into ranks and does not assume a linear relationship between the variables or require the data to follow a normal distribution. Spearman's rank correlation offers robustness against non-linearity and outliers by focusing on ranks. This approach is explained in the 4th paragraph of the “Data Analysis” section.
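
      The hold-out procedure and rank correlation described above can be sketched as follows. This is a hedged illustration rather than the authors' pipeline: the helper names `split_indices`, `average_ranks`, and `spearman_rho`, the random seed, and the toy numbers are all hypothetical. Only the 30/70 split and the use of Spearman's rho come from the response.

```python
import random


def split_indices(n, train_fraction=0.3, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test) lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_fraction)
    return indices[:n_train], indices[n_train:]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y.

    Assumes both inputs have the same length and are not constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    train, test = split_indices(10)
    # Parameters would be fitted on `train` only; the correlation is then
    # evaluated on the held-out `test` indices. Values below are made up.
    sensor = [3, 8, 5, 9, 1, 7, 2, 6, 4, 10]
    manual = [2, 9, 4, 8, 1, 6, 3, 7, 5, 10]
    print(spearman_rho([sensor[i] for i in test], [manual[i] for i in test]))
```

      Because the correlation is computed on ranks, any monotonic relationship between the two measurements yields a rho of 1 or -1 regardless of its shape.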

      Point 7: Clustering Method Credibility

      Authors acknowledge the variability in optical sensor features. However, by the Law of Large Numbers, the much larger number of observations made by optical sensors compared with conventional methods supports increased accuracy and stability of the insect measurements. The manuscript now includes a detailed discussion of these aspects in the 3rd paragraph of the discussion, emphasizing the correlation observed despite variability.

      Reviewer 2:

      Authors appreciate Reviewer 2’s feedback especially regarding contextualization. While authors disagree with the need for more specific experimental questions in a methods paper and the suggested need for more complex analysis, we agree with the essence of the review and added additional text regarding potential questions, method applications, and ecosystem processes for contextualization.

      Point 1: Larger Question Framing

      We present this article as a methodological paper rather than asking a specific experimental question. This approach is justified by the generalizable nature of methods papers, akin to those describing ImageJ or mass spectrometers. The method is widely applicable to a range of scientific questions. 

      We provided a discussion on how this technology could be applied in community ecology, conservation, and managed ecological systems like agriculture.

      In the Conclusion section we provided elaboration on the potential research questions and applications.

      Point 2: Complex Analyses

      While complex analyses like NMDS are useful for specific questions, this paper aims to establish the method. Once established, this method can be applied to various research questions in future studies. Therefore, as we are not directly asking an experimental question, more complex analysis is unnecessary.

      Point 3: Ecosystem Process (Granivory) Assay

      We have improved the contextualization and explanation of the ecosystem process assay throughout the manuscript, ensuring it is well-integrated and clear to readers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This paper explores how diverse forms of inhibition impact firing rates in models for cortical circuits. In particular, the paper studies how the network operating point affects the balance of direct inhibition from SOM inhibitory neurons to pyramidal cells, and disinhibition from SOM inhibitory input to PV inhibitory neurons. This is an important issue as these two inhibitory pathways have largely been studied in isolation. Support for the main conclusions is generally solid, but could be strengthened by additional analyses.

      Strengths:

      A major strength of the paper is the systematic exploration of how circuit architecture affects the impact of inhibition. This includes scans across parameter space to determine how firing rates and stability depend on effective connectivity. This is done through linearization of the circuit about an effective operating point, and then the study of how perturbations in input affect this linear approximation.

      Weaknesses:

      The linearization approach means that the conclusions of the paper are valid only on the linear regime of network behavior. The paper would be substantially strengthened with a test of whether the conclusions from the linearized circuit hold over a large range of network activity. Is it possible to simulate the full network and do some targeted tests of the conclusions from linearization? Those tests could be guided by the linearization to focus on specific parameter ranges of interest.

      We agree with the reviewer that it would be interesting to test if our results hold in a nonlinear regime of network behaviour (i.e. the chaotic regime, see also comment 1 by reviewer 2). As mentioned above, this requires a different type of model (either a rate-based or a spiking model with multiple neurons, instead of modelling the mean population rate dynamics) which, in our opinion, exceeds the scope of this manuscript. Furthermore, the core measures of our study, network gain and stability, require linearization. In a chaotic regime where the linearization approach is impossible, we would need to define new measures to characterize network response and activity. Therefore, while it is certainly an interesting question, studying networks in a nonlinear regime is broad in scope and better tackled in a separate study. We now acknowledge in the discussion of our manuscript that the linearization approach is a limitation of our study and that it would be an interesting future direction to investigate chaotic dynamics.
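
      To make concrete what a targeted test of the linearization could look like, the following is a minimal sketch, not the model or the parameters of the manuscript. It assumes a three-population rate model dr/dt = -r + f(W r + I) with a power-law transfer function, hypothetical weights and inputs, and a stimulus step that targets E and PV only. It compares the change in the excitatory rate predicted by the linearized circuit with the change obtained by integrating the full nonlinear model.

```python
ALPHA, BETA = 0.04, 2.0        # hypothetical power-law parameters
W = [[0.5, -0.6, -0.3],        # inputs to E from (E, PV, SOM); all weights are hypothetical
     [0.6, -0.5, -0.3],        # inputs to PV
     [0.4, 0.0, 0.0]]          # inputs to SOM (E to SOM feedback only)
I_BASE = [10.0, 8.0, 4.0]      # hypothetical baseline drive to (E, PV, SOM)


def f(q):
    return ALPHA * max(q, 0.0) ** BETA


def df(q):
    return ALPHA * BETA * max(q, 0.0) ** (BETA - 1.0)


def net_input(r, ext):
    return [sum(W[i][j] * r[j] for j in range(3)) + ext[i] for i in range(3)]


def steady_state(ext, r0=(0.0, 0.0, 0.0), dt=0.01, steps=5000):
    """Euler-integrate dr/dt = -r + f(W r + ext); time is in units of tau."""
    r = list(r0)
    for _ in range(steps):
        q = net_input(r, ext)
        r = [r[i] + dt * (-r[i] + f(q[i])) for i in range(3)]
    return r


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 system m x = rhs with Cramer's rule."""
    d = det3(m)
    cols = []
    for k in range(3):
        mk = [[rhs[i] if j == k else m[i][j] for j in range(3)] for i in range(3)]
        cols.append(det3(mk) / d)
    return cols


def linear_response(r_fix, ext, d_ext):
    """Rate change predicted by linearizing about the fixed point: (1 - B W) dr = B dI."""
    b = [df(q) for q in net_input(r_fix, ext)]   # slopes of f at the operating point
    m = [[(1.0 if i == j else 0.0) - b[i] * W[i][j] for j in range(3)] for i in range(3)]
    return solve3(m, [b[i] * d_ext[i] for i in range(3)])


R_FIX = steady_state(I_BASE)


def relative_error(delta):
    d_ext = [delta, delta, 0.0]                  # stimulus step onto E and PV (assumption)
    ext = [I_BASE[i] + d_ext[i] for i in range(3)]
    full = steady_state(ext, r0=R_FIX)[0] - R_FIX[0]
    lin = linear_response(R_FIX, I_BASE, d_ext)[0]
    return abs(full - lin) / abs(full)


err_small = relative_error(0.01)
err_large = relative_error(5.0)
print(f"relative error of the linear prediction, small step: {err_small:.2e}")
print(f"relative error of the linear prediction, large step: {err_large:.2e}")
```

      In this toy setting the linear prediction is accurate for small input steps and degrades for large ones, which is the sense in which conclusions drawn from the linearization are local to the operating point.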

      The results illustrated in the figures are generally well described but there is very little intuition provided for them. Are there simplified examples or explanations that could be given to help the results make sense? Here are some places such intuition would be particularly helpful:

      page 6, paragraph starting ”In sum ...”

      Page 8, last paragraph

      Page 10, paragraph starting ”In summary ...”

      Page 11, sentence starting ”In sum ...”

      We agree with the reviewer that we didn’t provide enough intuition for our results. We have now extended the paragraphs listed by the reviewer with additional information, providing a more intuitive understanding of the results presented in the respective sections.

      Reviewer #2 (Public Review):

      Summary:

      Bos and colleagues address the important question of how two major inhibitory interneuron classes in the neocortex differentially affect cortical dynamics. They address this question by studying Wilson-Cowan-type mathematical models. Using a linearized fixed point approach, they provide convincing evidence that the existence of multiple interneuron classes can explain the counterintuitive finding that inhibitory modulation can increase the gain of the excitatory cell population while also increasing the stability of the circuit’s state to minor perturbations. This effect depends on the connection strengths within their circuit model, providing valuable guidance as to when and why it arises.

      Overall, I find this study to have substantial merit. I have some suggestions on how to improve the clarity and completeness of the paper.

      Strengths:

      (1) The thorough investigation of how changes in the connectivity structure affect the gain-stability relationship is a major strength of this work. It provides an opportunity to understand when and why gain and stability will or will not both increase together. It also provides a nice bridge to the experimental literature, where different gain-stability relationships are reported from different studies.

      (2) The simplified and abstracted mathematical model has the benefit of facilitating our understanding of this puzzling phenomenon. (I have some suggestions for how the authors could push this understanding further.) It is not easy to find the right balance between biologically detailed models vs simple but mathematically tractable ones, and I think the authors struck an excellent balance in this study.

      Weaknesses:

      (1) The fixed-point analysis has potentially substantial limitations for understanding cortical computations away from the steady-state. I think the authors should have emphasized this limitation more strongly and possibly included some additional analyses to show that their conclusions extend to the chaotic dynamical regimes in which cortical circuits often live.

      We agree with the reviewer that it would be interesting to test if our results hold in a chaotic regime of network behaviour (see also comment by reviewer 1). As mentioned above, this requires a different type of model (either a rate-based or a spiking model with multiple neurons, instead of modelling the mean population rate dynamics) which, in our opinion, exceeds the scope of this manuscript. Furthermore, the core measures of our study, network gain and stability, require linearization. In a chaotic regime where the linearization approach is impossible, we would need to define new measures to characterize network response and activity. Therefore, while it is certainly an interesting question, studying networks in a nonlinear regime is broad in scope and better tackled in a separate study. We now acknowledge in the discussion of our manuscript that the linearization approach is a limitation of our study and that it would be an interesting future direction to investigate chaotic dynamics.

      (2) The authors could have discussed – even somewhat speculatively – how SST interneurons fit into this picture. Their absence from this modelling framework stands out as a missed opportunity.

      We believe that the reviewer wanted us to speculate about VIP interneurons (and not SST interneurons, which we already do extensively in the manuscript). Previous models have included VIP neurons in the circuit (e.g. del Molino et al., 2017; Palmigiano et al., 2023; Waitzmann et al., 2024). While we do not model VIP cells explicitly, we implicitly assume that a possible source of modulation of SOM neurons comes from VIP cells. We have now added a short discussion on VIP cells in the last paragraph in our discussion section.

      (3) The analysis is limited to paths within this simple E,PV,SOM circuit. This misses more extended paths (like thalamocortical loops) that involve interactions between multiple brain areas. Including those paths in the expansion in Eqs. 11-14 (Fig. 1C) may be an important consideration.

      We agree with the reviewer that our framework can be extended to study many other paths, like thalamocortical loops, cortical layer-specific connectivity motifs, or circuits with VIP or L1 inhibitory neurons. Studying these questions, however, is beyond the scope of our work. In our discussion, we now mention the possibility of using our framework to study those questions.

      Reviewer #3 (Public Review):

      Summary:

      Bos et al. study a computational model of cortical circuits with excitatory (E) neurons and two subtypes of inhibitory interneurons, parvalbumin (PV) and somatostatin (SOM) expressing interneurons. They perform stability and gain analysis of simplified models with nonlinear transfer functions when SOM neurons are perturbed. Their analysis suggests that in a specific setup of connectivity, instability and gain can be untangled, such that SOM modulation leads to both increases in stability and gain. This is in contrast with the typical direction in neuronal networks where increased gain results in decreased stability.

      Strengths:

      - Analysis of the canonical circuit in response to SOM perturbations. Through numerical simulations and mathematical analysis, the authors have provided a rather comprehensive picture of how SOM modulation may affect response changes.

      - Shedding light on two opposing circuit motifs involved in the canonical E-PV-SOM circuitry - namely, direct inhibition (SOM → E) vs disinhibition (SOM → PV → E). These two pathways can lead to opposing effects, and it is often difficult to predict which one results from modulating SOM neurons. In simplified circuits, the authors show how these two motifs can emerge and depend on parameters like connection weights.

      - Suggesting potentially interesting consequences for cortical computation. The authors suggest that certain regimes of connectivity may lead to untangling of stability and gain, such that increases in network gain are not compromised by decreasing stability. They also link SOM modulation in different connectivity regimes to versatile computations in visual processing in simple models.

      Weaknesses:

      The computational analysis is not novel per se, and the link to biology is not direct/clear.

      Computationally, the analysis is solid, but it’s very similar to previous studies (del Molino et al, 2017). Many studies in the past few years have done the perturbation analysis of a similar circuitry with or without nonlinear transfer functions (some of them listed in the references). This study applies the same framework to SOM perturbations, which is a useful and interesting computational exercise, in view of the complexity of the high-dimensional parameter space. But the mathematical framework is not novel per se, undermining the claim of providing a new framework (or ”circuit theory”).

      In the introduction we acknowledge that our analysis method is not novel but is rather based on previous studies (del Molino et al., 2017; Kuchibhotla et al., 2017; Kumar et al., 2023; Litwin-Kumar et al., 2016; Mahrach et al., 2020; Palmigiano et al., 2023; Veit et al., 2023; Waitzmann et al., 2024). We have now rewritten parts of the introduction to make sure that it does not sound as if the computational analysis was developed by us, but that we rather use those previously developed frameworks to dissect stability and gain via SOM modulation.

      Link to biology: the most interesting result of the paper with regard to biology is the suggestion of a regime in which gain and stability can be modulated in an unconventional way - however, it is difficult to link the results to biological networks: - A general weakness of the paper is a lack of direct comparison to biological parameters or experiments. How different experiments can be reconciled by the results obtained here, and what new circuit mechanisms can be revealed? In its current form, the paper reads as a general suggestion that different combinations of gain modulation and stability can be achieved in a circuit model equipped with many parameters (12 parameters). This is potentially interesting but not surprising, given the high dimensional space of possible dynamical properties. A more interesting result would have been to relate this to biology, by providing reasoning why it might be relevant to certain circuits (and not others), or to provide some predictions or postdictions, which are currently missing in the manuscript.

      - For instance, a nice motivation for the paper at the beginning of the Results section is the different results of SOM modulation in different experiments - especially between L23 (inhibition) and L4 (disinhibition). But no further explanation is provided for why such a difference should exist, in view of their results and the insights obtained from their suggested circuit mechanisms. How do the parameters identified for the two regimes correspond to different properties of different layers?

      As pointed out by the reviewer, the main goal of our manuscript is to provide a general understanding of how gain and stability depend on different circuit motifs (ie different connectivity parameters), and how circuit modulations via SOM neurons affect those measures. However, we agree with the reviewer that it would be useful to provide some concrete predictions or postdictions following from our study.

      An interesting example of a postdiction of our model is that the firing rate change of excitatory neurons in response to a change in the stimulus (which we define as network gain, Eq. 2) depends on firing rates of the excitatory, PV, and SOM neurons at the moment of stimulus presentation (Fig. 3ii; Fig. 4Aii,Bii,Cii; Fig. 5Aii, Bii, Cii). Hence any change in input to the circuit can affect the response gain to a stimulus presentation, in line with experimental evidence which suggests that changes in inhibitory firing rates and changes in the behavioral state of the animal lead to gain modifications (Ferguson and Cardin 2020).

      Another recent concrete example is the study of Tobin et al., 2023, in which the authors show that optogenetically activating SOM cells in the mouse primary auditory cortex (A1) decreases the excitatory responses to auditory stimuli. In our framework, this corresponds to the case of decreases in network gain (gE) for positive SOM modulation, as seen in the circuit with PV to SOM feedback connectivity (Suppl. Fig. S1).

      Another example is the study by Phillips and Hasenstaub 2016, in which the authors study the effect of optogenetic perturbations of SOM (and PV) cells on tuning curves of pyramidal cells in mouse A1. While they find large heterogeneity in additive/subtractive or multiplicative/divisive tuning curve changes following SOM inactivation, most cells have a purely multiplicative or purely additive component (and none of the cells have a divisive component). In our study, we see that large multiplicative responses of the excitatory population follow from circuits with strong E to SOM feedback connectivity.

      We note that in future computational studies, it would be useful to apply our framework with a focus on a specific brain region and add all relevant cell types (at a minimum E, PV, SOM, and VIP) plus a dendritic compartment, in order to formulate much more precise experimental predictions.

      We have now added additional information to the discussion section.

      - Another caveat is the range of parameters needed to obtain the unintuitive untangling as a result of SOM modulation. From Figure 4, it appears that the ”interesting” regime (with increases in both gain and stability) is only feasible for a very narrow range of SOM firing rates (before 3 Hz). This can be a problem for the computational models if the sweet spot is a very narrow region (this analysis is by the way missing, so making it difficult to know how robust the result is in terms of parameter regions). In terms of biology, it is difficult to reconcile this with the realistic firing rates in the cortex: in the mouse cortex, for instance, we know that SOM neurons can be quite active (comparable to E neurons), especially in response to stimuli. It is therefore not clear if we should expect this mechanism to be a relevant one for cortical activity regimes.

      We agree with the reviewer that it’s important to test the robustness of our results. As suggested by the reviewer, we now include a new supplementary figure (Suppl. Fig. S2) which measures the percentage of data points in the respective quadrants Q1-Q4 when changing the SOM firing rates (as done in Fig. 5). We see that the percentage of data points in the quadrants in which network gain and stability change in the same direction (Q2 and Q3) remains high in the case of E to SOM feedback (Suppl. Fig. S2A) for SOM rates ranging over 0-10 Hz (and likely beyond).
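
      The kind of bookkeeping behind such a robustness measure can be sketched as follows. This is a hypothetical illustration, not the code used for Suppl. Fig. S2: the input pairs are made up, a positive stability change is assumed to mean "more stable", and the groups are given descriptive labels rather than the Q1-Q4 numbering of the manuscript, which is not reproduced here.

```python
from collections import Counter


def quadrant_percentages(changes):
    """Percentage of (d_gain, d_stability) pairs in each sign combination.

    d_stability > 0 is taken to mean "more stable"; values of exactly
    zero are counted as "down" for simplicity.
    """
    counts = Counter()
    for d_gain, d_stab in changes:
        gain_dir = "gain up" if d_gain > 0 else "gain down"
        stab_dir = "stability up" if d_stab > 0 else "stability down"
        counts[f"{gain_dir}, {stab_dir}"] += 1
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical changes in (gain, stability) after a modulation of SOM neurons.
example = [(0.2, 0.1), (0.3, 0.05), (-0.1, -0.2), (0.4, -0.1)]
pct = quadrant_percentages(example)
same_direction = (pct.get("gain up, stability up", 0.0)
                  + pct.get("gain down, stability down", 0.0))
print(pct)
print(f"gain and stability change in the same direction: {same_direction:.0f}%")
```

      Repeating such a count for each SOM firing rate would give one curve per group, which is the type of summary a robustness analysis across rates calls for.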

      - One of the key assumptions of the model is nonlinear transfer functions for all neuron types. In terms of modelling and computational analysis, a thorough analysis of how and when this is necessary is missing (an analysis similar to what has been attempted at in Figure 6 for synaptic weights, but for cellular gains). In terms of biology, the nonlinear transfer function has experimentally been reported for excitatory neurons, so it’s not clear to what extent this may hold for different inhibitory subtypes. A discussion of this, along with the former analysis to know which nonlinearities would be necessary for the results, is needed, but currently missing from the study. The nonlinearity is assumed for all subtypes because it seems to be needed to obtain the results, but it’s not clear how the model would behave in the presence or absence of them, and whether they are relevant to biological networks with inhibitory transfer functions.

      It is true that the nonlinear transfer function is a key component in our model. We chose identical transfer functions for E, PV, and SOM (Eq. 4) to simplify our analysis. If the transfer function of one of the neuron types were linear (β = 1), then the corresponding b term (the slope of the nonlinearity at the steady state; b = dfX/dqX; Fig. 1B; Eq. 4) would be equal to α. Therefore, if neurons had a linear transfer function in our model, network gain would not depend on the E and PV firing rates as studied in Fig. 3-5. This is because the relationship between PV rates and their gain would be constant (bP = α) in Fig. 1B (bottom).

      If all the transfer functions were linear, changes in firing rates would not have an impact on network gain or stability. Changing the nonlinear transfer function by changing the α or β terms in Eq. 4 would only scale the way a change in the rates affects the b terms and hence the results presented in Fig. 3-5. It would be more interesting to study how different types of nonlinearities, like sigmoidal functions or sublinear (i.e. saturating) nonlinearities, would change our results. However, we think that such an investigation is out of scope for this study. We have now added a comment to the Methods section.
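
      The dependence of the b terms on the firing rate can be illustrated with a small sketch. It assumes a power-law transfer function of the form f(q) = α q^β, which is consistent with the description above but is not copied from Eq. 4; the value of α and the example rates are hypothetical.

```python
def slope_at_rate(rate, alpha, beta):
    """Slope b = df/dq of f(q) = alpha * q**beta at the input q that produces `rate`."""
    q = (rate / alpha) ** (1.0 / beta)        # invert the transfer function (rate > 0)
    return alpha * beta * q ** (beta - 1.0)   # derivative of the power law at q


ALPHA = 0.04              # hypothetical scale factor
rates = [1.0, 4.0, 9.0]   # hypothetical steady-state firing rates

b_supralinear = [slope_at_rate(r, ALPHA, 2.0) for r in rates]  # beta = 2
b_linear = [slope_at_rate(r, ALPHA, 1.0) for r in rates]       # beta = 1

print("beta = 2:", b_supralinear)   # slope grows with the firing rate
print("beta = 1:", b_linear)        # slope equals alpha at every rate
```

      With β = 1 the slope is α at every rate, so a change in firing rate cannot alter the b terms; with β > 1 the slope grows with the rate, which is what makes network gain and stability rate dependent in this class of models.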

      Experimentally, F-I curves have also been measured for PV and SOM neurons. For example, Romero-Sosa et al., 2021 measure the F-I curve of pyramidal, PV and SOM neurons in mouse cortical slices. They find that, similar to pyramidal neurons, PV and SOM neurons show a nonlinear F-I curve. We have now added the citation of Romero-Sosa et al., 2021 to our manuscript.

      - Tuning curves are simulated for an individual orientation (same for all), not considering the heterogeneity of neuronal networks with multiple orientation selectivity (and other visual features) - making the model too simplistic.

      The reviewer is correct that we only study changes in tuning curves in a simplistic model. In our model, the excitatory and PV populations are tuned to a single orientation (in the case of Fig. 7 to θ = 90). While this is certainly an oversimplification, it allows us to understand how additive/subtractive and multiplicative/divisive changes in the tuning curves come about in networks with different connectivity motifs. Modelling the heterogeneity of tuning responses within a network requires more complex models. A natural choice would be to extend a classical ring attractor model (Rubin et al., 2015) by splitting the inhibitory population into PV and SOM neurons, or to study the tuning curve heterogeneity that occurs in balanced networks (Hansel and van Vreeswijk 2012). However, such models have many more parameters, like the spatial connectivity profiles from and onto PV and SOM neurons. While highly valuable, we believe that studying such models exceeds the scope of our current manuscript. We have now added a paragraph in the discussion section, mentioning this as an interesting future direction.
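
      For readers unfamiliar with the additive versus multiplicative distinction, the following sketch shows one common way to separate the two components: regress the modulated tuning curve on the baseline tuning curve, so that the slope estimates the multiplicative change and the intercept the additive change. The Gaussian tuning curve, its width and peak, and the modulation values are hypothetical and are not taken from Fig. 7.

```python
import math


def gaussian_tuning(theta, pref=90.0, width=20.0, peak=10.0):
    """Hypothetical tuning curve peaked at the preferred orientation (degrees)."""
    return peak * math.exp(-0.5 * ((theta - pref) / width) ** 2)


def fit_gain_and_offset(baseline, modulated):
    """Least-squares fit of modulated = gain * baseline + offset."""
    n = len(baseline)
    mean_b = sum(baseline) / n
    mean_m = sum(modulated) / n
    cov = sum((b - mean_b) * (m - mean_m) for b, m in zip(baseline, modulated))
    var = sum((b - mean_b) ** 2 for b in baseline)
    gain = cov / var
    return gain, mean_m - gain * mean_b


thetas = list(range(0, 181, 15))
baseline = [gaussian_tuning(t) for t in thetas]
# Hypothetical modulation: 1.5x multiplicative scaling plus a 2 Hz additive shift.
modulated = [1.5 * r + 2.0 for r in baseline]

gain, offset = fit_gain_and_offset(baseline, modulated)
print(f"multiplicative component: {gain:.2f}, additive component: {offset:.2f}")
```

      A slope above one indicates a multiplicative change and a slope below one a divisive change, while a positive or negative intercept indicates an additive or subtractive change, respectively.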

      Reviewer #1 (Recommendations For The Authors):

      The last sentence of the abstract is hard to interpret before reading the rest of the paper - suggest replacing or rephrasing.

      We rephrased the sentence to make its meaning clearer.

      Page 3, last full paragraph: I think this assumes that phi is positive. What is the justification for that assumption? More generally, I think you could say a bit more about phi in the main text since it is a fairly complicated term.

      The reviewer is correct: for a stable system, phi is always positive. We now clarify this and explain phi in more detail in the main text.

      Fig 1D: It would be helpful to identify when the stimulus comes on and be clearer about what the stimulus is. I assume it’s a step increase in S input at 0.05 s or so - but that should be immediately apparent looking at the figure.

      We agree with the reviewer and we added a dashed line at the time of stimulus onset in Fig. 1D.

      Page 5: ”To motivate our analysis we compare ... (Fig. 2A)” - Figure 2A does not show responses without modulation, so this sentence is confusing.

      The dashed lines in Fig. 2A (and Fig. 2C) actually represent the rate change without modulation.

      Page 6: sentence “The central goal of our study ...” seems out of place since this is pretty far into the results, and that goal should already be clear.

      We agree with the reviewer, hence we updated the sentence.

      Page 10, top: the green curve in panel Aii always has a negative slope - so I am confused by the statement that increasing wSE decreases both gain and stability.

      We thank the reviewer for pointing out this mistake. We have now fixed it in the text.

      Figure 6: in general it is hard to see what is going on in this figure (the green and blue in particular are hard to distinguish). Some additional labels would be helpful, but I would also see if the color scheme can be improved.

      We added a zoom-in to the panels which were hard to distinguish.

      Reviewer #2 (Recommendations For The Authors):

      Major recommendations:

      (1) The authors should explain early on in the results section what the key factor(s) is that differentiates SOM from PV cells in their model. E.g., in Fig. 1A, the only obvious difference is that SOM cells don’t inhibit themselves. However, later on in the paper, the difference in external stimulus drive to these interneuron classes is more heavily emphasized. Given the importance of that difference (in external stim drive), I think this should be highlighted early on.

      We now mention the key factors that differentiate PV and SOM neurons already when describing Fig. 1A.

      (2) The result in Figs. 5,6 demonstrate that recurrent SOM connectivity is important for achieving increases in both gain and stability. This observation could benefit from some intuitive explanation. Perhaps the authors could find this explanation by looking at their series expansion (Eqs. 11-14, Fig. 1C) and determining which term(s) are most important for this effect. The corresponding paths through the circuit – the most important ones – could then be highlighted for the reader.

      We agree with the reviewer that our results benefit from more intuitive explanations. This has also been pointed out by reviewer 1 in their public review. We have now extended the concluding paragraphs in the context of Fig. 4-6 with additional information, providing a more intuitive understanding of the results presented in the respective sections. While it is possible to gain an intuitive understanding of how the network gain depends on rate and weight parameters (Eq. 2), this understanding is unfortunately missing in the case of stability. The maximum eigenvalue of the system has a complex relationship with all the parameters, and often depends nonlinearly on changes of a parameter (e.g. as we show in Fig. 3iv or as one can see in Fig. 6). We now discuss this difficulty at the end of the section “Influence of weight strength on network gain vs stability”.

      (3) I think the authors should consider including some analyses that do not rely on the system being at or near a fixed point. I admit that such analysis could be difficult, and this could of course be done in a future study. Nevertheless, I want to reiterate that this addition could add a lot of value to this body of work.

      As outlined above, we decided to not include additional analysis on network behaviour in nonlinear regimes but we now acknowledge in the discussion of our manuscript that the linearization approach is a limitation in our study and that it would be an interesting future direction to investigate chaotic dynamics.

      Minor recommendations:

      (1) At the top of P. 6, when the authors first discuss the stability criterion involving eigenvalues, they should address the question ”eigenvalues of what?”. I suggest introducing the idea of the Jacobian matrix, and explaining that the largest eigenvalue of that matrix determines how rapidly the system will return to the fixed point after a small perturbation.

      We included an additional sentence in the respective paragraph explaining the link between stability and negative eigenvalues, and we also added a sentence in the Methods section stating that the largest real eigenvalue dominates the behavior of the dynamical system.
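
      The link between eigenvalues and stability can be made concrete with a minimal sketch. It uses a hypothetical 2x2 Jacobian of an excitatory-inhibitory pair rather than the three-population Jacobian of the manuscript, so that the eigenvalues can be written in closed form. The fixed point is stable when the largest real part is negative, and the magnitude of that real part sets how quickly a small perturbation decays.

```python
import cmath


def eigenvalues_2x2(j):
    """Closed-form eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = j
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def max_real_eigenvalue(j):
    return max(ev.real for ev in eigenvalues_2x2(j))


# Hypothetical Jacobians evaluated at a fixed point (units of 1/ms).
J_STABLE = [[-0.05, -0.10], [0.12, -0.20]]
J_UNSTABLE = [[0.08, -0.02], [0.03, -0.20]]

for name, jac in (("stable example", J_STABLE), ("unstable example", J_UNSTABLE)):
    lam = max_real_eigenvalue(jac)
    if lam < 0:
        print(f"{name}: max real part {lam:.3f}, perturbations decay over about {-1.0 / lam:.1f} ms")
    else:
        print(f"{name}: max real part {lam:.3f}, perturbations grow")
```

      The same criterion applies to a Jacobian of any size; only the way the eigenvalues are computed changes.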

      (2) The panel labelling in Fig. 3 is unnecessarily confusing. It would be simpler (and thus better) to simply label the panels A,B,C,D, or i,ii,iii,iv, instead of the current labelling: Ai, Aii, Aiii, Aiv. (There are currently no panels ”B” in Fig. 3).

      We updated the figure accordingly.

      Reviewer #3 (Recommendations For The Authors):

      • Suggestions for improved or additional experiments, data or analyses.

      Analysis of the effect of different nonlinear transfer functions is necessary.

      Please see our detailed answer to the reviewer’s comment in the public review above.

      Analysis of gain modulation in models with more realistic tuning properties.

      Please see our detailed answer to the reviewer’s comment in the public review above.

      Mathematical analysis of the conditions to obtain ”untangled” gain and stability:

      One of the promises of the paper is that it is offering a computational framework or circuit theory for understanding the effect of SOM perturbation. However, the main result, namely the untangling of gain and stability, has only been reported in numerical simulations (e.g. Fig. 6). Different parameters have been changed and the results of simulations have been reported for different conditions. Given the simplified model, which allows for rigorous mathematical analysis, isn’t it possible to treat this phenomenon more analytically? What would be the conditions for the emergence of the untangled regime? This is currently missing from the analyses and results.

      We agree with the reviewer that our results benefit from more intuitive explanations. This has also been pointed out by reviewer 1 in their public review. We have now extended the concluding paragraphs in the context of Fig. 4-6 with additional information, providing a more intuitive understanding of the results presented in the respective sections. While it is possible to understand analytically how the network gain depends on rate and weight parameters (Eq. 2), this understanding is unfortunately missing in the case of stability. The maximum eigenvalue of the system has a complex relationship with all the parameters, and often depends nonlinearly on changes of a parameter (e.g. as we show in Fig. 3iv or as one can see in Fig. 6). This doesn’t allow for a deep analytical understanding of the entangling of gain and stability. We now discuss this difficulty at the end of the section “Influence of weight strength on network gain vs stability”.

      • Recommendations for improving the writing and presentation. The Results section is well written overall, but other parts, especially the Introduction and Discussion, would benefit from proof reading - there are many typos and problems with sentence structures and wording (some mentioned below).

      We have gone through the manuscript again and improved the writing.

      The presentation of the dependence on weight in Figure 6 can be improved. For instance, the authors talk about the optimal range of PV connectivity, but this is difficult to appreciate in the current illustration and with the current colour scheme.

      We added a zoom in to the panels which were hard to distinguish.

      • Minor corrections to the text and figures. Text:

      We thank the reviewer for their thorough reading of our manuscript. We fixed all the issues listed below in the manuscript.

      Some examples of bad structure or wording:

      From the Abstract:

      ”We show when E - PV networks recurrently connect with SOM neurons then an SOM mediated modulation that leads to increased neuronal gain can also yield increased network stability.”

      From Introduction:

      Sentence starting with ”This new circuit reality ...”

      ”Inhibition is been long identified as a physiological or circuit basis for how cortical activity changes depending upon processing or cognitive needs ...”

      Sentence starting with ”Cortical models with both ...”

      ”... allowing SOM neurons the freedom to ..”

      From Results:

      ”... affects of SOM neurons on E ..”

      ”seem in opposition to one another, with SOM neuron activity providing either a source or a relief of E neuron suppression”. The sentence after is also difficult to read and needs to be simplified.

      P. 7: ”We first remark that ...”

      Difficult to read/understand - long and badly structured sentence.

      P. 8: ”adding a recurrent connection onto SOM neurons from the E-PV subcircuit” It’s from E (and not PV) to be more precise (Fig. 5).

      Discussion:

      ”Firstly, E neurons and PV neurons experience very similar synaptic environments.” What does it mean?

      ”Fortunately, PV neurons target both the cell bodies and proximal dendrites” Fortunately for whom or what?

      ”in line with arge heterogeneity”

      Methods:

      Matrix B is never defined - the diagonal matrix of b (power law exponents) I assume.

      Some of the other notations too, e.g. bs, etc (it’s implicit, but should be explained).

      Structure of sentence:

      ”Network gain is defined as ...” (p. 17)

      Figure:

      The schematics in Figure 4 can be tweaked to highlight the effect of input (rather than other components of the network, which are the same and repetitive), to highlight the main difference for the reader.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors seek to establish what aspects of nervous system structure and function may explain behavioral differences across individual fruit flies. The behavior in question is a preference for one odor or another in a choice assay. The variables related to neural function are odor responses in olfactory receptor neurons or in the second-order projection neurons, measured via calcium imaging. A different variable related to neural structure is the density of a presynaptic protein BRP. The authors measure these variables in the same fly along with the behavioral bias in the odor assays. Then they look for correlations across flies between the structure-function data and the behavior.

      Strengths:

      Where behavioral biases originate is a question of fundamental interest in the field. In an earlier paper (Honegger 2019) this group showed that flies do vary with regard to odor preference, and that there exists neural variation in olfactory circuits, but did not connect the two in the same animal. Here they do, which is a categorical advance, and opens the door to establishing a correlation. The authors inspect many such possible correlations. The underlying experiments reflect a great deal of work, and appear to be done carefully. The reporting is clear and transparent: All the data underlying the conclusions are shown, and associated code is available online.

      We are glad to hear the reviewer is supportive of the general question and approach.

      Weaknesses:

      The results are overstated. The correlations reported here are uniformly small, and don't inspire confidence that there is any causal connection. The main problems are:

      Our revision overhauls the interpretation of the results to prioritize the results we have high confidence in (specifically, PC 2 of our Ca++ data as a predictor of OCT-MCH preference) versus results that are suggestive but not definitive (such as PC 1 of Ca++ data as a predictor of Air-OCT preference).

      It’s true that the correlations are small, with R^2 values typically in the 0.1-0.2 range. That said, we would call it a victory if we could explain 10 to 20% of the variance of a behavior measure, captured in a 3-minute experiment, with a circuit correlate. This is particularly true because, as the reviewer notes, the behavioral measurement is noisy.

      (1) The target effect to be explained is itself very weak. Odor preference of a given fly varies considerably across time. The systematic bias distinguishing one fly from another is small compared to the variability. Because the neural measurements are by necessity separated in time from the behavior, this noise places serious limits on any correlation between the two.

      This is broadly correct, though to quibble, it’s our measurement of odor preference which varies considerably over time. We are reasonably confident that more variance in our measurements can be attributed to sampling error than to changes in true preference over time. As evidence, the correlations in sequential measures of individual odor preference, with delays of 3 hours or 24 hours, are not obviously different. We are separately working on methodological improvements to get more precise estimates of persistent individual odor preference, using averages of multiple, spaced measurements. This is promising, but beyond the scope of this study.

      (2) The correlations reported here are uniformly weak and not robust. In several of the key figures, the elimination of one or two outlier flies completely abolishes the relationship. The confidence bounds on the claimed correlations are very broad. These uncertainties propagate to undermine the eventual claims for a correspondence between neural and behavioral measures.

      We are broadly receptive to this criticism. The lack of robustness of some results comes from the fundamental challenge of this work: measuring behavior is noisy at the individual level. Measuring Ca++ is also somewhat noisy. Correlating the two will be underpowered unless the sample size is huge (which is impractical, as each data point requires a dissection and live imaging session) or the effect size is large (which is generally not the case in biology). In the current version we tried in some sense to avoid discussing these challenges head-on, instead trying to focus on what we thought were the conclusions justified by our experiments with sample sizes ranging from 20 to 60. Our revision is more candid about these challenges.

      That said, we believe the result we view as the most exciting, that PC2 of Ca++ responses predicts OCT-MCH preference, is robust. 1) It is based on a training set of 47 individuals and a test set of 22 individuals. The p-value is sufficiently low in each of these sets (0.0063 and 0.0069, respectively) to pass an overly stringent Bonferroni correction for the 5 tests (one per PC) in this analysis. 2) The BRP immunohistochemistry provides independent evidence that is consistent with this result: its PC2 predicts behavior (p = 0.03 from only one test) and has loadings that contrast DC2 and DM2. Taken together, these results are well above the field-standard bar of statistical robustness.

      In our revision, we are explicit that this is the (one) result we have high confidence in. We believe this result convincingly links Ca++ and behavior, and warrants spotlighting. We have less confidence in other results, and say so, and we hope this addresses concerns about overstating our results.

      (3) Some aspects of the statistical treatment are unusual. Typically a model is proposed for the relationship between neuronal signals and behavior, and the model predictions are correlated with the actual behavioral data. The normal practice is to train the model on part of the data and test it on another part. But here the training set at times includes the testing set, which tends to give high correlations from overfitting. Other times the testing set gives much higher correlations than the training set, and then the results from the testing set are reported. Where the authors explored many possible relationships, it is unclear whether the significance tests account for the many tested hypotheses. The main text quotes the key results without confidence limits.

      Our primary analyses are exactly what the reviewer describes, scatter plots and correlations of actual behavioral measures against predicted measures. We produced test data in separate experiments, conducted weeks to months after models were fit on training data. This is more rigorous than splitting into training and test sets data collected in a single session, as batch/environmental effects reduce the independence of data collected within a single session.

      We only collected a test set when our training set produced a promising correlation between predicted and actual behavioral measures. We never used data from test sets to train models. In our main figures, we showed scatter plots that combined test and training data, as the training and test partitions had similar correlations.

      We are unsure what the reviewer means by instances where we explored many possible relationships. The greatest number of comparisons that could lead to the rejection of a null hypothesis was 5 (corresponding to the top 5 PCs of Ca++ response variation or Brp signal). We were explicit that the p-values reported were nominal. As mentioned above, the training and test correlations from the Ca++ to OCT-MCH preference model each remain significant at alpha=0.05 after a Bonferroni correction for n=5 comparisons.
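
      The Bonferroni check referred to here can be sketched in a few lines. This is only an illustration using the two p-values and the five comparisons quoted in this response; the helper name `passes_bonferroni` is hypothetical and is not taken from the analysis code.

```python
def passes_bonferroni(p_value, n_tests, alpha=0.05):
    """Return True if p_value survives a Bonferroni correction for n_tests."""
    # Bonferroni: compare each nominal p-value against alpha / n_tests.
    return p_value < alpha / n_tests


# Nominal p-values quoted in this response for the PN PC2 OCT-MCH model.
p_train = 0.0063  # training set
p_test = 0.0069   # test set
n_pcs = 5         # at most five PCs were tested per model

for label, p in (("train", p_train), ("test", p_test)):
    print(label, passes_bonferroni(p, n_pcs))
```

      Under this correction the per-test threshold is alpha divided by the number of tests, so each nominal p-value is compared against 0.01 rather than 0.05.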

      Our revision includes confidence intervals around 𝜌signal for the PN PC2 OCT-MCH model, and for the ORN Brp-Short PC2 OCT-MCH model (lines 170-172, 238).

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to identify the neural sources of behavioral variation in a decision between odor and air, or between two odors.

      Strengths:

      -The question is of fundamental importance.

      -The behavioral studies are automated, and high-throughput.

      -The data analyses are sophisticated and appropriate.

      -The paper is clear and well-written aside from some strong wording.

      -The figures beautifully illustrate their results.

      -The modeling efforts mechanistically ground observed data correlations.

      We are glad to read that the reviewer sees these strengths in the study. We hope the current revision addresses the strong wording.

      Weaknesses:

      -The correlations between behavioral variations and neural activity/synapse morphology are (i) relatively weak, (ii) framed using the inappropriate words "predict", "link", and "explain", and (iii) sometimes non-intuitive (e.g., PC 1 of neural activity).

      Taking each of these points in turn:

      i) It would indeed be nicer if our empirical correlations were higher. One quibble: we primarily report relatively weak correlations between measurements of behavior and Ca++/Brp. This could be the case even when the correlation between true behavior and Ca++/Brp is higher. Our analysis of the potential correlation between latent behavioral and Ca++ signals was an attempt to tease these relationships apart. The analysis suggests that there could, in fact, be a high underlying correlation between behavior and these circuit features (though the error bars on these inferences are wide).

      ii) We worked to ensure such words are used appropriately. “Predict” can often be appropriate in this context, as a model predicts true data values. “Explain” can also be appropriate, as X “explaining” a portion of the variance of Y is synonymous with X and Y being correlated. We cannot think of formal uses of “link,” and have revised the manuscript to resolve any inappropriate word choice.

      iii) If the underlying biology is rooted in non-intuitive relationships, there’s unfortunately not much we can do about it. We chose to use PCs of our Ca++/Brp data as predictors to deal with the challenge of having many potential predictors (odor-glomerular responses) and relatively few output variables (behavioral bias). Thus, using PCs is a conservative approach to deal with multiple comparisons. Because PCs are just linear transformations of the original data, interpreting them is relatively easy, and in interpreting PC1 and PC2, we were able to identify simple interpretations (total activity and the difference between DC2 and DM2 activation, respectively). All in all, we remain satisfied with this approach as a means to both 1) limit multiple comparisons and 2) interpret simple meanings from predictive PCs.

      No attempts were made to perturb the relevant circuits to establish a causal relationship between behavioral variations and functional/morphological variations.

      We did conduct such experiments, but we did not report them because they had negative results that we could not definitively interpret. We used constitutive and inducible effectors to alter the physiology of ORNs projecting to DC2 and DM2. We also used UAS-LRP4 and UAS-LRP4-RNAi to attempt to increase and decrease the extent of Brp puncta in ORNs projecting to DC2 and DM2. None of these manipulations had a significant effect on mean odor preference in the OCT-MCH choice, which was the behavioral focus of these experiments. We were unable to determine if the effectors had the intended effects in the targeted Gal4 lines, particularly in the LRP experiments, so we could not rule out that our negative finding reflected a technical failure.

      Author response image 1.

      We believe that even if these negative results are not technical failures, they are not necessarily inconsistent with the analyses correlating features of DC2 and DM2 to behavior. Specifically, we suspect that there are correlated fluctuations in glomerular Ca++ responses and Brp across individuals, due to fluctuations in the developmental spatial patterning of the antennal lobe. Thus, the DC2-DM2 predictor may represent a slice/subset of predictors distributed across the antennal lobe. This would also explain how we “got lucky” to find two glomeruli as predictors of behavior, when we were only able to image a small portion of the glomeruli.

      Reviewer #3 (Public Review):

      Churgin et al. seek to understand the neural substrates of individual odor preference in the Drosophila antennal lobe, using paired behavioral testing and calcium imaging from ORNs and PNs in the same flies, and testing whether ORN and PN odor responses can predict behavioral preference. The manuscript's main claims are that ORN activity in response to a panel of odors is predictive of the individual's preference for 3-octanol (3-OCT) relative to clean air, and that activity in the projection neurons is predictive of both 3-OCT vs. air preference and 3-OCT vs. 4-methylcyclohexanol (MCH). They find that the difference in density of fluorescently-tagged brp (a presynaptic marker) in two glomeruli (DC2 and DM2) trends towards predicting behavioral preference between 3-OCT vs. MCH. Implementing a model of the antennal lobe based on the available connectome data, they find that glomerulus-level variation in response reminiscent of the variation that they observe can be generated by resampling variables associated with the glomeruli, such as ORN identity and glomerular synapse density.

      Strengths:

      The authors investigate a highly significant and impactful problem of interest to all experimental biologists, nearly all of whom must often conduct their measurements in many different individuals and so have a vested interest in understanding this problem. The manuscript represents a lot of work, with challenging paired behavioral and neural measurements.

      Weaknesses:

      The overall impression is that the authors are attempting to explain complex, highly variable behavioral output with a comparatively limited set of neural measurements.

      We would say that we are attempting to explain a simple, highly variable behavioral measure with a comparatively limited set of neural measurements, i.e. we make no claims to explain the complex behavioral components of odor choice, like locomotion, reversals at the odor boundary, etc.

      Given the degree of behavioral variability they observe within an individual (Figure 1- supp 1) which implies temporal/state/measurement variation in behavior, it's unclear that their degree of sampling can resolve true individual variability (what they call "idiosyncrasy") in neural responses, given the additional temporal/state/measurement variation in neural responses.

      We are confident that different Ca++ recordings are statistically different. This is borne out in the analysis of repeated Ca++ recordings in this study, which finds that the significant PCs of Ca++ variation contain 77% of the variation in that data. That this variation is persistent over time and across hemispheres was assessed in Honegger & Smith, et al., 2019. We are thus confident that there is true individuality in neural responses. (Note: we prefer not to call it “individual variability” as this could refer to variability within individuals, not variability across individuals.) It is a separate question whether individual differences in neural responses bear some relation to individual differences in behavioral biases. That was the focus of this study, and our finding of a robust correlation between PC 2 of Ca++ responses and OCT-MCH preference indicates a relation. Because behavior and Ca++ were collected with an hours-to-days-long gap, this implies that there are latent versions of both behavioral bias and Ca++ response that are stable on timescales at least that long.

      The statistical analyses in the manuscript are underdeveloped, and it's unclear the degree to which the correlations reported have explanatory (causative) power in accounting for organismal behavior.

      With respect, we do not think our statistical analyses are underdeveloped, though we acknowledge that the detailed reviewer comments included the helpful suggestion to report confidence intervals around the point estimate of the strength of correlation between latent behavioral and Ca++ response states. We have added these for the PN PC2 linear model (lines 170-172).

      It is indeed a separate question whether the correlations we observed represent causal links from Ca++ to behavior (though our yoked experiment suggests there is not a behavior-to-Ca++ causal relationship — at least one where odor experience through behavior is an upstream cause). We attempted to be precise in indicating that our observations are correlations. That is why we used that word in the title, as an example. In the revision, we worked to ensure this is appropriately reflected in all word choice across the paper.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      Detailed comments: Many of the problems can be identified starting from Figure 4, which summarizes the main claims. I will focus on that figure and its tributaries.

      Acknowledging that several of our inferences are weak compared to what we consider the main result (the relationship between PC2 of Ca++ and OCT-MCH preference), we have removed Figure 4. This makes the focus of the paper much clearer and appropriately puts the emphasis on the results that have strong statistical support.

      (1) The process of "inferring" correlation among the unobserved latent states for neural sensitivity and behavioral bias is unconventional and risky. The larger the assumed noise linking the latent to the observed variables (i.e. the smaller r_b and r_c) the bigger the inferred correlation rho from a given observed correlation R^2_cb. In this situation, the value of the inferred rho becomes highly dependent on what model one assumes that links latent to observed states. But the specific model drawn in Fig 4 suppl 1 is just one of many possible guesses. For example, models with nonlinear interactions could produce different inference.

      We agree with the reviewer’s notes of caution. To be clear, we do not intend for this analysis to be the main takeaway of the paper and have revised it to make this clear. The signal we are most confident in is the simple correlation between measured Ca++ PC2 and measured behavior. We have added more careful language saying that the attempt to infer the correlation between latent signals is one attempt at describing the data generation process (lines 166-172), and one possible estimate of an “underlying” correlation.

      (2) If one still wanted to go through with this inference process and set confidence bounds on rho, one needs to include all the uncertainties. Here the authors only include uncertainty in the value of R^2_c,b and they peg that at +/-20% (Line 1367). In addition there is plenty of uncertainty associated also with R^2_c,c and R^2_b,b. This will propagate into a wider confidence interval on rho.

      We have replaced the arbitrary +/- 20% window with a bootstrap of the (predicted preference by PN PC2, measured preference) pairs, which yields a bootstrap distribution of R^2_c,b that is, not surprisingly, considerably wider. Still, we think there is some value in this analysis, as the 90% CI of 𝜌signal under this model is 0.24-0.95. That is, including uncertainty about R^2_b,b and R^2_c,c in the model still implies a significant relationship between latent calcium and behavior signals.
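
      A pairs bootstrap of this kind can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: the function names, the number of resamples, the percentile method, and the synthetic data are all assumptions made for the example.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate resample with no variance
    return sxy ** 2 / (sxx * syy)


def bootstrap_r2_ci(xs, ys, n_boot=2000, level=0.90, seed=0):
    """Percentile bootstrap CI for R^2, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample flies, keeping pairs intact
        draws.append(r_squared([xs[i] for i in idx], [ys[i] for i in idx]))
    draws.sort()
    lo = draws[int((1 - level) / 2 * n_boot)]
    hi = draws[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Synthetic stand-in data (NOT the study's measurements): 47 hypothetical flies.
data_rng = random.Random(1)
predicted = [data_rng.gauss(0, 1) for _ in range(47)]
measured = [0.4 * p + data_rng.gauss(0, 1) for p in predicted]

print("R^2:", r_squared(predicted, measured))
print("90% CI:", bootstrap_r2_ci(predicted, measured))
```

      Resampling whole (predicted, measured) pairs, rather than each variable separately, preserves the within-fly association that the correlation is meant to measure.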

      (2.1) The uncertainty in R^2_cb is much greater than +/-20%. Take for example the highest correlation quoted in Fig 4: R^2=0.23 in the top row of panel A. This relationship refers to Fig 1L. Based on bootstrapping from this data set, I find a 90% confidence interval of CI=[0.002, 0.527]. That's an uncertainty of -100/+140%, not +/-20%. Moreover, this correlation is due entirely to the lone outlier on the bottom left. Removing that single fly abolishes any correlation in the data (R^2=0.04, p>0.3). With that the correlation of rho=0.64, the second-largest effect in Fig 4, disappears.

      We acknowledge that removal of the outlier in Fig 1L abolishes the correlation between predicted and measured OCT-AIR preference. We have thus moved that subfigure to the supplement (now Figure 1 – figure supplement 10B), note that we do not have robust statistical support of ORN PC1 predicting OCT-AIR preference in the results (lines 177-178), and place our emphasis on PN PC2’s capacity to predict OCT-MCH preference throughout the text.

      (2.2) Similarly with the bottom line of Fig 4A, which relies on Fig 1M. With the data as plotted, the confidence interval on R^2 is CI=[0.007, 0.201], again an uncertainty of -100/+140%. There are two clear outlier points, and if one removes those, the correlation disappears entirely (R^2=0.06, p=0.09).

      We acknowledge that removal of the two outliers in Fig 1M between predicted and measured OCT-AIR preference abolishes the correlation. We have also moved that subfigure to the supplement (now Figure 1 – figure supplement 10F) and do not claim to have robust statistical support of PN PC1 predicting OCT-AIR preference.

      (2.3) Similarly, the correlation R^2_bb of behavior with itself is weak and comes with great uncertainty (Fig 1 Suppl 1, panels B-E). For example, panel D figures prominently in computing the large inferred correlation of 0.75 between PN responses and OCT-MCH choice (Line 171ff). That correlation is weak and has a very wide confidence interval CI=[0.018, 0.329]. This uncertainty about R^2_bb should be taken into account when computing the likelihood of rho.

      We now include bootstrapping of the 3 hour OCT-MCH persistence data in our inference of 𝜌signal.

      (2.4) The correlation R^2_cc for the empirical repeatability of Ca signals seems to be obtained by a different method. Fig 4 suppl 1 focuses on the repeatability of calcium recording at two different time points. But Line 625ff suggests the correlation R^2_cc=0.77 all derives from one time point. It is unclear how these are related.

      Because our calcium model predictors utilize principal components of the glomerulus-odor responses (the mean Δf/f in the odor presentation window), we compute R^2_c,c by summing the variance explained along the PCs, stopping at the first PC whose variance explained does not exceed that of shuffled data (lines 609-620 in Materials and Methods). In this revision we now bootstrap the calcium data at the level of individual flies to get a bootstrap distribution of R^2_c,c, and propagate the uncertainty forward in the inference of 𝜌signal.

      (2.5) To summarize, two of the key relationships in Fig 1 are due entirely to one or two outlier points. These should not even be used for further analysis, yet they underlie two of the claims in Fig 4. The other correlations are weak, and come with great uncertainty, as confirmed by resampling. Those uncertainties should be propagated through the inference procedure described in Fig 4. It seems possible that the result will be entirely uninformative, leaving rho with a confidence interval that spans the entire available range [0,1]. Until that analysis is done, the claims of neuron-to-behavior correlation in this manuscript are not convincing.

      It is important to note that we never thought our analysis of the relationship between latent behavior and calcium signals should be interpreted as the main finding. Instead, the observed correlation between measured behavior and calcium is the take-away result. Importantly, it is also conservative compared to the inferred latent relationship, which in our minds was always a “bonus” analysis. Our revisions are now focused on highlighting the correlations between measured signals that have strong statistical support.

      In response to these specific concerns, we have propagated uncertainty in all R^2 values (calcium-calcium, behavior-behavior, calcium-behavior) in our new inference for 𝜌signal, yielding a new median estimate for PN PC 2 underlying OCT-MCH preference of 0.68, with a 90% CI of 0.24-0.95 (lines 171-172 in Results; “Inference of correlation between latent calcium and behavior states” section in Materials and Methods).
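
      The general idea of propagating uncertainty into a latent correlation can be sketched as below. This is an illustration under a stated assumption, namely the classical attenuation relation R^2_c,b = rho^2 * rel_c * rel_b, where rel_c and rel_b are the fractions of calcium and behavior variance attributed to the latent states. The model in the manuscript may differ, and the function names and input values are hypothetical placeholders, not the study's bootstrap draws.

```python
import math
import random


def latent_rho(r2_cb, rel_c, rel_b):
    """Disattenuated correlation under R^2_cb = rho^2 * rel_c * rel_b, clipped to [0, 1].

    rel_c and rel_b must be positive.
    """
    return min(1.0, math.sqrt(r2_cb / (rel_c * rel_b)))


def propagate(r2_cb_draws, rel_c_draws, rel_b_draws,
              n_samples=5000, level=0.90, seed=0):
    """Median and percentile interval of rho, sampling each input independently."""
    rng = random.Random(seed)
    rhos = sorted(
        latent_rho(rng.choice(r2_cb_draws),
                   rng.choice(rel_c_draws),
                   rng.choice(rel_b_draws))
        for _ in range(n_samples)
    )
    median = rhos[n_samples // 2]
    lo = rhos[int((1 - level) / 2 * n_samples)]
    hi = rhos[int((1 + level) / 2 * n_samples) - 1]
    return median, (lo, hi)


# Hypothetical placeholder draws standing in for three bootstrap distributions.
r2_cb_draws = [0.05, 0.10, 0.15, 0.20, 0.25]
rel_c_draws = [0.70, 0.77, 0.84]
rel_b_draws = [0.30, 0.40, 0.50]

print(propagate(r2_cb_draws, rel_c_draws, rel_b_draws))
```

      Under this relation, lower assumed reliabilities inflate the inferred rho for a fixed observed correlation, which is why uncertainty in all three quantities widens the interval.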

      (3) Other statistical methods:

      (3.1) The caption of Fig 4 refers to "model applied to train+test data". Does that mean the training data were included in the correlation measurement? Depending on the number of degrees of freedom in the model, this could have led to overfitting.

      We have removed Figure 4 and emphasize the key results in Figures 1 and 2, where we see a statistically robust signal of PN PC 2 explaining OCT-MCH preference variation in both a training set and a testing set of flies (Fig 2 – figure supplement 1C-D).

      (3.2) Line 180 describes a model that performed twice as well on test data (31% EV) as it did on training data (15%). What would explain such an outcome? And how does that affect one's confidence in the 31% number?

      The test set recordings were conducted several weeks after the training set recordings, which were used to establish PN PC 2 as a correlate of OCT-MCH preference. The fact that the test data had a higher R^2 likely reflects sampling error (these two correlation coefficients are not significantly different). Ultimately this gives us more confidence in our model, as the predictive capacity is maintained in a totally separate set of flies.
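
      One common way to compare two independent correlation coefficients is a Fisher z-test, sketched below. This is not necessarily the test used in the study. The sketch assumes the two sets of flies are independent, that the correlations have the same sign, and that the values quoted in this response apply (15% and 31% variance explained, 47 and 22 flies); the function name is hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic and two-sided p-value for H0: two independent correlations are equal."""
    z1, z2 = math.atanh(r1), math.atanh(r2)       # Fisher z-transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of z2 - z1
    z = (z2 - z1) / se
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided normal tail probability
    return z, p


r_train = math.sqrt(0.15)  # training set, 47 flies
r_test = math.sqrt(0.31)   # test set, 22 flies

z, p = fisher_z_difference(r_train, 47, r_test, 22)
print(f"z = {z:.2f}, p = {p:.2f}")
```

      With these inputs the p-value is roughly 0.4, in line with the statement that the two coefficients are not significantly different.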

      (3.3) Multiple models get compared in performance before settling on one. For example, sometimes the first PC is used, sometimes the second. Different weighting schemes appear in Fig 2. Do the quoted p-values for the correlation plots reflect a correction for multiple hypothesis testing?

      For all calcium-behavior models, we restricted our analysis to 5 PCs, as the proportion of calcium variance explained by each of these PCs was higher than that explained by the respective PC of shuffled data (i.e., there were at most five significant PCs in that data). We thus performed at most 5 hypothesis tests for a given model. PN PC 2 explained 15% of OCT-MCH preference variation, with a p-value of 0.0063; this p-value is robust to a conservative Bonferroni correction for the 5 hypotheses considered at alpha=0.05.

      The weight schemes in Figure 2 and Figure 1 – figure supplement 10 reflect our interpretations of the salient features of the PCs and are follow-up analyses of the single principal component hypothesis tests. Thus they do not constitute additional tests that should be corrected. We now state in the methods explicitly that all reported p-values are nominal (line 563).

      (3.4) Line 165 ff: Quoting rho without giving the confidence interval is misleading. For example, the rho for the presynaptic density model is quoted as 0.51, which would be a sizeable correlation. But in fact, the posterior on rho is almost flat, see caption of Fig 4 suppl 1, which lists the CI as [0.11, 0.85]. That means the experiments place virtually no constraint on rho. If the authors had taken no data at all, the posterior on rho would be uniform, and give a median of 0.5.

      We now provide a confidence interval around 𝜌signal for the PN PC 2 model (lines 170-172). But per above, and consistent with the new focus of this revision, we view the 𝜌signal inference as secondary to the simple, significant correlation between PN PC 2 and OCT-MCH preference.

      (4) As it stands now, this paper illustrates how difficult it is to come to a strong conclusion in this domain. This may be worth some discussion. This group is probably in a better position than any to identify what are the limiting factors for this kind of research.

      We thank the reviewer for this suggestion and have added discussion of the difficulties in detecting signals for this kind of problem. That said, we are confident in stating that there is a meaningful correlation between PC 2 of PN Ca++ responses and OCT-MCH behavior given our model’s performance in predicting preference in a test set of flies, and in the consistent signal in ORN Bruchpilot.

      Reviewer #3 (Recommendations for the Authors):

      Two major concerns, one experimental/technical and one conceptual:

      (1) I appreciate the difficulty of the experimental design and problem. However, the correlations reported throughout are based on neural measurements in only 5 glomeruli (~10% of the olfactory system) at early stages of olfactory processing.

      We acknowledge that only imaging 5 glomeruli is regrettable. We worked hard to develop image analysis pipelines that could reliably segment as many glomeruli as possible from almost all individual flies. In the end, we concluded that it was better to focus our analysis on a (small) core set of glomeruli for which we had high confidence in the segmentation. Increasing the number of analyzed glomeruli is high on the list of improvements for subsequent studies. Happily, we are confident that we are capturing a significant, biologically meaningful correlation between PC 2 of PN calcium (dominated by the responses in DC2 and DM2) and OCT-MCH preference.

      3-OCT and MCH activate many glomeruli in addition to the five studied, especially at the concentrations used. There is also limited odor-specificity in their response matrix: notably responses are more correlated in all glomeruli within an individual, compared to responses across individuals (they note this in lines 194-198, though I don't quite understand the specific point they make here). This is a sign of high experimental variability (typically the dynamic range of odor response within an individual is similar to the range across individuals) and makes it even more difficult to resolve underlying individual variation.

      We respectfully disagree with the reviewer’s interpretation here. There is substantial odor-specificity in our response matrix. This is evident in both the ORN and PN response matrices (and especially the PN matrix) as variation in the brightness across rows. Columns, which correspond to individuals, are more similar than rows, which correspond to odor-glomerulus pairs. The dynamic range within an individual (within a column, across rows) is indeed greater than the variation among individuals (within a row, across columns).

      As an (important) aside, the odor stimuli are very unusual in this study. Odors are delivered at extremely high concentrations (variably 10-25% sv, line 464, not exactly sure what "variably" means; is the stimulus intensity not constant?) as compared to even the highest concentrations used in >95% of other studies (usually <~0.1% sv delivered).

      We used these concentrations for a variety of reasons. First, following the protocol of Honegger and Smith (2020), we found that dilutions in this range produce a linear input-output relationship, i.e. doubling or halving one odorant yields proportionate changes in odor-choice behavior metrics. Second, such fold dilutions are standard for tunnel assays of the kind we used. Claridge-Chang et al. (2009) used 14% and 11% for MCH and OCT respectively, for instance. Finally, the specific dilution factor (i.e., within the range of 10-25%) was adjusted on a week-by-week basis to ensure that in an OCT-MCH choice, the mean preference was approximately 50%. This yields the greatest signal of individual odor preference. We have added this last point to the methods section where the range of dilutions is described (lines 442-445).

      A parsimonious interpretation of their results is that the strongest correlation they see (ORN PC1 predicts OCT v. air preference) arises because intensity/strength of ORN responses across all odors (e.g. overall excitability of ORNs) partially predicts behavioral avoidance of 3-OCT. However, the degree to which variation in odor-specific glomerular activation patterns can explain behavioral preference (3-OCT v. MCH) seems much less clear, and correspondingly the correlations are weaker and p-values larger for the 3-OCT v. MCH result.

      With respect, we disagree with this analysis. The correlation between ORN PC 1 and OCT v. air preference (R^2 = 0.23) is quite similar to that of PN PC 2 and OCT vs MCH preference (R^2 = 0.20). However, the former is dependent on a single outlying point, whereas the latter is not. The latter relationship is also backed up by the BRP imaging and modeling. Therefore in the revision we have de-emphasized the OCT v. air preference model and emphasized the OCT v. MCH preference models.

      (2) There is a broader conceptual concern about the degree of logical consistency in the authors' interpretation of how neural variability maps to behavioral variability. For instance, the two odors they focus on, 3-OCT and MCH, barely activate ORNs in 4 of the 5 glomeruli they study. Most of the correlation of ORN PC1 vs. behavioral choice for 3-OCT vs. air, then, must be driven by overall glomerular activation by other odors (but remains predictive since responses across odors appear correlated within an individual). This gives pause to the interpretation that 3-OCT-evoked ORN activity in these five glomeruli is the neural substrate for variability in the behavioral response to 3-OCT.

      Our interpretation of the ORN PC1 linear model is not that 3-OCT-evoked ORN activity is the neural substrate for variability – instead, it is the general responsiveness of an individual’s AL across multiple odors (this is our interpretation of the uniformly positive loadings in ORN PC1). It is true that OCT and MCH do not activate ORNs as strongly as other odorants – our analysis rests on the loadings of the PCs that capture all odor/glomerulus combinations available in our data. All that said, since a single outlier in Figure 1L dominates the relationship, we have de-emphasized these particular results in our revision.

      This leads to the most significant concern, which is that the paper does not provide strong evidence that odor-specific patterns of glomerular activation in ORNs and PNs underlie individual behavioral preference between different odors (that each drive significant levels of activity, e.g. 3-OCT v. MCH), or that the ORN-PN synapse is a major driver of individual behavioral variability. Lines 26-31 of the abstract are not well supported, and the language should be softened.

      We have modified the abstract to emphasize our confidence in PN calcium correlating with odor-vs-odor preference (removing the ORN & odor-vs-air language).

      Their conclusions come primarily from having correlated many parameters reduced from the ORN and PN response matrices against the behavioral data. Several claims are made that a given PC is predictive of an odor preference while others are not, however it does not appear that the statistical tests to support this are shown in the figures or text.

      For each linear model of calcium dynamics predicting preference, we restricted our analysis to the first 5 principal components. Thus, we do not feel that we correlated many parameters against the behavioral data. As mentioned below, the correlations identified by this approach comfortably survive a conservative Bonferroni correction. In this revision, a linear model with a single predictor – the projection onto PC 2 of PN calcium – is the result we emphasize in the text, and we report $R^2$ between measured and predicted preference for both a training set of flies and for a test set of flies (Figure 1M and Figure 2 – figure supplement 1).

      That is, it appears that the correlation of models based on each component is calculated, then the component with the highest correlation is selected, and a correlation and p-value computed based on that component alone, without a statistical comparison between the predictive values of each component, or to account for effectively performing multiple comparisons. (Figure 1, k l m n o p, Figure 3, d f, and associated analyses).

      To reiterate, this was our process: 1) Collect a training data set of paired Ca++ recordings and behavioral preference scores. 2) Compute the first five PCs of the Ca++ data, and measure the correlation of each to behavior. 3) Identify the PC with the best correlation. 4) Collect a test data set with new experimental recordings. 5) Apply the model identified in step 3. For some downstream analyses, we combined test and training data, but only after confirming the separate significance of the training and test correlations.
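      The five steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the analysis code used in the study: the array shapes, noise levels, and all function names (`fit_pcs`, `select_best_pc`, `evaluate`, `make_synthetic`) are assumptions introduced here for illustration.

```python
import numpy as np

def fit_pcs(X, n_components=5):
    """Return column means and the first n principal axes of X (rows = flies)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes of the centered data
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def select_best_pc(X_train, y_train, n_components=5):
    """Steps 2-3: correlate each of the first PCs with behavior, keep the best."""
    mean, axes = fit_pcs(X_train, n_components)
    scores = (X_train - mean) @ axes.T
    r = np.array([np.corrcoef(scores[:, k], y_train)[0, 1]
                  for k in range(n_components)])
    best = int(np.argmax(np.abs(r)))
    slope, intercept = np.polyfit(scores[:, best], y_train, 1)
    return {"mean": mean, "axis": axes[best], "pc_index": best,
            "slope": slope, "intercept": intercept, "train_r": r[best]}

def evaluate(model, X_test, y_test):
    """Step 5: apply the frozen training model to new recordings, return R^2."""
    score = (X_test - model["mean"]) @ model["axis"]
    predicted = model["slope"] * score + model["intercept"]
    return np.corrcoef(predicted, y_test)[0, 1] ** 2

def make_synthetic(n_flies, rng, loadings):
    """Hypothetical data: 3 latent factors; behavior depends on the second."""
    latent = rng.normal(size=(n_flies, 3)) * np.array([4.0, 2.0, 1.0])
    X = latent @ loadings + 0.1 * rng.normal(size=(n_flies, loadings.shape[1]))
    y = 0.5 * latent[:, 1] + 0.3 * rng.normal(size=n_flies)
    return X, y

rng = np.random.default_rng(0)
# Orthonormal loading vectors mapping 3 latent factors to 20 measured features
loadings = np.linalg.qr(rng.normal(size=(20, 3)))[0].T

X_train, y_train = make_synthetic(60, rng, loadings)  # step 1: training set
X_test, y_test = make_synthetic(40, rng, loadings)    # step 4: new test set

model = select_best_pc(X_train, y_train)
test_r2 = evaluate(model, X_test, y_test)
print(model["pc_index"], round(float(test_r2), 3))
```

      The point of the design is that the PC is chosen on the training set only, and the test set is used solely to check the frozen model, so the test correlation is not inflated by the selection step.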

      The p-values associated with the PN PC 2 model predicting OCT-MCH preference are sufficiently low in each of the training and testing sets (0.0063 and 0.0069, respectively) to pass a conservative Bonferroni multiple hypothesis correction (one hypothesis for each of the 5 PCs) at an alpha of 0.05.
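      The arithmetic behind this statement can be checked with a short sketch. The function name `bonferroni_pass` is introduced here for illustration; the p-values and the number of hypotheses are those quoted above.

```python
def bonferroni_pass(p_values, n_tests, alpha=0.05):
    """Return the Bonferroni-corrected threshold and whether each p-value passes."""
    threshold = alpha / n_tests
    return threshold, [p <= threshold for p in p_values]

# Training and test p-values for the PN PC 2 model, one hypothesis per PC
threshold, passed = bonferroni_pass([0.0063, 0.0069], n_tests=5, alpha=0.05)
print(threshold, passed)
```

      With five hypotheses at an alpha of 0.05, the corrected threshold is 0.05 / 5 = 0.01, and both reported p-values fall below it.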

      Additionally, the statistical model presented in Figure 4 needs significantly more explanation or should be removed; it's unclear how they "infer" the correlation, and the conclusions appear inconsistent with Figure 3 - Figure Supplement 2.

      We have removed Figure 4 and have improved upon our approach of inferring the strength of the correlation between latent calcium and behavior in the Methods, incorporating bootstrapping of all sources of data used for the inference (lines 622-628). At the same time, we now emphasize that this analysis is a bonus of sorts, and that the simple correlation between Ca++ and behavior is the main result.

      Suggestions:

      (1) If the authors want to make the claim that individual variation in ORN or PN odor representations (e.g. glomerular activation patterns) underlie differences in odor preference (MCH v. OCT), they should generalize the weak correlation between ORN/PN activity and behavior to additional glomeruli and pairs of odors, where both odors drive significant activity. Otherwise, the claims in the abstract should be tempered.

      We have modified the abstract to focus on the effect we have the highest confidence in: contrasting PN calcium activation of DM2 and DC2 predicting OCT-MCH preference.

      (2) One of the most valuable contributions a study like this could provide is to carefully quantify the amount of measurement variation (across trials, across hemispheres) in neural responses relative to the amount of individual variation (across individuals). Beyond the degree of variation in the amplitude of odor responses, the rank ordering of odor response strength between repeated measurements (to try to establish conditions that account for adaptation, etc.), between hemispheres, and between individuals is important. Establishing this information is foundational to this entire field of study. The authors take a good first step towards this in Figure 1J and Figure 1, supplement 5C, but the plots do not directly show variance, and the comparison is flawed because more comparisons go into the individual-individual crunch (as evidenced by the consistently smaller range of quartiles). The proper way to do this is by resampling.

      We do not know what the reviewer means by “individual-individual crunch,” unfortunately. Thus, it is difficult to determine why they think the analysis is flawed. We are also uncertain about the role of resampling in this analysis. The medians, interquartile ranges and whiskers in the panels referenced by the reviewer are not confidence intervals as might be determined by bootstrap resampling. Rather, these are direct statistics on the coding distances as measured – the raw values associated with these plots are visualized in Figure 1H.

      In our revision we updated the heatmaps in Figure 1 – figure supplement 3 to include recordings across the lobes and trials of each individual fly, and we have added a new supplementary figure, Figure 1 – figure supplement 4, to show the correspondence between recordings across lobes or trials, with associated rank-order correlation coefficients. Since the focus of this study was whether measured individual differences predict individual behavioral preference, a full characterization of the statistics of variation in calcium responses was not the focus, though it was the focus of a previous study (Honegger & Smith et al., 2019).

      To help the reader understand the data, we would encourage displaying data prior to dimensionality reduction - why not show direct plots of the mean and variance of the neural responses in each glomerulus across repeats, hemispheres, individuals?

      We added a new supplementary figure, Figure 1 – figure supplement 4, to show the correspondence between recordings across lobes or trials.

      A careful analysis of this point would allow the authors to support their currently unfounded assertion that odor responses become more "idiosyncratic" farther from the periphery (line 135-36); presumably they mean beyond just noise introduced by synaptic transmission, e.g. "idiosyncrasy" is reproducible within an individual. This is a strong statement that is not well-supported at present - it requires showing the degree of similarity in the representation between hemispheres is more similar within a fly than between flies in PNs compared to ORNs (see Hige... Turner, 2015).

      Here are the lines in question: “PN responses were more variable within flies, as measured across the left and right hemisphere ALs, compared to ORN responses (Figure 1 – figure supplement 5C), consistent with the hypothesis that odor representations become more idiosyncratic farther from the sensory periphery.”

      That responses are more idiosyncratic farther from the periphery is therefore not an “unfounded assertion.” It is clearly laid out as a hypothesis for which we can assess consistency in the data. We stand by our original interpretation: that several observations are consistent with this finding, including greater distance in coding space in PNs compared to ORNs, particularly across lobes and across flies. In addition, higher accuracy in decoding individual identity from PN responses compared to ORN responses (now appearing as Figure 1 – figure supplement 6A) is also consistent with this hypothesis.

      Still, to make confusion at this sentence less likely, we have reworded it as “suggesting that odor representations become more divergent farther from the sensory periphery.” (lines 139-140)

      (3) Figure 3 is difficult to interpret. Again, the variability of the measurement itself within and across individuals is not established up front. Expression of exogenous tagged brp in ORNs is also not guaranteed to reflect endogenous brp levels, so there is an additional assumption at that level.

      Figure 3 – figure supplement 1 Panels A-C display the variability of measurements (Brp volume, total fluorescence and fluorescence density) both within (left/right lobes) and across individuals (the different data points). We agree that exogenous tagged Brp levels will not be identical to endogenous levels. The relationship appears significant despite this caveat.

      Again there are statistical concerns with the correlations. For instance, the claim that "Higher Brp in DM2 predicted stronger MCH preference... " on line 389 is not statistically supported with p<0.05 in the ms (see Figure 3 G as the closest test, but even that is a test of the difference of DM2 and DC2, not DM2 alone).

      We have changed the language to focus on the pattern of the loadings in PC 2 of Brp-Short density and replaced “predict.” (lines 366-369).

      Can the authors also discuss what additional information is gained from the expansion microscopy in the figure supplement, and how it compares to brp density in DC2 using conventional methods?

      The expansion microscopy analysis was an attempt to determine what specific aspect of Brp expression was predictive of behavior, on the level of individual Brp puncta, as a finer look compared to the glomerulus-wide fluorescence signal in the conventional microscopy approach. Since this method did not yield a large sample size, at best we can say it provided evidence consistent with the observation from confocal imaging that Brp fluorescent density was the best measure in terms of predicting behavior.

      I would prefer to see the calcium and behavioral datasets strengthened to better establish the relationship between ORN/PN responses and behavior, and to set aside the anatomical dataset for a future work that investigates mechanisms.

      We are satisfied that our revisions put appropriate emphasis on a robust result relating calcium and behavior measurements: the relationship between OCT-MCH preference and idiosyncratic PN calcium responses. Finding that idiosyncratic Brp density has similar PC 2 loadings that also significantly predict behavior is an important finding that increases confidence in the calcium-behavior finding. We agree with the reviewer that these anatomical findings are secondary to the calcium-behavior analyses, but think they warrant a place in the main findings of the study. As the reviewer suggests, we are conducting follow-on studies that focus on the relationship between neuroanatomical measures and odor preference.

      (4) The mean imputation of missing data may have an effect on the conclusions that it is possible to draw from this dataset. In particular, as shown in Figure 1, supplemental figure 3, there is a relatively large amount of missing data, which is unevenly distributed across glomeruli and between the cell types recorded from. Strikingly, DC2 is missing in a large fraction of ORN recordings, while it is present in nearly all the PN recordings. Because DC2 is one of the glomeruli implicated in predicting MCH-OCT preference, this lack of data may be particularly likely to affect the evaluation of whether this preference can be predicted from the ORN data. Overall, mean imputation of glomerulus activity prior to PCA will artificially reduce the amount of variance contributed by the glomerulus. It would be useful to see an evaluation of which results of this paper are robust to different treatments of this missing data.

      We confirmed that the linear model of predicted OCT-MCH using PN PC2 calcium was minimally altered when we performed imputation via alternating least squares using the pca function with option ‘als’ to infill missing values on the calcium matrix 1000 times and taking the mean infilled matrix (see MATLAB documentation and Figure 1 – figure supplement 5 of Werkhoven et al., 2021). Fitted slope value for model using mean-infilled data presented in article: -0.0806 (SE = 0.028, model $R^2$ = 0.15); fitted slope value using ALS-imputed model: -0.0806 (SE = 0.026, model $R^2$ = 0.17).
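      The reviewer's underlying concern, that mean imputation shrinks the variance attributed to a glomerulus, can be illustrated with a small sketch. The response values below are hypothetical and are not taken from the recordings; the sketch does not reproduce the ALS procedure described above. Filling missing entries with the observed mean leaves the sum of squared deviations unchanged while increasing the sample count, so the sample variance shrinks by the factor (n_obs - 1) / (n_total - 1).

```python
import numpy as np

# Hypothetical responses of one glomerulus across 8 flies; NaN marks missing data
responses = np.array([0.2, np.nan, 0.8, np.nan, 0.5, 1.1, np.nan, 0.4])

observed = responses[~np.isnan(responses)]
# Mean imputation: replace every missing entry with the mean of the observed ones
filled = np.where(np.isnan(responses), observed.mean(), responses)

var_observed = observed.var(ddof=1)   # variance of the flies actually recorded
var_filled = filled.var(ddof=1)       # variance after mean imputation
expected_ratio = (observed.size - 1) / (responses.size - 1)

print(var_filled / var_observed, expected_ratio)
```

      The more entries are missing for a glomerulus, the stronger this shrinkage, which is why comparing against a different imputation scheme is a useful robustness check.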

      Additional comments:

      (1) On line 255 there is an unnecessary condition: "non-negative positive".

      Thank you – non-negative has been removed.

      (2) In Figure 4 and the associated analysis, selection of +/- 20% interval around the observed $R^2$ appears arbitrary. This could be based on the actual confidence interval, or established by bootstrapping.

      We have replaced the +/- 20% rule by bootstrapping the calculation of behavior-behavior $R^2$, calcium-calcium $R^2$, and calcium-behavior $R^2$ and propagating the uncertainties forward (Inference of correlation between latent calcium and behavior states section in Materials and Methods).
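      As a minimal sketch of the bootstrapping idea, assuming synthetic paired measurements in place of the real data, the uncertainty of a single $R^2$ can be estimated by resampling individuals with replacement. The function name `bootstrap_r2`, the sample size, and the effect size are illustrative assumptions; the sketch covers one $R^2$ only and does not reproduce the propagation of uncertainties across the three quantities described in the Methods.

```python
import numpy as np

def bootstrap_r2(x, y, n_boot=2000, seed=0):
    """Bootstrap distribution of R^2 between paired samples x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r2 = np.empty(n_boot)
    for b in range(n_boot):
        # Resample individuals (pairs) with replacement
        idx = rng.integers(0, n, size=n)
        r2[b] = np.corrcoef(x[idx], y[idx])[0, 1] ** 2
    return r2

# Hypothetical paired data: a calcium-derived score and a behavioral preference
rng = np.random.default_rng(1)
calcium_score = rng.normal(size=50)
preference = 0.5 * calcium_score + rng.normal(size=50)

r2_samples = bootstrap_r2(calcium_score, preference)
ci_low, ci_high = np.percentile(r2_samples, [2.5, 97.5])
print(round(float(ci_low), 3), round(float(ci_high), 3))
```

      A percentile interval of this kind replaces a fixed +/- 20% band with a range determined by the data themselves.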

      (3) On line 409 the claim is made "These sources of variation specifically implicate the ORN-PN synapse..." While the model recapitulates the glomerulus specific variation of activity under PN synapse density variation, it also occurs under ORN identity variation, which calls into question whether the synapse distribution itself is specifically implicated, or if any variation that is expected to be glomerulus specific would be equally implicated.

      We agree with this observation. We found that varying either the ORNs or the PNs that project to each glomerulus can produce patterns of PN response variation similar to what is measured experimentally. This is consistent with the idea that the ORN-PN synapse is a key site of behaviorally-relevant variation.

      (4) Line 214 "... we conclude that the relative responses of DM2 vs DC2 in PNs largely explains an individual's preference." is too strong of a claim, based on the fact that using the PC2 explains much more of the variance, while using the stated hypothesis noticeably decreases the predictive power ($R^2$ = 0.2 vs $R^2$ = 0.12).

      We have changed the wording here to “we conclude that the relative responses of DM2 vs DC2 in PNs compactly predict an individual’s preference.” (lines 192-193)

    1. Author response:

      Reviewer #1:

      We thank the reviewer for recognizing the impact of our work on the pivotal roles of N-glycan-dependent ERQC in cellular fitness and pathogenicity and for providing valuable comments to improve the manuscript. As suggested, we will rearrange data, reduce text volume, and discuss how ERQC mutation might decrease EV secretion without a significant defect in conventional secretion. Regarding the proteomics data, we have already initiated a comparative analysis of total intracellular and EV-associated proteins to determine whether the reduced cargo loading in the Ugg1 mutant is specific to EV-associated proteins. Additionally, we may extend the analysis to include total secretion, enabling a clearer comparison between classical secretion and EV-mediated secretion to better evaluate the extent of classical secretion defects in the Ugg1 mutant.

      Reviewer #2:

      We sincerely thank the reviewer for the positive evaluation of our work. As recommended, we will reduce the text and reorganize the data to enhance the manuscript's readability.

      Reviewer #3:

      We sincerely thank the reviewer for the high appreciation of our work. As recommended, we will provide a more detailed explanation of the results with improved interpretation, strongly grounded on the obtained data.

    1. Author response:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The assertion that membrane trafficking is impaired by this variant could be bolstered by additional data.

      We agree with this comment and will perform additional analysis and experiments to support the assertion that membrane trafficking is impaired. As noted by the Reviewers, standard biochemical approaches to obtain such data may be challenging due to the fact that Kv3.1 is expressed in only a subset of cells and that we do not have a Kv3.1-A421V specific antibody.

      (2) In some experiments details such as the age of the mice or cortical layer are emphasized, but in others, these details are omitted.

      We appreciate that the Reviewer has noted this omission. We will include such details in the resubmission.

      (3) The impairments in PV neuron AP firing are quite large. This could be expected to lead to changes in PV neuron activity outside of the hypersynchronous discharges that could be detected in the 2-photon imaging experiments, however, a lack of an effect on PV neuron activity is only loosely alluded to in the text. A more formal analysis is lacking. An important question in trying to understand mechanisms underlying channelopathies like KCNC1 is how changes in membrane excitability recorded at the whole cell level manifest during ongoing activity in vivo. Thus, the significance of this work would be greatly improved if it could address this question.

      Yes, the impairments in neocortical PV-IN excitability are more marked than any other PV interneuronopathy that we have studied. We will include a more extensive analysis of the 2-photon imaging data in the resubmission. However, there are limitations to the inferences that can be made as to firing patterns based on 2-photon calcium imaging data, particularly for interneurons.

      (4) Myoclonic jerks and other types of more subtle epileptiform activity have been observed in control mice, but there is no mention of littermate control analyzed by EEG.

      We did not observe myoclonic jerks in control mice. This data will be included in the resubmission.

      Reviewer #2 (Public review):

      Weaknesses:

      In some experiments, the age of the animal in each experiment is not clearly stated. For example, the experiments in Figure 2 demonstrate impaired K+ conductance and membrane localization, but it is not clear whether they correlated with the excitability and synaptic defects shown in subsequent figures. Similarly, it is unclear at what age of mice the authors conducted EEG recordings, and whether non-epileptic mice are younger than those with seizures.

      We will include explicit information as to the age of the animals used for each experiment in the resubmission.

      The trafficking defect of mutant Kv3.1 proposed in this study is based only on the fluorescence density analysis which showed a minor change in membrane/cytosol ratio. It is not very clear how the membrane component was determined (any control staining?). In addition to fluorescence imaging, the addition of biochemical analysis would make the conclusion more convincing (while it might be challenging if the Kv3.1 is expressed only in PV+ cells).

      We will include additional information in the Methods section as to how the membrane component was determined in a revised version of the manuscript. We agree with Reviewer #2 regarding the limitations in the ability to further evaluate this.

      While the study focused on the superficial layer because Kv3.1 is the major channel subunit, the PV+ cells in the deeper cortical layer also express Kv3.1 (Chow et al., 1999) and they may also contribute to the hyperexcitable phenotype via negative effect on Kv3.2; the mutant Kv3.1 may also block membrane trafficking of Kv3.1/Kv3.2 heteromers in the deeper layer PV cells and reduce their excitability. Such an additional effect on Kv3.2, if present, may explain why the heterozygous A421V KI mouse shows a more severe phenotype than the Kv3.1 KO mouse (and why they are more similar to Kv3.2 KO). Analyzing the membrane excitability differences in the deep-layer PV cells may address this possibility.

      We will include recordings from PV-INs in deeper layers of the neocortex in the revised version of the manuscript, as requested.

      In Table 1, the A421V PV+ cells show a more depolarized resting membrane potential than WT by ~5 mV, which seems a robust change and would influence the circuit excitability. The authors measured firing frequency after adjusting the membrane voltage to -65mV, but are the excitability differences less significant if the resting potential is not adjusted? It is also interesting that such a membrane potential difference is not detected in young adult mice (Table 2). This loss of potential compensation may be important for developmental changes in the circuit excitability. These issues can be more explicitly discussed.

      We will include a more thorough discussion of this finding in the revised version of the manuscript. However, we do not completely understand this finding. It could be compensatory, as suggested by the Reviewer; however, it is transient and seems to be an isolated finding (i.e., there does not appear to be parallel “compensation” in other properties). Alternatively, it could be that impaired excitability of the Kcnc1-A421V/+ PV-INs may reflect impaired/delayed development, which itself is known to be activity-dependent.

      Reviewer #3 (Public review):

      Weaknesses:

      The manuscript identifies a partial mechanism of disease that leaves several aspects unresolved including the possible role of the observed impairments in thalamic neurons in the seizure mechanism. Similarly, while the authors identify a reduction in potassium currents and a reduction in PV cell surface expression of Kv3.1 it is not clear why these impairments would lead to a more severe disease phenotype than other loss-of-function mutations which have been characterized previously. Lastly, additional analysis of video-EEG data would be helpful for interpreting the extent of the seizure burden and the nature of the seizure types caused by the mutation.

      We agree with this comment. We studied neurons in the reticular thalamus as these cells are known to express Kv3.1 and are linked to epilepsy pathogenesis. Yet, we focused on neocortical PV-INs over other Kv3.1-expressing neurons such as neurons of the reticular thalamus because we evaluated the impairments of intrinsic excitability to be more profound in neocortical PV-INs. Cross of Kcnc1-Flox(A421V)/+ mice to a cerebral cortex interneuron-specific driver that would avoid recombination in thalamus – such as Ppp1r2-Cre (RRID:IMSR_JAX:012686) – could assist in determining the relative contribution of thalamic reticular nucleus dysfunction to the overall phenotype, as performed by Makinson et al. (2017) to address a similar question. There are of course other Kv3.1-expressing neurons in the brain, including GABAergic interneurons in hippocampus and amygdala. We will include additional discussion in a revised version of the manuscript as to why we think there is more severe impairment in our Kcnc1-Flox(A421V)/+ mice relative to Kv3.1 and Kv3.2 knockout mice. We will include additional data on the epilepsy phenotype in the revised version of the manuscript, as requested.

    1. Author response:

      We thank Dr. Ealand and the Reviewers for their thoughtful comments on our submitted manuscript. We are in the process of revising our manuscript in light of the comments received, outlined below.

      In addition to the requested revisions, we have new data with M. tuberculosis strain H37Rv +/- gidB deletion (and complementation), confirming that deletion of gidB sensitizes the strain to rifampicin, and extending our findings to pathogenic tuberculosis. This will also be incorporated into the revised manuscript.

      Reviewer #1:

      (1) The structural work at the end feels like both an afterthought in terms of the science and the writing. I would suggest re-writing that section to be clearer about what the figure says and does not say. For example, the caption of Figure 6 appears to be more informative than the text and refers to concepts not present in the main text. In general, I found this section to be the most difficult to understand.

      We are rewriting this section to make it more coherent with the rest of the manuscript.

      (2) "delta-gidB" is written out in the caption of Figure 6. Line 234: gidB not italics.

      Thank you, these changes will be incorporated in the revised manuscript.

      Reviewer #2:

      (1) It would be essential to provide information regarding the growth rate and, ideally, translation rates in the gidB KO and the isogenic WT. As translation balances accuracy and speed, only characterising the speed is not sufficient to understand the phenomenon.

      We are performing these assays and will incorporate them in the revised manuscript.

      (2) Cryo-EM analysis of vacant 70S ribosomes is not sufficient for understanding the mechanisms underlying the accuracy defects in the gidB KO. One should assemble and solve structurally near-cognate and non-cognate complexes. I believe the authors are over-interpreting the scant structural data they have. Furthermore, current representation makes it impossible to assess the resolution of the structure, especially in the areas of interest.

      While we agree with the Reviewer that structures of translating ribosomes will be most informative in elucidating the molecular mechanism(s) by which methylation (or not) by GidB contributes to mistranslation, those experiments are ongoing and beyond the scope of the current study. Unlike E. coli ribosomes, for which there are a plethora of structures for mutants available, there are very few structures of mycobacterial ribosomes beyond wild-type apo ribosomes. Therefore we feel that the structures of apo mycobacterial ribosomes +/- GidB-mediated methylation are still of value, and a necessary “first step” for the mechanistic work alluded to above. Secondly, the apo ribosome structures still hint at potential mechanisms by which mistranslation and 16S rRNA methylation may impact on each other – as in the comments to R#1 above, we are revising the text to increase clarity and coherence of this section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors follow up on their published observation that providing a lower glucose parenteral nutrition (PN) reduces sepsis from a common pathogen [Staphylococcus epidermidis (SE)] in preterm piglets. Here they found that a higher dose of glucose could thread the needle and get the protective effects of low glucose without incurring significant hypoglycemia. They then investigate whether the change in low glucose PN impacts metabolism to confer this benefit. The finding that lower glucose reduces sepsis is important as sepsis is a major cause of morbidity and mortality in preterm infants, and adjusting PN composition is a feasible intervention.

      Strengths:

      (1) They address a highly significant problem of neonatal sepsis in preterm infants using a preterm piglet model.

      (2) They have compelling data in this paper (and in a previous publication, ref 27) that low glucose PN confers a survival advantage. A downside of the low glucose PN is hypoglycemia which they mitigate in this paper by using a slightly higher amount of glucose in the PN.

      (3) The experiment where they change PN from high to low glucose after infection is very important to determine if this approach might be used clinically. Unfortunately, this did not show an ability to reduce sepsis risk with this approach. Perhaps this is due to the much lower mortality in the high glucose group (~20% vs 87% in the first figure).

      (4) They produce an impressive multiomics data set from this model of preterm piglet sepsis which is likely to provide additional insights into the pathogenesis of preterm neonatal sepsis.

      Weaknesses:

      (1) The high glucose control gives very high blood glucose levels (Figure 1C). Is this the best control for typical PN and glucose control in preterm neonates? Is the finding that low glucose is protective or high glucose is a risk factor for sepsis?

      This work is a follow-up from our previous work where we explored different PN glucose regimens. Taken together, our experiments heavily imply that glucose provision is associated with severity in a seemingly linear manner. In the clinical setting, there is no fixed glucose provision, but guidelines specify ranges that are acceptable. However, these guidelines do not take possible infections into account and are designed to optimize growth outcomes. Increased provision of glucose to preterm neonates may therefore increase their infection risk, but parenteral glucose cannot be entirely avoided as it would lead to hypoglycaemia and associated brain damage. In the present paper the reduced glucose PN reflects the lowest end of the recommended PN glucose intake. More work is needed to figure out the best glucose provision to infected preterm newborns, balancing positive and negative factors.

      (2) In Figure 1B, preterm piglets provided the high glucose PN have 13% survival while preterm piglets on the same nutrition in Figure 6B have ~80% survival. Were the conditions indeed the same? If so, this indicates a large amount of variation in the outcome of this model from experiment to experiment.

      In the follow-up experiment outlined in Figure 6 we reduced the follow-up time to 12 hours in an effort to minimize the suffering of the animals. We did this because we could detect relevant differences in the immune response between high and low glucose infected pigs at 12 hours. If we had extended the follow-up experiment to 22 hours we would likely have seen much higher mortality.

      (3) Piglets on the low glucose PN had consistently lower density of SE (~1 log) across all time points. This may be due to changes in immune response leading to better clearance or it could be due to slower growth in a lower glucose environment.

      We agree with this assessment and have adjusted our result section to reflect this.

      (4) Many differences in the different omics (transcriptomics, metabolomics, proteomics) were identified in the SE-LOW vs SE-HIGH comparison. Since the bacterial load is very different between these conditions, could the changes be due to bacterial load rather than metabolic reprogramming from the low glucose PN?

      We analyzed the relationship between bacterial burdens and mortality and found that they did not correlate within each of the treatment groups. We have now added this data to the results section as supplemental and report this fact in the section called “Reduced glucose supply increases hepatic OXPHOS and gluconeogenesis and attenuates inflammatory pathways”. This finding inspired us to further explore the relationship between bacterial burdens and infection responses in our model, which has resulted in our recent preprint: Wu et al. Regulation of host metabolism and defense strategies to survive neonatal infection. BioRxiv 2024.02.23.581534; doi: https://doi.org/10.1101/2024.02.23.581534

      Reviewer #2 (Public Review):

      Summary:

      The authors demonstrate that a low parenteral glucose regimen can lead to improved bacterial clearance and survival from Staph epi sepsis in newborn pigs without inducing hypoglycemia, as compared to a high glucose regimen. Using RNA-seq, metabolomic, and proteomic data, the authors conclude that this is primarily mediated by altered hepatic metabolism.

      Strengths:

      Well-defined controls for every time point, with multiple time points and biological replicates. The authors used different experimental strategies to arrive at the same conclusion, which lends credibility to their findings. The authors have published the negative findings associated with their study, including the inability to reverse sepsis-related mortality after switching from SE-high to SE-low at 3h or 6h and after administration of hIAIP.

      Weaknesses:

      (1) The authors mention, and it is well-known, that Staph epi is primarily involved in late-onset sepsis. The model of S. epi sepsis used in this study clearly replicates early-onset sepsis, but S. epi is extremely rare in this time period. How do the authors justify the clinical relevance of this model?

      The distinction between early and late onset sepsis makes sense clinically because they are likely to be caused by different organisms and therefore require different empirical antibiotic regimes. Early onset sepsis is caused by organisms transferred perinatally, often following chorioamnionitis or urogenital maternal infections (Strep. agalactiae/E. coli), whereas late onset sepsis is likely caused by organisms from indwelling catheters or mucosal surfaces, most often coagulase negative staphylococci. Timing of an infection after birth of course plays a role, but the virulence factors of the pathogen probably play a large role in shaping the immune response. Therefore, even though the infection in our model is initiated on the first day after birth, the organism that we use, Staph epidermidis, makes it a better model for pathogenesis of late onset sepsis. However, it is also important to acknowledge that the pathophysiology of “sepsis” may be similar despite timing and pathogen and depends on the degree of immune activation and downstream effects on organs.

      (2) The authors find that the neutrophil subset of the leukocyte population is diminished significantly in the SE-low and SE-high populations. However, they conclude on page 10 that "modulations of hepatic, but not circulating immune cell metabolism, by reduced glucose supply..." and this is possible because the authors have looked at the entire leukocyte transcriptome. I am curious about why the authors did not sequence the neutrophil-specific transcriptome.

      We collected the whole blood transcriptome during the experiments, which reflects the transcription profile of all the circulating leucocytes. Since we did not do single cell RNA sequencing during the experiment there is no possibility of isolating the neutrophil transcriptome at this time. Your point however is valid and we will reconsider incorporating single cell transcriptomics in future experiments.

      (3) The authors use high (30g/k/d) and low (7.2g/k/d) glucose regimens. These translate into a GIR of 21 and 5 mg/k/min respectively. A normal GIR for a preterm infant is usually 5-8, and sometimes up to 10. Do the authors have a "safe GIR" or a threshold they think we cannot cross? Maybe a point where the metabolism switch takes place? They do not comment on this, especially as GIR and glucose levels are continuous variables and not categorical.

      Our reduced glucose PN was chosen as it corresponded with the low end of recommended guidelines for PN glucose intake. There likely is not a “safe GIR” as the clinical responses to glucose intake during infections do not seem binary but increase with glucose intake. It is also important to remember that the reduced glucose intervention still resulted in significant morbidity and a 25% mortality within 22 hours. There is therefore still vast room for improvement, but even though further reduction in PN glucose would probably provide further protection it would entail dangerous hypoglycaemia (as described in our previous paper). The findings in this current paper have prompted us to explore several strategies to replace parenteral glucose with alternative macronutrients. Thus, the optimal PN for infected newborns would probably differ from standard PN in all macronutrients and will require much more preclinical and clinical research.
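
      For readers who want to check the unit conversion quoted in the reviewer's comment (30 and 7.2 g/kg/d corresponding to a GIR of roughly 21 and 5 mg/kg/min), the following is a minimal sketch. It assumes the daily glucose dose is infused continuously and evenly over 24 hours; the function name is hypothetical and chosen only for illustration.

```python
# Minimal sketch: convert a daily parenteral glucose dose (g/kg/day)
# into a glucose infusion rate (GIR, mg/kg/min).
# Assumption (not stated in the source): the dose is infused evenly over 24 h.

MG_PER_G = 1000
MIN_PER_DAY = 24 * 60  # 1440 minutes


def gir_mg_per_kg_min(glucose_g_per_kg_day: float) -> float:
    """Return the GIR in mg/kg/min for a dose given in g/kg/day."""
    return glucose_g_per_kg_day * MG_PER_G / MIN_PER_DAY


if __name__ == "__main__":
    # The two regimens quoted in the review: high (30) and low (7.2) g/kg/d.
    for label, dose in (("high", 30.0), ("low", 7.2)):
        print(f"{label}: {dose} g/kg/d = {gir_mg_per_kg_min(dose):.1f} mg/kg/min")
```

      Under this assumption the high regimen works out to about 20.8 mg/kg/min and the low regimen to 5.0 mg/kg/min, consistent with the rounded values of 21 and 5 given by the reviewer.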

      (4) In Figures 2B and C the authors show that SE-high and SE-low animals have differences in the oxphos, TCA, and glycolytic pathways. The authors themselves comment in the Supplementary Table S1B, E-F that these same metabolic pathways are also different in the Con-Low and Con-high animals, it is just the inflammatory pathways that are not different in the non-infected animals. How can they then justify that it is these metabolic pathways specifically which lead to altered inflammatory pathways, and not just the presence of infection along with some other unfound mechanism?

      It is to be expected that the inflammatory pathways do not differ between the Con-Low and Con-High groups as there is no infection to induce these pathways. The identified metabolic pathways that differ between SE-High and SE-Low animals seem to us the best explanation of the differences in clinical phenotype.

      (5) The authors mention in Figure 1F that SE-low animals had lower bacterial burdens than SE-high animals, but then go on to infer that the inflammatory cytokine differences are attributed to a rewiring of the immune response. However, they have not normalized the cytokine levels to the bacterial loads, as the differences in the cytokines might be attributed purely to a difference in bacterial proliferation/clearing.

      Please see our response to reviewer #1

      (6) The authors mention that switching from SE-high to SE-low at 3 or 6 h time points does not reduce mortality. Have the authors considered the reverse? Does hyperglycemia after euglycemia initially, worsen mortality? That would really conclude that there is some metabolic reprogramming happening at the very onset of sepsis and it is a lost battle after that.

      A very good point that we have not explored yet. We have added this consideration to the discussion and slightly amended our conclusions of this follow-up experiment.

      Reviewer #3 (Public Review):

      Summary:

      Baek and colleagues present important follow-up work on the role of serum glucose in the management of neonatal sepsis. The authors previously showed high glucose administration exacerbated neonatal sepsis, while strict glucose control improved outcomes but caused hypoglycemia. In the current report they examined the effect of a more tailored glucose management approach on outcomes and examined hepatic gene expression, plasma metabolome/proteome, blood transcriptome, as well as the therapeutic impact of hIAIP. The authors leverage multiple powerful approaches to provide robust descriptive accounts of the physiologic changes that occur with this model of sepsis in these various conditions.

      Strengths:

      (1) Use of preterm piglet model.

      (2) Robust, multi-pronged approach to address both hepatic and systemic implications of sepsis and glucose management.

      (3) Trial of therapeutic intervention - glucose management (Figure 6), hIAIP (Figure 7).

      Weaknesses:

      (1) The translational role of the model is in question. CONS is rarely if ever a cause of EOS in preterm neonates. The model uses preterm pigs exposed at 2 hours of age. This model most likely replicates EOS.

      Please see our response to Reviewer #2

      (2) Throughout the manuscript it is difficult to tell from which animals the data are derived. Given the ~90% mortality in the experimental CONS group, and 25% mortality in the intervention group, how are the data from animals "at euthanasia" considered? Meaning - are data from survivors and those euthanized grouped together? This should be clarified as biologically these may be very different populations (ie, natural survivor vs death).

      This is a very valid point. For all endpoints that are analyzed “at euthanasia” the age of the animal will vary. Some will have been euthanized early due to clinical deterioration and some will have survived all the way to the end of the experiment. This needs to be kept in mind when interpreting the results. We have further highlighted this point in the discussion and made it clear to the reader at what time-point each analysis was performed.

      (3) With limited time points (at euthanasia ) for hepatic transcriptomics (Figure 2), plasma metabolite (Figure 3) blood transcriptome (Figure 4), and plasma proteome (Figure 5) it is difficult to make conclusions regarding mechanisms preceding euthanasia. Per methods, animals were euthanized with acidosis or clinical decompensation. Are the reported findings demonstrative of end-organ failure and deterioration leading to death, or reflective of events prior?

      Yes, all organ specific endpoints are snapshots of the state of the animals at the time of euthanasia, pooling together animals that succumbed to sepsis and those that survived to 22 hours post infection. These results therefore reflect the end-state of the infection, and we cannot be sure when the differences between groups manifested themselves. However, given the stark differences in plasma lactate at 12 hours post infection it is likely that changes to metabolism occurred before most of the animals succumbed to sepsis.

      We agree this is a weakness in our model, but we have since published a pre-print where we have further explored how metabolic adaptations shape the fate of similarly infected preterm pigs: BioRxiv 2024.02.23.581534; doi: https://doi.org/10.1101/2024.02.23.581534

      (4) Data are descriptive without corresponding "omics" from interventions (glucose management and/or hIAIP) or at least targeted assessment of key differences.

      We only did in-depth analysis of the glucose intervention as this showed the most promising clinical effects that warranted further in-depth investigation. It is possible that further insights could be gained from in-depth analysis of the other interventions, but given that there were no obvious clinical benefits we refrained from that.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I am intrigued that mortality was not correlated to bacterial burden. Please provide the "data not shown" as this would help the reader understand better whether the difference in bacterial burden is driving the phenotypes and findings of the low glucose group.

      We have added this data to supplementary figure 1.  

      Reviewer #2 (Recommendations For The Authors):

      (1) I would urge the authors to consider a neutrophil-specific transcriptomic analysis. I understand that this would add significantly to the resubmission process. If the authors wish to include that as a future direction instead, they need to specifically mention the limitations of whole blood transcriptomics and how different immune cell types react differently to bacterial antigens.

      We agree with your considerations but we cannot include that data using the whole blood method applied in the experiment. We have added your consideration to the discussions.

      (2) I urge the authors to remove any impression that this is a model of late-onset sepsis, which is implied from the introduction, lines 3 and 4.

      Our intention was not to directly suggest that our model is a perfect reflection of late-onset sepsis but rather to highlight the relevance of using a pathogen commonly associated with LOS. We believe our model primarily captures the effects of intense pro-inflammatory immune activation, which may have parallels with various forms of sepsis, including LOS.

      Reviewer #3 (Recommendations For The Authors):

      Drawing on the robust nature of your "omics", identify key measures and test whether they are altered earlier in the development of clinical sepsis. Test whether these are altered by the intervention.

      A very valid point. At the moment it is not possible for us to explore this within the confines of these experiments. But, building upon these findings and the ones in our recent preprint, we are confident that shifts in the hepatic ratio of oxidative phosphorylation and gluconeogenesis vs glycolysis shape the immune response to infections in neonates. In our upcoming experiments we are planning to incorporate plasma metabolomics at earlier timepoints to monitor when shifts in metabolism occur. However, given the heterogeneity of pigs, as opposed to inbred rodent models, sacrificing animals at fixed timepoints to gauge their organ function will be hard to interpret as it is impossible to know what the end state of the particular animal would have been. Longitudinal sampling of liver tissue during the course of infection would likewise be challenging.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In "Drift in Individual Behavioral Phenotype as a Strategy for Unpredictable Worlds," Maloy et al. (2024) investigate changes in individual responses over time, referred to as behavioral drift within the lifespan of an animal. Drift, as defined in the paper, complements stable behavioral variation (animal individuality/personality within a lifetime) over shorter timeframes, which the authors associate with an underlying bet-hedging strategy. The third timeframe of behavioral variability that the authors discuss occurs within seasons (across several generations of some insects), termed "adaptive tracking." This division of "adaptive" behavioral variability over different timeframes is intuitively logical and adds valuable depth to the theoretical framework concerning the ecological role of individual behavioral differences in animals.

      Strengths:

      While the theoretical foundations of the study are strong, the connection between the experimental data (Figure 1) and the modeling work (Figure 2-4) is less convincing.

      Weaknesses:

      In the experimental data (Figure 1), the authors describe the changes in behavioral preferences over time. While generally plausible, I identify three significant issues with the experiments:

      (1) All of the subsequent theoretical/simulation data is based on changing environments, yet all the experiments are conducted in unchanging environments. While this may suffice to demonstrate the phenomenon of behavioral instability (drift) over time, it does not properly link to the theory-driven work in changing environments. An experiment conducted in a changing environment and its effects on behavioral drift would improve the manuscript's internal consistency and clarify some points related to (3) below.

      In our framework, we posit that the amount of drift has been shaped by evolution to maximize fitness in the environments that the population has experienced, and this drift is observed independent of environment. While we agree that exploring the role of changing environments on the measure of drift would be interesting, we would anticipate the effects may be nuanced and beyond the scope of the current paper (and the scope of our theoretical work, which assumes that the individual phenotype is unaffected by change of environment except as mediated by death due to fitness effects). For example, it would be difficult to differentiate drift from idiosyncratic differences in learning (Smith et al., 2022), and non-adaptive plasticity to unrelated cues has been posited as a method of producing diverse phenotypes (Maxwell and Magwene, 2017), so “learning” to uncorrelated stimuli could conceivably be a mechanism for drift. Given the scope of the current study, we prioritized eliminating potential confounds for measuring drift, but remain interested in the interaction between learning and drift.

      (2) The temporal aspect of behavioral instability. While the analysis demonstrates behavioral instability, the temporal dynamics remain unclear. It would be helpful for the authors to clarify (based on graphs and text) whether the behavioral changes occur randomly over time or follow a pattern (e.g., initially more right turns, then more left turns). A proper temporal analysis and clearer explanations are currently missing from the manuscript.

      We agree it would be helpful to have more description of the dynamics over time aside from the power spectrum and autoregressive model fits. We hope to address this in more detail to provide more description of the changes over time in a revision.

      (3) The temporal dimension leads directly into the third issue: distinguishing between drift and learning (e.g., line 56). In the neutral stimuli used in the experimental data, changes should either occur randomly (drift) or purposefully, as in a neutral environment, previous strategies do not yield a favorable outcome. For instance, the animal might initially employ strategy A, but if no improvement in the food situation occurs, it later adopts strategy B (learning). In changing environments, this distinction between drift and learning should be even more pronounced (e.g., if bananas are available, I prefer bananas; once they are gone, I either change my preference or face negative consequences). Alternatively, is my random choice of grapes the substrate for the learning process towards grapes in a changing environment? Further clarification is needed to resolve these potential conflicts.

      As in our response to point 1, we believe this is a crucial distinction, and we intend to further highlight it in the discussion in the revision and further expand our discussion of how the two strategies may interact.

      Reviewer #2 (Public review):

      Summary:

      This is an inspired study that merges the concept of individuality with evolutionary processes to uncover a new strategy that diversifies individual behavior that is also potentially evolutionarily adaptive.

      The authors use a time-resolved measurement of spontaneous, innate behavior, namely handedness or turn bias in individual, isogenic flies, across several genetic backgrounds.

      They find that an individual's behavior changes over time, or drifts. This has been observed before, but what is interesting here is that by looking at multiple genotypes, the authors find the amount of drift is consistent within genotype i.e., genetically regulated, and thus not entirely stochastic. This is not in line with what is known about innate, spontaneous behaviors. Normally, fluctuations in behavior would be ascribed to a response to environmental noise. However, here, the authors go on to find what is the pattern or rule that determines the rate of change of the behavior over time within individuals. Using modeling of behavior and environment in the context of evolutionarily important timeframes such as lifespan or reproductive age, they could show when drift is favored over bet-hedging and that there is an evolutionary purpose to behavioral drift. Namely, drift diversifies behaviors across individuals of the same genotype within the timescale of lifespan, so that the genotype's chance for expressing beneficial behavior is optimally matched with potential variation of environment experienced prior to reproduction. This ultimately increases the fitness of the genotype. Because they find that behavioral drift is genetically variable, they argue it can also evolve.

      Strengths:

      Unlike most studies of individuality, in this study, the authors consider the impact of individuality on evolution. This is enabled by the use of multiple natural genetic backgrounds and an appropriately large number of individuals to come to the conclusions presented in the study. I thought it was really creative to study how individual behavior evolves over multiple timescales. And indeed this approach yielded interesting and important insight into individuality. Unlike most studies so far, this one highlights that behavioral individuality is not a static property of an individual, but it dynamically changes. Also, placing these findings in the evolutionary context was beneficial. The conclusion that individual drift and bet-hedging are differently favored over different timescales is, I think, a significant and exciting finding.

      Overall, I think this study highlights how little we know about the fundamental, general concepts behind individuality and why behavioral individuality is an important trait. They also show that with simple but elegant behavioral experiments and appropriate modeling, we could uncover fundamental rules underlying the emergence of individual behavior. These rules may not at all be apparent using classical approaches to studying individuality, using individual variation within a single genotype or within a single timeframe.

      Weaknesses:

      I am unconvinced by the claim that serotonin neuron circuits regulate behavioral drift, especially because of its bidirectional effect and lack of relative results for other neuromodulators. Without testing other neuromodulators, it will remain unclear if serotonin intervention increases behavioral noise within individuals, or if any other pharmacological or genetic intervention would do the same. Another issue is that the amount of drugs that the individuals ingested was not tracked. Variable amounts can result in variable changes in behavior that are more consistent with the interpretation of environmental plasticity, rather than behavioral drift. With the current evidence presented, individual behavior may change upon serotonin perturbation, but this does not necessarily mean that it changes or regulates drift.

      However, I think for the scope of this study, finding out whether serotonin regulates drift or not is less important. I understand that today there is a strong push to find molecular and circuit mechanisms of any behavior, and other peers may have asked for such experiments, perhaps even simply out of habit. Fortunately, the main conclusions derived from behavioral data across multiple genetic backgrounds and the modeling are anyway novel, interesting, and in fact more fundamental than showing if it is serotonin that does it or not.

      We agree that our data do not support a strong conclusion that serotonin plays a privileged role in regulating drift. Based on previous literature (e.g. Kain et al., 2014, where identical pharmacological manipulations had an effect on variability while dopaminergic and octopaminergic manipulations did not), we think it likely that the large global perturbations in serotonin that we observe influence plasticity that might be involved in drift (and thus find the results we observe not particularly surprising). Nonetheless, we agree that the mechanism by which serotonin may affect drift could be indirect, and it is similarly plausible that many global perturbations could lead to some shift in the amount of drift. We intend to further discuss these issues in the revision.

      To this point, one thing that was unclear from the methods section is whether genotypes that were tested were raised in replicate vials and how was replication accounted for in the analyses. This is a crucial point - the conclusion that genotypes have different amounts of behavioral drift cannot be drawn without showing that the difference in behavioral drift does not stem from differences in developmental environment.

      While a cursory inspection suggests that batch effects between different replicates were small, we intend to clarify this and more explicitly address the effects of replicates in revision.

      Reviewer #3 (Public review):

      Summary:

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that while flies exhibit an individual turning bias (when averaged over time), their preferences fluctuate over slow timescales.

      To understand whether genetic or neuromodulatory mechanisms influence the drift in individual preference, the authors test different fly strains concluding that both genetic background and the neuromodulator serotonin contribute to the degree of drift.

      Finally, the authors use theoretical approaches to identify the range of environmental conditions under which drift in individual bias supports population growth.

      Strengths:

      The model provides a clear prediction of the environmental fluctuations under which a drift in bias should be beneficial for population growth.

      The approach attempts to identify genetic and neurophysiological mechanisms underlying drift in bias.

      Weaknesses:

      Different behavioral assays are used and are differently analysed, with little discussion on how these behaviors and analyses compare to each other.

      We intend to address this in a revision of the discussion.

      Some of the model assumptions should be made more explicit to better understand which aspects of the behaviors are covered.

      We will further clarify the assumptions of the model in revision.

  3. Nov 2024
    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Urination requires precise coordination between the bladder and external urethral sphincter (EUS), while the neural substrates controlling this coordination remain poorly understood. In this study, Li et al. identify estrogen receptor 1-expressing neurons (ESR1+) in Barrington's nucleus as key regulators that faithfully initiate or suspend urination. Results from peripheral nerve lesions suggest that BarEsr1 neurons play independent roles in controlling bladder contraction and relaxation of the EUS. Finally, the authors performed region-specific retrograde tracing, claiming that distinct populations of BarEsr1 neurons target specific spinal nuclei involved in regulating the bladder and EUS, respectively.

      Strength:

      Overall, the work is of high quality. The authors integrate several cutting-edge technologies and sophisticated, thorough analyses, including opto-tagged single unit recordings, combined optogenetics, and urodynamics, particularly those following distinct peripheral nerve lesions.

      Weakness:

      (1) My major concern is the novelty of this study. Keller et al. 2018 have shown that BarEsr1 neurons are active during urination and play an essential role in relaxing the external urethral sphincter (EUS). Minimally, substantial content that merely confirms previous findings (e.g. Figures 1A-E; Figures 3A-E) should be moved to the supplementary datasets.

      Indeed, we are aware of and have carefully studied the work of Keller et al. Our manuscript presents novel experiments beyond the scope of that paper. Thanks to this comment, we will substantially revise our manuscript to enhance the visibility of the novel data while moving the confirmatory data to the supplement.

      (2) I also have concerns regarding the results showing that the inactivation of BarEsr1 neurons led to the cessation of EUS muscle firing (Figures 2G and S5C). As shown in the cartoon illustration of Figure 8, spinal projections of BarEsr1 neurons contact interneurons (presumably inhibitory) that innervate motor neurons, which in turn excite the EUS. I would therefore expect that the inactivation of BarEsr1 should shift the EUS firing pattern from phasic (as relaxation) to tonic (removal of relaxation), rather than stopping their firing entirely. Could the authors comment on this and provide potential reasons or mechanisms for this finding?

      We agree with this point. We meant that the EUS’ phasic bursting pattern was rapidly stopped upon BarEsr1 photoinhibition, but not all the firing stopped instantaneously. According to previous studies (Chang et al., 2007, de Groat, 2009, de Groat and Yoshimura, 2015, Kadekawa et al., 2016), the voiding physiology of rodents is probably different from that of humans: in rodents the urine is pumped out step-wise in the gaps between multiple consecutive EUS phasic bursting epochs, whereas in humans the urine is pumped out continuously once EUS firing is almost fully inhibited for a period of time. Namely, in mice the EUS displays sustained tonic activity following phasic bursting, while in humans the EUS keeps firing tonically until the moment of voiding onset (complete inhibition, muscle relaxed). Despite these prominent differences in the basic physiological properties, our assumption is that the logic of the circuits from the brainstem to the urethra in this pathway is evolutionarily conserved in both species; thus the logic of brainstem coordination of voiding could also be the same for both species, which is the main interest of our study (using an animal model to address concerns of human health). To interpret our data for a broader audience we therefore used a simplified and inaccurate expression. We apologize for the inaccuracy and will correct our previous description in the revised manuscript.

      (3) Current evidence is insufficient to support the claim that the majority of BarEsr1 neurons innervate the SPN but not DGC. The current spinal images are uninformative, as the fluorescence reflects the distribution of Esr1- or Crh-expressing neurons in the spinal cord, along with descending BarEsr1 or BarCrh axons. Given the close anatomical proximity of these two nuclei, a more thorough histological analysis is required to demonstrate that the spinal injections were accurately confined to either the SPN or the DGC.

      We agree that current evidence is insufficient to support the current claim. To address this concern and strengthen our claim, we will repeat the retrograde viral tracing experiments, combined with CTB647 injections to label the injection site, to validate specific targeting of SPN or DGC populations. We will also add higher-magnification imaging to distinguish BarESR1 axonal projections targeting SPN versus DGC. Results from these ongoing experiments will be incorporated into the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors have performed a rigorous study to assess the role of ESR1+ neurons in the PMC to control the coordination of bladder and sphincter muscles during urination. This is an important extension of previous work defining the role of these brainstem neurons, and convincingly adds to the understanding of their role as master regulators of urination. This is a thorough, well-done study that clarifies how the Pontine micturition center coordinates different muscle groups for efficient urination, but there are some questions and considerations that remain.

      Strengths:

      These data are thorough and convincing in showing that ESR1+PMC neurons exert coordinated control over both the bladder and sphincter activity, which is essential for efficient urination. The anatomical distinctions in pelvic versus pudendal control are clear, and it's an advance to understand how this coordination occurs. This work offers a clearer picture of how micturition is driven.

      Weaknesses:

      The dynamics of how this population of ESR1+ neurons is engaged in natural urination events remains unclear. Not all ESR1+neurons are always engaged, and it is not measured whether this is simply variation in population activity, or if more neurons are engaged during more intense starting bladder pressures, for instance. In particular, the response dynamics of single and doubly-projecting neurons are not defined. Additionally, the model for how these neurons coordinate with CRH+ neuron activity in the PMC is not addressed, although these cell types seem to be engaged at the same time. Lastly, it would be interesting to know how sensory input can likely modulate the activity of these neurons, but this is perhaps a future direction.

      In response to the reviewer’s comments, we will attempt to perform the following revisions for this round:

      (1) Engagement of ESR1+ neurons in natural urination events:

      We agree that probably not all ESR1+ neurons are consistently engaged during urination. To address this, we will perform a detailed analysis of the opto-tagged single-unit recording data.

      (2) Response dynamics of single- and doubly-projecting neurons:

      (a) We will use retrograde labelling combined with Ca2+ photometry recordings to differentiate the response dynamics of SPN- and DGC-projecting neurons during urination.

      (b) We will perform functional validations to assess the specific roles of single- and doubly-projecting neurons in coordinating bladder and EUS activity.

      (3) Coordination with CRH+ neurons in the PMC:

      We appreciate the suggestion to include CRH+ neurons in our model. We will expand our model to incorporate CRH+ neurons and their potential interactions with ESR1+ neurons.

      (4) Sensory modulation of ESR1+ neurons:

      The reviewer raises an excellent point regarding sensory input modulation of ESR1+ neuron activity. Although this is beyond the scope of our current study, we recognize its importance and propose to include this as a future direction.

      Reviewer #3 (Public review):

      Summary:

      The paper by Li et al explored the role of Estrogen receptor 1 (Esr1) expressing neurons in the pontine micturition center (PMC), a brainstem region also known as Barrington's nucleus (Hou et al 2016, Keller et al 2018). First, the authors conducted bulk Ca2+ imaging/unit recording from PMCESR1 neurons to investigate the correlations of PMCESR1 neural activity with voiding behavior in conscious mice and with bladder pressure/external urethral muscle activity in urethane-anesthetized mice. Next, the authors conducted optogenetic inactivation/activation of PMCESR1 neurons to confirm their contribution to voiding behavior, and also conducted peripheral nerve transection together with optogenetic activation to confirm the independent control of bladder pressure and the urethral sphincter muscle.

      Weaknesses:

      (1) The study demonstrates that pelvic nerve transection reduces urinary volume triggered by PMCESR1+ cell photoactivation in freely moving mice. Could the role of pudendal nerve transection also be examined in awake mice to provide a more comprehensive understanding of neural involvement?

      Thank you for the suggestion. Pudendal nerve transection in awake mice is indeed a challenging experiment that was missing from our study. We will attempt it for the revision.

      (2) While the paper primarily focuses on PMCESR1+ cells in bladder-sphincter coordination, the analysis of PMCESR1+-DGC/SPN neural circuits - given their distinct anatomical projections in the sacral spinal cord - feels underexplored. How do these circuits influence bladder and sphincter function when activated or inhibited? Also, do you have any tracing data to confirm whether bladder-sphincter innervation comes from distinct spinal nuclei?

      Thank you for this great comment. The projection-specific analysis of neuronal function, as also suggested by Reviewer 2 in a similar comment (#8), is missing from our first submission. These experiments are so challenging that we did not complete them in the first round of tests, but we have decided to pursue this goal again. Namely, we will perform photometry recordings of PMC neurons projecting to the DGC/SPN while measuring bladder pressure and urethral sphincter EMG activity. Additionally, while our study does not include direct tracing data to confirm distinct spinal nuclei for bladder and sphincter innervation, this has been well documented in the classic literature (Yao et al., 2018, Karnup and De Groat, 2020, Karnup, 2021). Specifically, anatomical studies have shown that the SPN primarily innervates the bladder, while the DGC is associated with the innervation of the urethral sphincter. We will cite these references to provide context and support for our interpretations.

      (3) Although the paper successfully identifies the physiological role of PMCESR1+ cells in bladder-sphincter coordination, the study falls short in examining the electrophysiological properties of PMCESR1+-DGC/SPN cells. A deeper investigation here would strengthen the findings.

      While our study primarily focuses on the functional role of PMCESR1+ neurons in bladder-sphincter coordination, we acknowledge that understanding their intrinsic electrophysiological characteristics could further strengthen our findings. However, this aspect falls beyond the scope of the current study. Nevertheless, we recognize the significance of this direction and are excited to pursue it in future research. We appreciate the reviewer’s suggestion, as it highlights an important avenue for expanding upon our current findings.

      (4) The parameters for photoactivation (blue light pulses delivered at 25 Hz for 15 ms, every 30 s) and photoinhibition (pulses at 50 Hz for 20 ms) vary. What drove the selection of these specific parameters? Moreover, for photoactivation experiments, the change in pressure (ΔP = P5 sec - P0 sec) is calculated differently from photoinhibition (Δpressure = Ppeak - Pmin). Can you clarify the reasoning behind these differing approaches?

      We sincerely thank the reviewer for raising these important points and for the opportunity to clarify our experimental design and data analysis methods.

      Photoactivation versus photoinhibition parameters: The differences in photoactivation (25 Hz, 15 ms pulses) and photoinhibition (50 Hz, 20 ms pulses) protocols are based on the distinct physiological and technical requirements for activating versus inhibiting PMCESR1+ neurons. For photoactivation, 25 Hz stimulation aligns with the natural firing patterns of central neurons, allowing for intermittent activation without exceeding the neuronal refractory period. The shorter pulse duration (15 ms) minimizes phototoxicity and avoids overstimulation, as performed in previous studies (Keller et al., 2018). In contrast, photoinhibition requires sustained suppression of neuronal activity, achieved through higher frequencies (50 Hz) and longer pulses (20 ms) to ensure continuous coverage of neuronal activity.
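
      As a rough illustration of the difference between the two protocols, the duty cycle implied by the stated frequencies and pulse widths can be checked with a few lines of Python. This is an illustrative calculation, not part of the original analysis, and `duty_cycle` is a hypothetical helper name.

```python
def duty_cycle(freq_hz: float, pulse_ms: float) -> float:
    """Fraction of each stimulation period during which the light is on."""
    period_ms = 1000.0 / freq_hz          # length of one stimulation cycle
    return min(pulse_ms / period_ms, 1.0)  # cap at 1.0 (continuous light)


# Parameters quoted in the response above
activation = duty_cycle(25, 15)   # 15 ms pulse within a 40 ms period
inhibition = duty_cycle(50, 20)   # 20 ms pulse within a 20 ms period

print(f"photoactivation duty cycle: {activation:.3f}")
print(f"photoinhibition duty cycle: {inhibition:.3f}")
```

      Under this reading, a 20 ms pulse at 50 Hz fills the entire 20 ms period, so the light is effectively on continuously, whereas the 25 Hz protocol leaves 25 ms of darkness in every 40 ms cycle.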

      Calculation of pressure changes (ΔP) for photoactivation and photoinhibition: The differing methods for calculating pressure changes reflect the distinct physiological effects we aimed to capture. In photoactivation experiments (ΔP = P5 sec - P0 sec), the pressures before (P0 sec) and 5 seconds after (P5 sec) light delivery were compared to capture the immediate effect of light activation on bladder pressure, focusing on the onset and early dynamics of activation. In contrast, photoinhibition experiments assessed the immediate impact of light-induced suppression on bladder pressure during an ongoing voiding event. Here, Δpressure was calculated as Ppeak – Pmin to measure the rapid drop in pressure directly attributable to neuronal inhibition.

      We will expand these details in the methods section of the revised manuscript to provide greater transparency.
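
      The two pressure measures described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the analysis code used in the study: the function names are hypothetical, the pressure trace is assumed to be sampled on an increasing time axis, and Ppeak and Pmin are assumed to be taken within the illumination window.

```python
import numpy as np


def delta_p_activation(t, p, light_on, window_s=5.0):
    """Photoactivation: dP = P(5 s after light onset) - P(at light onset)."""
    p0 = np.interp(light_on, t, p)             # pressure at light onset
    p5 = np.interp(light_on + window_s, t, p)  # pressure 5 s later
    return p5 - p0


def delta_p_inhibition(t, p, light_on, light_off):
    """Photoinhibition: dP = Ppeak - Pmin within the illumination window
    (the choice of window is an assumption of this sketch)."""
    segment = p[(t >= light_on) & (t <= light_off)]
    return segment.max() - segment.min()


# Synthetic traces for demonstration only (not experimental data)
t = np.linspace(0.0, 10.0, 1001)
rising = 10.0 + 2.0 * t    # pressure rising after light onset
falling = 40.0 - 3.0 * t   # pressure dropping during inhibition

print(delta_p_activation(t, rising, light_on=2.0))
print(delta_p_inhibition(t, falling, light_on=2.0, light_off=6.0))
```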

      (5) The discussion could further emphasize how PMCESR1+ cells coordinate bladder contraction and sphincter relaxation to control urination, highlighting their central role in the initiation and suspension of this process.

      We fully agree with this point. Additionally, in response to your and other reviewers’ suggestions, we are preparing a new round of experiments with projection-specific recording, and thus our discussion and conclusion will also be updated according to the newly obtained data.

      (6) In Figure 8, the authors analyze the temporal sequence of bladder pressure and EUS bursting during natural voiding and PMC activation-induced voiding. It would be acceptable to consider the existence of a lower spinal reflex circuit; however, the interpretation of the data contains speculation. It is hard to say that bladder pressure measurement reflects efferent pelvic nerve activity in real time. (As a biological system, bladder contraction is mediated by smooth muscle and does not reflect real-time efferent pelvic nerve activity. As an experimental set-up, bladder pressure measurement has some delay because of the tubing, but EUS bursting has no delay.) Especially for the inactivation experiment, these factors would affect the interpretation of the data. This reviewer recommends rewriting the section with these limitations in mind. Most of the section is suitable for the results.

      Thank you for mentioning the possibility of a delay in the bladder pressure measurement. We would prefer to perform a physical control test to quantify how large this delay is under our experimental conditions. We will use a small balloon to mimic the bladder and two identical pressure sensors, one with a very short tube inserted into the balloon and one with an extended tube identical to that used in our animal experiments. We will then mimic both the initiation and the halting of contraction, and quantify the delay between the two sensors.
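
      One possible way to quantify the delay between the two sensor traces is to find the lag that maximizes their cross-correlation. The sketch below illustrates this idea under stated assumptions: both sensors are sampled synchronously at the same rate, the function name `estimate_delay` is hypothetical, and cross-correlation is chosen here for illustration rather than being the method specified in the response.

```python
import numpy as np


def estimate_delay(reference, delayed, fs_hz):
    """Return the lag in seconds at which `delayed` best matches `reference`.

    A positive value means `delayed` lags behind `reference`.
    """
    ref = reference - reference.mean()  # remove baseline offset
    dly = delayed - delayed.mean()
    xcorr = np.correlate(dly, ref, mode="full")
    # Index i of the full correlation corresponds to lag i - (len(ref) - 1)
    lag_samples = np.argmax(xcorr) - (len(ref) - 1)
    return lag_samples / fs_hz


# Synthetic example (not experimental data): the same pressure pulse
# recorded through a short tube and, 80 ms later, through a long tube.
fs = 1000.0
t = np.arange(0.0, 2.0, 1.0 / fs)
short_tube = np.exp(-((t - 0.50) / 0.05) ** 2)
long_tube = np.exp(-((t - 0.58) / 0.05) ** 2)

print(estimate_delay(short_tube, long_tube, fs))
```

      Running the same estimate separately on the rising and falling phases of the mimicked contraction would give the delay for contraction initiation and halting, respectively.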

      References

      • Chang HY, Cheng CL, Chen JJJ, de Groat WC. 2007. Serotonergic drugs and spinal cord transections indicate that different spinal circuits are involved in external urethral sphincter activity in rats. American Journal of Physiology-Renal Physiology 292: F1044-F1053. DOI: 10.1152/ajprenal.00175.2006

      • de Groat WC. 2009. Integrative control of the lower urinary tract: preclinical perspective. British Journal of Pharmacology 147. DOI: 10.1038/sj.bjp.0706604

      • de Groat WC, Yoshimura N. 2015. Anatomy and physiology of the lower urinary tract. Handb Clin Neurol 130: 61-108. DOI: 10.1016/B978-0-444-63247-0.00005-5

      • Kadekawa K, Yoshimura N, Majima T, Wada N, Shimizu T, Birder LA, Kanai AJ, de Groat WC, Sugaya K, Yoshiyama M. 2016. Characterization of bladder and external urethral activity in mice with or without spinal cord injury—a comparison study with rats. American Journal of Physiology-Regulatory, Integrative and Comparative Physiology 310: R752-R758. DOI: 10.1152/ajpregu.00450.2015

      • Karnup S. 2021. Spinal interneurons of the lower urinary tract circuits. Autonomic Neuroscience 235. DOI: 10.1016/j.autneu.2021.102861

      • Karnup SV, De Groat WC. 2020. Mapping of spinal interneurons involved in regulation of the lower urinary tract in juvenile male rats. IBRO Rep 9: 115-131. DOI: 10.1016/j.ibror.2020.07.002

      • Keller JA, Chen J, Simpson S, Wang EH-J, Lilascharoen V, George O, Lim BK, Stowers L. 2018. Voluntary urination control by brainstem neurons that relax the urethral sphincter. Nature Neuroscience 21: 1229-1238. DOI: 10.1038/s41593-018-0204-3

      • Yao J, Zhang Q, Liao X, Li Q, Liang S, Li X, Zhang Y, Li X, Wang H, Qin H, Wang M, Li J, Zhang J, He W, Zhang W, Li T, Xu F, Gong H, Jia H, Xu X, Yan J, Chen X. 2018. A corticopontine circuit for initiation of urination. Nature Neuroscience 21: 1541-1550. DOI: 10.1038/s41593-018-0256-4

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study aimed to better understand the role of the H3 protein of the Monkeypox virus (MPXV) in host cell adhesion, identifying a crucial α-helical domain for interaction with heparan sulfate (HS). Using a combination of advanced computational simulations and experimental validations, the authors discovered that this domain is essential for viral adhesion and potentially a new target for developing antiviral therapies.

      Strengths:

      The study's main strengths include the use of cutting-edge computational tools such as AlphaFold2 and molecular dynamics simulations, combined with robust experimental techniques like single-molecule force spectroscopy and flow cytometry. These methods provided a detailed and reliable view of the interactions between the H3 protein and HS. The study also highlighted the importance of the α-helical domain's electric charge and the influence of the Mg(II) ion in stabilizing this interaction. The work's impact on the field is significant, offering new perspectives for developing antiviral treatments for MPXV and potentially other viruses with similar adhesion mechanisms. The provided methods and data are highly useful for researchers working with viral proteins and protein-polysaccharide interactions, offering a solid foundation for future investigations and therapeutic innovations.

      Weaknesses:

      However, some limitations are notable. Despite the robust use of computational methodologies, the limitations of this approach are not discussed, such as potential sources of error, standard deviation rates, and known controls for the H3 protein to justify the claims. Additionally, validations with methodologies like X-ray crystallography would further benefit the visualization of the H3 and HS interaction.

      Thank you very much for the evaluation and appreciation of our work. In response to the identified weakness, we have conducted additional analyses to further assess the limitations of the computational methodologies used. Specifically, we predicted the MPXV H3 structure using two other AI-based protein structure prediction models, ESMFold and RoseTTAFold2. Both models also predicted an α-helical structure, which supports our conclusion. However, they yielded lower pLDDT scores (Figure S1A-C in the revised SI), indicating that some error may be present.

      We agree with this reviewer, as well as the other reviewers, that X-ray crystallography data for the H3 structure would be highly valuable. Unfortunately, we lack the expertise in structural biology to obtain these results at this stage. To complement this, we performed molecular dynamics (MD) simulations, which suggest that the helical domain is connected to the main domain via a flexible linker. This flexibility may help explain the challenges in obtaining a high-resolution X-ray structure. In fact, to date, the only structural data available for H3 is from VACV, and it excludes the helical domain (the helical domain was cleaved off for the X-ray studies). We have added this point to the discussion and hope that experts in structural biology will be able to resolve the structure of this domain in the future.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript presenting the discovery of a heparan-sulfate (HS) binding domain in monkeypox virus (MPXV) H3 protein as a new anti-poxviral drug target, presented by Bin Zhen and co-workers, is of interest, given that it offers a potentially broad antiviral substance to be used against poxviruses. Using new computational biology techniques, the authors identified a new alpha-helical domain in the H3 protein, which interacts with cell surface HS, and this domain seems to be crucial for H3-HS interaction. Given that this domain is conserved across orthopoxviruses, authors designed protein inhibitors. One of these inhibitors, AI-PoxBlock723, effectively disrupted the H3-HS interaction and inhibited infection with Monkeypox virus and Vaccinia virus. The presented data should be of interest to a diverse audience, given the possibility of an effective anti-poxviral drug.

      Strengths:

      In my opinion, the experiments done in this work were well-planned and executed. The authors put together several computational methods, to design poxvirus inhibitor molecules, and then they test these molecules for infection inhibition.

      Weaknesses:

      One thing that could be improved, is the presentation of results, to make them more easily understandable to readers, who may not be experts in protein modeling programs. For example, figures should be self-explanatory and understood on their own, without the need to revise text. Therefore, the figure legend should be more informative as to how the experiments were done.

      Thank you very much for your appreciation of our work and your support. In response to the identified weakness, we have carefully reviewed all the figure legends to ensure they are more informative.

      Reviewer #3 (Public Review):

      Summary:

      The article is an interesting approach to determining the MPOX receptor using "in silico" tools. The results show the presence of two regions of the H3 protein with a high probability of being involved in the interaction with the HS cell receptor. However, the α-helical region seems to be the most probable, since modifications in this region affect the virus binding to the HS receptor.

      Strengths:

      In my opinion, it is an informative article with interesting results, generated by a combination of "in silico" and wet science to test the theoretical results. This is a strong point of the article.

      Weaknesses:

      Has a crystal structure of the H3 protein been reported?

      The following text is in line 104: "which may represent a novel binding site for HS". It is unclear whether this means this "new binding site" is an alternative site to an old one or whether it is the true binding site that had not been previously elucidated.

      Thank you very much for your thoughtful evaluation and appreciation of our work.

      We agree with this reviewer, as well as the other reviewers, that X-ray crystallography data for the H3 structure would be highly valuable. Unfortunately, we are not experts in structural biology, and we have not yet been able to obtain these structural results. To date, the only structure available for H3 is the one from VACV, which does not include the helical domain. We have included this point in the discussion and hope that experts in structural biology will be able to resolve the structure of this domain in the future.

      Regarding the "novel binding site," this term refers to "the true binding site that had not been previously elucidated." Previous research identified that H3 binds to heparan sulfate (HS), but the exact binding site had not been determined.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Validation of Results with Other Experimental Methods: While single-molecule force spectroscopy and flow cytometry provide valuable data, including complementary methods such as X-ray crystallography could offer additional insights into the H3-HS interaction and the effectiveness of the inhibitors.

      Discussion of Computational Model Limitations: Although the use of AlphaFold2 and other advanced tools is a strength, it is important to discuss the limitations of these models in more detail, including potential sources of error and how they may impact the interpretation of the results.

      During the manuscript evaluation, the protein localization was not clear (transmembrane?), since the protein's end is very close to the virus membrane surface. All experiments used the protein without it being anchored to the membrane, leaving the interaction site always exposed. If the protein is linked to the membrane, how would the site be exposed given the limited space between it and the virus structure?

      Thank you for these insightful comments. As you pointed out, the H3 protein, particularly the helical domain at the C-terminal, is indeed located close to the membrane, which could limit the available space for H3 binding. To investigate this further, we modeled the full-length H3 protein in the context of the membrane and performed molecular dynamics (MD) simulations to assess the available space. Our results show that there is more than 1 nm of space between the helical domain and the membrane, which should be sufficient for potential heparan sulfate (HS) binding (see Figure 1E, and Figure S1D&E in the revised manuscript).

      Minor corrections:

      Line 31: "is an emerging zoonotic pathogen" should be revised to reflect that Mpox is a re-emerging virus, given its history of causing outbreaks, such as in 2003.

      Line 71 and Line 75: Adding an explanation of "Mg binding sites" and "GAG motifs" would enhance reader understanding, as these represent important points in the study. The current positioning of Figure 1 causes some confusion for the reader.

      Line 111: High score? What controls were used for the protein? Are there known inhibitors of H3? If so, why weren't they tested for structure comparison? Additionally, what about other molecules that H3 binds to, such as UDP-Glucose, as demonstrated in the base article for the Vaccinia virus H3 protein available in the PDB?

      Figure 2B: Improve the legend, as the colors of the lines are not clear.

      Thank you for your instructive comments. We have addressed most of them in the revised manuscript.

      Regarding the "high score," AlphaFold2 provides a confidence score for its protein structure predictions, with a maximum score of 100. A score above 80 indicates a high level of confidence in the prediction.
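
      For readers who wish to inspect these confidence scores themselves: AlphaFold2 writes the per-residue pLDDT into the B-factor column of its output PDB files, so the scores can be read with a short script. The sketch below is illustrative only; the function names are hypothetical, and the threshold of 80 is simply the value quoted in our response above.

```python
def plddt_per_residue(pdb_text):
    """Read per-residue pLDDT from the B-factor column of CA atoms.

    Uses the fixed-column PDB layout: atom name in columns 13-16,
    B-factor in columns 61-66 (1-based).
    """
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def summarize(scores, threshold=80.0):
    """Return the mean pLDDT and the fraction of residues above `threshold`."""
    mean = sum(scores) / len(scores)
    frac_high = sum(s > threshold for s in scores) / len(scores)
    return mean, frac_high
```

      A typical call would be `summarize(plddt_per_residue(open("model.pdb").read()))`, where `model.pdb` is a placeholder file name.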

      There are known inhibitors (such as antibodies) of H3, and while the sequence is available, no structure has been reported so far. Previous NMR titration measurements have shown that UDP-glucose binds to H3, but no structural data for the complex exist. To date, the only available crystal structure is of a truncated H3 from VACV, which does not include the helical domain we identified.

      Reviewer #2 (Recommendations For The Authors):

      The text described in the result section does not match the text presented in Figures. So, it is not easy to see what are the authors referring to when they mention the Figure. For example, the text referring to Figure S8 mentions the GB1 domain and the Cohesin module, but these are not mentioned in Figure S8.

      I do not understand the results presented in Figure 5B. It is not clear to me, from the Figure legend nor after reading the Material and Methods, how this experiment was done. Specifically, what is plotted on X, is it the amount of inhibitor or the amount of protein? These things have to be checked through the manuscript.

      It would be interesting to confirm if the inhibition of infection is based on the inhibition of viral binding to the cells. This should not be complicated to realize, and it could provide evidence for the mechanism of action.

      Extensive use of terms like "this domain" is not good in this type of article, like in lines 207, and 211. It is not always clear to what domain are authors referring to, so it may be much better to mention the domain in question by the exact name.

      Line 337, If I am not mistaken dilutions are serial not series.

      Line 613, in methods. Please use g force instead of rpm, it is more informative. Even if it is just to pellet cells.

      Thank you very much for your instructive comments. We have addressed most of them in the revised manuscript. For instance, the immobilization of the GB1 domain and the cohesin module is now mentioned in Figure S9. Additionally, in the previous Figure 5B, the "x" represents the concentration of the inhibitor. "Series" has been corrected to "serial", and rpm has been replaced with g force.

      Reviewer #3 (Recommendations For The Authors):

      Line 190

      Did you mutate all the amino acids at the same time? What was the impact of all these mutations on the structure of the helical region? Or if you modeled the protein again after replacing these 7 amino acids, did you find that there was no difference? Regardless of your answer, you must include a superposition of the mutated structure and the wt.

      Thank you for the insightful comment. We have now also predicted the structure of the serine mutant using AlphaFold2 (AF2). As expected, the helical domain structure remains largely preserved with only minor differences. We have included these results in Figure S6, as suggested.

      Figure 2D

      In this graph, the authors should indicate the ΔG as a negative value. In fact, the graph does not match the text.

      Thank you for the reminder; this has been corrected in the graph.

      Figure 4B

      Is the difference in binding force significantly different? 28.8 vs 33.7 pN

      The absolute difference in binding force is not large (~5 pN). However, for a system with a relatively low binding force, this difference is significant. Specifically, the 5 pN difference accounts for approximately a 14% reduction in binding force. We have included this percentage in the revised manuscript.
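
      The percentage can be reproduced with a short calculation. The sketch below assumes that 33.7 pN is the reference value and 28.8 pN the reduced value (this assignment is not stated explicitly above, but it is the one consistent with the reported figure of approximately 14%); `percent_reduction` is a hypothetical helper name.

```python
def percent_reduction(reference_pn, reduced_pn):
    """Relative drop in binding force, as a percentage of the reference."""
    return 100.0 * (reference_pn - reduced_pn) / reference_pn


reduction = percent_reduction(33.7, 28.8)
print(round(reduction, 1))  # 14.5
```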

      Figure 5

      If AI-PoxBlocks723 was the only peptide effective in inhibiting viral infection of MPOX and other related viruses but not with 100% effectiveness, do you think this could be a consequence of a low interaction efficiency or the existence of a different receptor? Or a secondary region of binding in the H3? Can you argue about this?

      It has been proposed that there are other adhesion proteins for MPXV, such as D8, in addition to H3. We believe this accounts for the observed less-than-100% effectiveness.

      The use of peptides as "inhibitory tools" could have an interesting effect in vitro, however, in vivo the immunological response against the peptide will reduce/eliminate it, how you may optimize the "drug" development with this system, as you state in line 387.

      Thank you for your thoughtful comment. You are correct that the use of peptides as inhibitory tools could induce an immune response in vivo, which might limit their effectiveness over time. To optimize this approach for drug development, the peptides could be conjugated with carrier molecules, such as liposomes, nanoparticles, or dendrimers, which can protect the peptides from immune detection and improve their delivery to target cells. This could allow for more controlled and sustained release of the peptide in vivo, reducing the chances of immune clearance. We have added this discussion to the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      This study of mixed glutamate/GABA transmission from axons of the supramammillary nucleus to dentate gyrus seeks to sort out whether the two transmitters are released from the same or different synaptic vesicles. This conundrum has been examined in other dual-transmission cases and even in this particular pathway, there are different views. The authors use a variety of electrophysiological and immunohistochemical methods to reach the surprising (to me) conclusion that glutamate- and GABA-filled vesicles are distinct yet released from the same nerve terminals. The strength of the conclusion rests on the abundance of data (approaches) rather than the decisiveness of any one approach, and I came away believing that the boutons may indeed produce and release distinct types of vesicles, but have reservations.

      We thank the reviewer for his/her evaluation of our work. To date, several studies have reported that a variety of combinations of two transmitters are co-released from different synaptic vesicles in the central nervous system. In this regard, we think the cotransmission of glutamate/GABA from different synaptic vesicles is not surprising. To better explain to the reader how much is known about co-release of dual transmitters in the brain, we have now added new sentences describing segregated co-release of two neurotransmitters at other synapses in the Introduction (lines 63-80).

      Accepting the conclusion, one is now left with another conundrum, not addressed even in the discussion: how can a single bouton sort out VGLUTs and VIAATs to different vesicles, position them in distinct locations with nm precision, and recycle them without mixing? And why do it this way instead of with single vesicles having mixed chemical content? For example, could a quantitative argument be made that separate vesicles allow for higher transmitter concentrations? I feel the paper needs to address these problems with some coherent discussion, at minimum. 

      Although these questions are very important and interesting to address, little is known about the molecular mechanisms by which VGluT2 and VIAAT are sorted to different vesicles and by which each type of synaptic vesicle is segregated. That is why we did not mention the sorting mechanisms in the original manuscript. Nevertheless, in response to the reviewer’s suggestion, we have now added new sentences describing possible mechanisms for the sorting and segregation of VGluT2 and VIAAT in the Discussion (lines 439-462).

      As for the question of why glutamate and GABA are released from different synaptic vesicles, we mentioned the functional advantages of separate release of two transmitters over release from single vesicles several times in the Introduction (lines 94-100), Results (lines 300-302), and Discussion (lines 406-408, 521-522). Although transmitter concentrations in the vesicles are an interesting point to consider, we think this issue is beyond the scope of the present study. Given that manipulation of vesicular transmitter contents is technically possible (Hori and Takamori, 2021), this issue awaits further investigation.

      Major concerns: 

      (1) Throughout the paper, the authors use repetitive optogenetic stimulation to activate SuM fibers and co-release glutamate and GABA. There are several issues here: first, can the authors definitively assure the reader that all the short-term plasticity is presynaptic and not due to ChR2 desensitization? This has not been addressed. Second, can the authors also say that all the activated fibers release both transmitters? If for example 20% of the fibers retained a one-transmitter identity and had distinct physiological properties, could that account for some of the physiological findings?

      Thank you for raising this important point. To examine whether repetitive light illumination induces ChR2 desensitization, the fiber volley was recorded extracellularly. We found that paired pulses or 10 stimuli at 5, 10, and 20 Hz reliably evoked fiber volleys of similar amplitude throughout light stimulation. These results clearly indicate that repetitive light stimulation can reliably activate ChR2 and elicit action potentials in the SuM axons. These new findings are now included in Figure 1-figure supplement 2 and Figure 5-figure supplement 2. We also previously demonstrated, by direct patch-clamp recordings from ChR2-expressing hippocampal mossy fiber terminals, that 125 light pulses at 25 Hz reliably elicited action potentials (Fig. S1: Fukaya et al., 2023). Therefore, we believe that if the expression level of ChR2 is high, activation of ChR2 induces action potentials in response to repetitive light stimulation and mediates synaptic transmission with high efficiency.

      We found that most of the SuM terminals (95%) have both VGluT2 and VIAAT (Figure 1E). This anatomical evidence strongly indicates that most of the SuM terminals have the ability to release both glutamate and GABA, and the SuM fibers having one transmitter identity should be minor populations.

      (2) PPR differences in Figures 1F-I are statistically significant but still quite small. You could say they are more similar than different in fact, and residual differences are accounted for by secondary factors like differential receptor saturation. 

      In this experiment, the light intensity was adjusted to yield less than 80% of the maximum response, as described in the Methods section of the original and revised manuscripts, minimizing the possibility of receptor saturation. We also excluded the possibility that PPR differences could be attributed to differential receptor saturation and desensitization by using a low-affinity AMPA receptor antagonist and a low-affinity GABAA receptor antagonist (Figure 5-figure supplement 3). These results indicate that the PPR differences are of presynaptic origin.
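      For readers less familiar with the measure, the paired-pulse ratio (PPR) is the amplitude of the second response divided by the amplitude of the first. The sketch below is only an illustration: the function name and the amplitudes are hypothetical and are not taken from the manuscript or its analysis code.

```python
def paired_pulse_ratio(first_amplitude, second_amplitude):
    """Return PPR = |second response| / |first response|.

    Absolute values are used so that inward (negative) EPSCs and
    outward (positive) IPSCs are handled the same way.
    """
    if first_amplitude == 0:
        raise ValueError("first response amplitude must be non-zero")
    return abs(second_amplitude) / abs(first_amplitude)


# Hypothetical peak amplitudes in pA, chosen only for illustration.
epsc_ppr = paired_pulse_ratio(-100.0, -60.0)  # second EPSC smaller than the first
ipsc_ppr = paired_pulse_ratio(80.0, 64.0)     # second IPSC smaller than the first
```

      A PPR below 1 indicates paired-pulse depression; comparing the ratio for EPSCs and IPSCs recorded from the same input is what allows the two transmissions to be contrasted.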

      (3) The logic of the GPCR experiments needs a better setup. I could imagine different fibers released different transmitters and had different numbers of mGluRs, so that one would get different modulations. On the assumption that all the release is from a single population of boutons, then either the mGluRs are differentially segregated within the bouton, or the vesicles have differential responsiveness to the same modulatory signal (presumably a reduced Ca current). This is not developed in the paper. 

      Based on our minimal stimulation results and anatomical analysis, we believe that many SuM terminals contain both glutamate and GABA. Therefore, both transmissions are able to be modulated by mGluRs and GABAB receptors within the same terminals. As the reviewer pointed out, differential responsiveness of glutamate-containing and GABA-containing vesicles to the GPCR signal could be one of the molecular mechanisms for differential effects of GPCRs on EPSCs and IPSCs. In addition, the spatial coupling between GPCRs and active zones for glutamate and GABA in the same SuM terminals may be different, which may give rise to differential modulation of glutamate and GABA release. These possible mechanisms are now described in the Discussion (line 469-476).

      (4) The biphasic events of Figures 3 and S3: I find these (unaveraged) events a bit ambiguous. Another way to look at them is that they are not biphasic per se but rather are not categorizable. Moreover, these events are really tiny, perhaps generated by only a few receptors whose open probability is variable, thus introducing noise into the small currents. 

      We agree with the reviewer that some events are tiny and some small currents could be masked by background noise. We understand that detecting the biphasic events by minimal stimulation has technical limitations. Because we automatically detected biphasic events, which were defined as an EPSC-IPSC sequence, only if an outward peak current following an inward current appeared within 20 ms of light illumination as described in the Methods section, we cannot exclude the possibility that the biphasic events we detected might include false biphasic responses. To compensate for these technical limitations, we also used strontium-induced asynchronous release as another approach and obtained results similar to those of the minimal stimulation experiments (Figures 3E and 3F). Furthermore, we confirmed that the amplitudes and kinetics of minimal light stimulation-evoked EPSCs or IPSCs were not altered by blockade of their counterpart currents (Figure 3-figure supplement 2). Even if false biphasic responses were accidentally included in the analysis, biphasic events remain a minor population, and we successfully detected discernible independent EPSCs and IPSCs, which were the major population of uniquantal release-mediated synaptic responses. Thus, multiple pieces of evidence support distinct release of glutamate and GABA from SuM terminals.
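      The detection rule stated above (an outward peak following an inward current within 20 ms of light illumination) can be sketched roughly as follows. This is a simplified illustration, not the detection routine used in the study: the function names, the 5 pA threshold, and the example traces are all hypothetical.

```python
def classify_event(trace_pa, dt_ms, light_onset_ms, window_ms=20.0, threshold_pa=5.0):
    """Classify one baseline-subtracted current trace (pA) sampled every dt_ms.

    threshold_pa is a hypothetical detection threshold, not a value from the study.
    Returns "biphasic", "EPSC", "IPSC", or "failure".
    """
    start = int(round(light_onset_ms / dt_ms))
    stop = int(round((light_onset_ms + window_ms) / dt_ms))
    window = list(trace_pa[start:stop])
    if not window:
        return "failure"

    # Largest inward (most negative) deflection inside the detection window.
    inward_idx = min(range(len(window)), key=window.__getitem__)
    has_inward = window[inward_idx] <= -threshold_pa

    # Biphasic: an outward peak that follows the inward peak within the window.
    if has_inward and max(window[inward_idx:]) >= threshold_pa:
        return "biphasic"
    if has_inward:
        return "EPSC"
    if max(window) >= threshold_pa:
        return "IPSC"
    return "failure"


def make_trace(n_samples, peaks):
    """Build a flat trace with single-sample deflections (illustrative only)."""
    trace = [0.0] * n_samples
    for index, value in peaks.items():
        trace[index] = value
    return trace
```

      With a fixed amplitude threshold, noise that crosses the threshold in both directions inside the window is labelled biphasic, which is one way the false biphasic responses acknowledged above can arise.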

      (5) Figure 4 indicates that the immunohistochemical analysis is done on SuM terminals, but I do not see how the authors know that these terminals come from SuM vs other inputs that converge in DG. 

      We thank the reviewer for raising an important point. As shown in Figure 4A, B, almost all VGluT2-positive terminals in the GC layer co-expressed VIAAT. We are aware that VTA neurons reportedly project to the GC layer of the DG and co-release glutamate and GABA (Ntamati and Luscher, 2016). Contrary to this report, our retrograde tracing analysis did not reveal direct projections from the VTA to the DG. These new data are now included in Figure 4-figure supplement 1. We also added pre-embedding immunogold EM analysis, in which SuM terminals were virally labeled with eYFP, confirming that they form both asymmetric and symmetric synapses (revised Figure 4F). Together with these new data, our results clearly demonstrate that SuM terminals in the GC layer form both asymmetric and symmetric synapses. While our results strongly suggest that VGluT2-positive terminals and SuM terminals in the GC layer are nearly identical, we cannot fully exclude the possibility that other inputs originating from unidentified brain regions may co-express VGluT2 and VIAAT in the GC layer. Therefore, in Figure 4 of the revised manuscript, we described “VGluT2-positive terminals” instead of “SuM terminals”.

      (6) Figure 4E also shows many GluN1 terminals not associated with anything, not even Vglut, and the apparent numbers do not mesh with the statistics. Why? 

      In triple immunofluorescence for VGluT2, VIAAT, and GluN1, free GluN1 puncta were predominantly observed in the molecular layer. Given that VGluT2-positive terminals are sparse in the molecular layer, these GluN1 puncta are primarily associated with VGluT1, the dominant subtype. In this study, we focused the analysis of GluN1 puncta specifically on the GC layer, excluding the molecular layer. To avoid miscommunication, we changed the original Figure 4E to the new Figure 4G, which focuses on the GC layer and aligns with the quantitative analysis. Additionally, we used ultrathin sections (100-nm-thick) to enhance spatial resolution, which limits the detection of co-localization events within this confined spatial range, as noted in the Discussion (line 485-488).

      (7) Do the conclusions based on the fluorescence immuno mesh with the apparent dimensions of the EM active zones and the apparent intermixing of labeled vesicles in immuno EM? 

      To further support our immunofluorescence results, we performed an EM study and found that a single SuM terminal formed both asymmetric and symmetric synapses on a GC soma (revised Figures 4E and 4F). These new data and our immunofluorescence results clearly indicate that a single SuM terminal forms both glutamatergic and GABAergic synapses on a GC and co-releases glutamate and GABA.

      As the reviewer pointed out, our immuno EM shows that VGluT2 and VIAAT labeled vesicles appear to intermix in asymmetric and symmetric synapses. Accordingly, in the revised manuscript, Figure 7 has been modified to show the intermixing of glutamate and GABA-containing vesicles in the SuM terminal. It should be noted that because of low labeling efficiency, our immuno-EM images don’t represent the whole picture of synaptic vesicles for glutamate and GABA. There could be biased distribution of vesicles close to their release site (more VGluT2-containing vesicles close to asymmetric synapses and more VIAAT-containing vesicles close to symmetric synapses) as reported previously (Root et al., 2018). Additionally, our results could be explained by other mechanisms: co-release of glutamate and GABA from the same vesicles, with one transmitter undetected due to the absence of its postsynaptic receptor. This possibility is now mentioned in the Discussion (line 512-520). More detailed vesicle configuration in a single SuM terminal will have to be investigated in future studies.

      (8) Figure 6 is not so interesting to me and could be removed. It seems to test the obvious: EPSPs promote firing and IPSPs oppose it. 

      We believe these results are necessary for the following two reasons. First, we showed that glutamate/GABA co-transmission balance is dynamically changed in a frequency-dependent manner (Figure 5). In terms of physiological significance, it is important to demonstrate how these frequency-dependent dynamic changes affect GC firing. Therefore, we believe that Figure 6, which shows how SuM inputs modulate GC firing by repetitive SuM stimulation, is necessary for this paper. Second, we previously reported the excitatory effects of the SuM inputs on GC firing, suggesting the important roles of glutamatergic transmission of the SuM inputs in synaptic plasticity (Hashimotodani et al., 2018; Hirai et al., 2022; Tabuchi et al., 2022). In contrast, how GABAergic co-transmission contributes to SuM-GC synaptic plasticity and DG information processing was not well understood. Our results in Figure 6, which demonstrate the inhibitory effects of GABAergic co-transmission on GC firing by high frequency repetitive SuM input activity, clearly show the contribution of GABAergic co-transmission to short-term plasticity at SuM-GC synapses. For these reasons, we would like to keep Figure 6. We hope that our explanations convince the reviewer.

      Reviewer #2:

      Summary:

      In this study, the authors investigated the release properties of glutamate/GABA co-transmission at the supramammillary nucleus (SuM)-granule cell (GC) synapses using in vitro electrophysiology and anatomical approaches at the light and electron microscopy level. They found that SuM to dentate granule cell synapses, which co-release glutamate and GABA, exhibit distinct differences in paired-pulse ratio, Ca2+ sensitivity, presynaptic receptor modulation, and Ca2+ channel-vesicle coupling configuration for each neurotransmitter. The study shows that glutamate/GABA co-release produces independent glutamatergic and GABAergic synaptic responses, with postsynaptic targets segregated. They show that most SuM boutons form distinct glutamatergic and GABAergic synapses in close proximity, characterized by GluN1 and GABAAα1 receptor labeling, respectively. Furthermore, they demonstrate that glutamate/GABA co-transmission exhibits distinct short-term plasticity, with glutamate showing frequency-dependent depression and GABA showing frequency-independent stable depression.

      Their findings suggest that these distinct modes of glutamate/GABA co-release by SuM terminals serve as frequency-dependent filters of SuM inputs. 

      Strengths:

      The conclusions of this paper are mostly well supported by the data. 

      We thank the reviewer for their positive and constructive comments on our manuscript.

      Weaknesses: 

      Some aspects of Supplementary Figure 1A and the table need clarification. Specifically, the claim that the authors have stimulated an axon fiber rather than axon terminals is not convincingly supported by the diagram of the experimental setup. Additionally, the antibody listed in the primary antibodies section recognizes the gamma2 subunit of the GABAA receptor, not the alpha1 subunit mentioned in the results and Figure 4. 

      We have now answered these questions in the Recommendations section below.

      Reviewer #3:

      Summary: 

      In this manuscript, Hirai et al investigated the release properties of glutamate/GABA co-transmission at SuM-GC synapses and reported that glutamate/GABA co-transmission exhibits distinct short-term plasticity with segregated postsynaptic targets. Using optogenetics, whole-cell patch-clamp recordings, and immunohistochemistry, the authors reveal distinct transmission modes of glutamate/GABA co-release as frequency-dependent filters of incoming SuM inputs.

      Strengths: 

      Overall, this study is well-designed and executed; conclusions are supported by the results. This study addressed a long-standing question of whether GABA and glutamate are packaged in the same vesicles and co-released in response to the same stimuli in the SuM-GC synapses (Pedersen et al., 2017; Hashimotodani et al., 2018; Billwiller et al., 2020; Chen et al., 2020; Li et al., 2020; Ajibola et al., 2021). Knowledge gained from this study advances our understanding of neurotransmitter co-release mechanisms and their functional roles in the hippocampal circuits. 

      Weaknesses:

      No major issues are noted. Some minor issues related to data presentation and experimental details are listed below. 

      We appreciate the reviewer’s positive view of our study. We respond in more detail in the Recommendations section below.

      Recommendations for the authors:

      Reviewer #1:

      (1) The blue color for VIAAT in panel 1C is extremely hard to see. 

      Thank you for pointing this out. We have changed the color for VIAAT to cyan in Figure 1C and D in the revised manuscript.

      (2) Line 329 "perforant" not "perfomant".  

      We appreciate the reviewer’s careful attention. In the revised manuscript, we have corrected this typo.

      Reviewer #2:

      To convincingly demonstrate that the authors stimulated SuM axon fiber instead of SuM terminals (Supplementary Figures 1A), they should provide an image showing the distribution of SuM-labeled fibers and axon terminals reaching the dentate gyrus (DG) and the trace of the optic fiber, rather than providing a diagram of the experimental setup.

      We appreciate the reviewer’s suggestion. We have now provided a new experimental setup image (Figure 1-figure supplement 1A) showing a single GC, the distribution of SuM fibers in the GC layer, and the illumination area at each location. As SuM inputs make synapses onto the GC soma and the dendrite close to the GC cell body, SuM-GC synapses in the recorded GCs exist in a very limited area. This characteristic synaptic localization allowed us to control the illumination area without applying light to the SuM terminals in the recorded GCs. Delayed onsets of EPSCs/IPSCs by over-axon stimulation (Figure 1-figure supplement 1C, D) also support that SuM terminals in the recorded GCs were outside the illumination area.

      Additionally, the authors should clarify the discrepancy between the antibody mentioned in the list of primary antibodies, which recognizes the gamma2 subunit of the GABAA receptor, and the alpha1 subunit of the GABAA receptor mentioned in the results and Figure 4. 

      We apologize for this mistake. As described in the main text and figure, we used the antibody for the α1 subunit of the GABAA receptor. Table S1 has been corrected in the revised version of the paper.

      Reviewer #3:

      (1) In Figure 1, the authors used two [Ca2+]o concentrations to study the EPSC and IPSC amplitudes. How does the Ca2+ concentration affect the PPR in the EPSC and IPSC, respectively? 

      Given that lowering the extracellular Ca2+ concentration reduces the release probability, it is expected that 1 mM extracellular Ca2+ concentration increases PPR compared to 2.5 mM. Actually, we observed that lowering the extracellular Ca2+ concentration increased the synaptic responses from 2nd to 10th (both EPSC and IPSC) by train stimulation (Figure 5).

      (2) In Figure 2D, does baclofen also have a dose-dependent effect on the inhibition of the EPSC and IPSC similar to the DCG-IV in Figure 2C? 

      Thank you for your question. Because we aimed to demonstrate the differential inhibitory effects of baclofen at a certain concentration on glutamatergic and GABAergic co-transmission, we did not go into detail regarding a dose-dependent effect. In response to the reviewer’s comment, we examined the effects of a higher concentration of baclofen on EPSCs and IPSCs. As shown in the figure below, 50 µM baclofen inhibited EPSCs and IPSCs to a similar extent. Therefore, by comparing the inhibitory effects of two different concentrations of baclofen (5 and 50 µM), we believe that baclofen also has a dose-dependent inhibitory effect on both EPSCs and IPSCs, similar to DCG-IV.

      Author response image 1.

      (3) In Figure 2E, statistical labels, such as "*" or "n.s." (not significant), should be provided on the plots to facilitate the reading of figures. 

      In response to the reviewer’s comment, we have provided statistical labels in the Figure 2E.

      (4) In Figure 3A, the latency of the evoked EPSC for the lower light stimulation groups seems to be much slower than the one shown on the left or other figures in the paper, such as Figure 1F.

      Please double-check if the blue light stimulation label is placed in the right location. 

      Corrected, thanks.

      (5) The use of minimal light stimulation in optogenetic experiments is not appropriately justified or described. More detailed information should be provided, such as whether the optogenetic stimulation is performed on the axon or the terminals of the SuM. 

      We appreciate the reviewer’s suggestion. To effectively detect stochastic synaptic responses, the light stimulation was applied on the terminals of the SuM. We have now stated this information (line 212). We have also further described the justification for using minimal light stimulation in the revised manuscript (line 207-209).

      References

      Fukaya R, Hirai H, Sakamoto H, Hashimotodani Y, Hirose K, Sakaba T (2023) Increased vesicle fusion competence underlies long-term potentiation at hippocampal mossy fiber synapses. Sci Adv 9:eadd3616.

      Hashimotodani Y, Karube F, Yanagawa Y, Fujiyama F, Kano M (2018) Supramammillary Nucleus Afferents to the Dentate Gyrus Co-release Glutamate and GABA and Potentiate Granule Cell Output. Cell Rep 25:2704-2715 e2704.

      Hirai H, Sakaba T, Hashimotodani Y (2022) Subcortical glutamatergic inputs exhibit a Hebbian form of long-term potentiation in the dentate gyrus. Cell Rep 41:111871.

      Hori T, Takamori S (2021) Physiological Perspectives on Molecular Mechanisms and Regulation of Vesicular Glutamate Transport: Lessons From Calyx of Held Synapses. Front Cell Neurosci 15:811892.

      Ntamati NR, Luscher C (2016) VTA Projection Neurons Releasing GABA and Glutamate in the Dentate Gyrus. eNeuro 3.

      Root DH, Zhang S, Barker DJ, Miranda-Barrientos J, Liu B, Wang HL, Morales M (2018) Selective Brain Distribution and Distinctive Synaptic Architecture of Dual Glutamatergic-GABAergic Neurons. Cell Rep 23:3465-3479.

      Tabuchi E, Sakaba T, Hashimotodani Y (2022) Excitatory selective LTP of supra-mammillary glutamatergic/GABAergic co-transmission potentiates dentate granule cell firing. Proc Natl Acad Sci U S A 119:e2119636119.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Lodhiya et al. demonstrate that antibiotics with distinct mechanisms of action, norfloxacin, and streptomycin, cause similar metabolic dysfunction in the model organism Mycobacterium smegmatis. This includes enhanced flux through the TCA cycle and respiration as well as a build-up of reactive oxygen species (ROS) and ATP. Genetic and/or pharmacologic depression of ROS or ATP levels protect M. smegmatis from norfloxacin and streptomycin killing. Because ATP depression is protective, but in some cases does not depress ROS, the authors surmise that excessive ATP is the primary mechanism by which norfloxacin and streptomycin kill M. smegmatis. In general, the experiments are carefully executed; alternative hypotheses are discussed and considered; the data are contextualized within the existing literature. Clarification of the effect of 1) ROS depression on ATP levels and 2) ADP vs. ATP on divalent metal chelation would strengthen the paper, as would discussion of points of difference with the existing literature. The authors might also consider removing Figures 9 and 10A-B as they distract from the main point of the paper and appear to be the beginning of a new story rather than the end of the current one. Finally, statistics need some attention.

      Strengths:

      The authors tackle a problem that is both biologically interesting and medically impactful, namely, the mechanism of antibiotic-induced cell death.

      Experiments are carefully executed, for example, numerous dose- and time-dependency studies; multiple, orthogonal readouts for ROS; and several methods for pharmacological and genetic depletion of ATP.

      There has been a lot of excitement and controversy in the field, and the authors do a nice job of situating their work in this larger context.

      Inherent limitations to some of their approaches are acknowledged and discussed e.g., normalizing ATP levels to viable counts of bacteria.

      We sincerely appreciate the reviewer’s encouraging feedback.

      Weaknesses:

      The authors have shown that treatments that depress ATP do not necessarily repress ROS, and therefore conclude that ATP is the primary cause of norfloxacin and streptomycin lethality for M. smegmatis. Indeed, this is the most impactful claim of the paper. However, GSH and dipyridyl beautifully rescue viability. Do these and other ROS-repressing treatments impact ATP levels? If not, the authors should consider a more nuanced model and revise the title, abstract, and text accordingly.

      We thank the reviewer for asking this question. In the revised version of the manuscript, we have included data on the impact of the antioxidant GSH on antibiotic-induced ATP levels as a supplementary figure (S9C).

      Does ADP chelate divalent metal ions to the same extent as ATP? If so, it is difficult to understand how conversion of ADP to ATP by ATP synthase would alter metal sequestration without concomitant burst in ADP levels.

      We sincerely thank the reviewer for raising this insightful question. Indeed, ADP and AMP can also form complexes with divalent metal ions; however, these complexes tend to be less stable. According to the existing literature, ATP-metal ion complexes exhibit a higher formation constant than ADP or AMP complexes. This has been attributed to the polyphosphate chain of ATP, which acts as an active site, forming a highly stable tridentate structure (Khan et al., 1962; Distefano et al., 1953). An antibiotic-induced increase in ATP levels, irrespective of any changes in ADP levels or in the total pool size of purine nucleotides, could therefore still result in the formation of more stable complexes with metal ions, potentially leading to metal ion depletion. In addition, recent studies indicate that antibiotic treatment stimulates purine biosynthesis (Lobritz MA et al., 2022; Yang JH et al., 2019), thereby imposing energy demands and enhancing ATP production; a corresponding increase in total purine nucleotide levels (ADP+ATP) is therefore possible, as mentioned in the Discussion section. However, this hypothesis requires further investigation.

      Khan MMT, Martell AE. Metal Chelates of Adenosine Triphosphate. Journal of Physical Chemistry (US). 1962 Jan 1;66(1):10–5.

      Distefano V, Neuman WF. Calcium complexes of adenosinetriphosphate and adenosinediphosphate and their significance in calcification in vitro. Journal of Biological Chemistry. 1953 Feb 1;200(2):759–63.

      Lobritz MA, Andrews IW, Braff D, Porter CBM, Gutierrez A, Furuta Y, et al. Increased energy demand from anabolic-catabolic processes drives β-lactam antibiotic lethality. Cell Chem Biol [Internet]. 2022 Feb 17.

      Yang JH, Wright SN, Hamblin M, McCloskey D, Alcantar MA, Schrübbers L, et al. A White-Box Machine Learning Approach for Revealing Antibiotic Mechanisms of Action. Cell [Internet]. 2019 May 30

      Reviewer #1 (Recommendations for the authors):

      (1) Some of the results in the paper diverge from what has been previously reported by some of the referenced literature. These discrepancies should be clarified.

      We apologize for any confusion, but we are uncertain which specific discrepancies the reviewer is referring to. In the Discussion section, we have addressed and analysed our results within the broader context of the existing literature, regardless of whether our findings align with or differ from previous studies.

      (a) CCCP, nigericin, BDQ, and the atpD mutant all appear to affect M. smegmatis growth (Figures S6C, S7C, S7D-E, and Figure 1B from reference 41). Could depressed growth contribute to the rescue effects of these compounds?

      We concur with the reviewer that the reagents we used (CCCP, Nigericin, and BDQ) to suppress the ATP burst in the presence of antibiotics do affect bacterial growth. This sub-inhibitory effect on growth is expected given their roles in either uncoupling the electron transport chain from oxidative phosphorylation or directly inhibiting ATP synthase, leading to reduced ATP production compared to the untreated control. However, we chose concentrations that reduce the antibiotic-induced surge in ATP levels without significantly depriving the bacteria of the ATP essential for their survival, thereby avoiding cell death.

      Consequently, all three reagents (as shown in Figures S6C, S7C, and S7D-E) were employed at non-lethal concentrations. We would like to emphasize, however, that it was not feasible to select a reagent concentration that had no impact on growth yet still suppressed the antibiotic-induced ATP burst. We recognize the possibility that growth retardation may have contributed to the observed rescue effects. To address this concern, we used multiple orthogonal methods (CCCP, Nigericin, and BDQ), each with distinct mechanisms having a common effect of reducing the ATP surge, to minimize off-target effects and support our findings.

      Also, the authors report no growth phenotype for atpD mutant (Figure S8) but only carry out the growth curve to an OD of 2, which is approximately where the growth curve from ref 41 begins to diverge.

      Additionally, to further confirm that bacterial rescue was not due to growth retardation caused by these reagents, we utilized the atpD mutant. All experiments, including those involving the atpD mutant, were conducted when the OD600nm reached 0.8 (during the exponential phase). We specifically ensured that the growth of the atpD mutant was not compromised during this phase (Figure S8) and restricted our growth curve to the early stationary phase (OD600 between 1.5 and 2). While it is possible that the atpD mutant may exhibit slower growth compared to wild-type bacteria in stationary phase at an OD600nm of 4 (as shown in ref 41), this does not impact our observations.

      (b) Reference 41 also reports that the atpD mutant is more sensitive to some antibiotics (Figure 6). This includes isoniazid, which references 34 and 35 have both reported caused an ATP burst.

      We acknowledge the reviewer’s query regarding the phenotype of the atpD mutant against isoniazid (Reference 41). However, the cited reference does not provide clarity on why the M. smegmatis atpD mutant exhibits increased sensitivity to isoniazid and other antibiotics, nor does it explain whether this sensitivity is due to reduced ATP levels or altered cell wall properties, such as enhanced drug uptake, as observed with Nile red and ethidium bromide.

      While references 34 and 35 reported an ATP burst following isoniazid treatment in slow-growing M. bovis BCG and M. tuberculosis, it remains to be tested whether isoniazid acts similarly in the fast-growing M. smegmatis, where it is bacteriostatic rather than being bactericidal as observed in M. bovis BCG and M. tuberculosis.  

      (2) The statistics require some attention. First, the wording for almost all of the figures is something like "data points represent the mean of at least three independent replicates," is that correct? CFUs are notoriously messy so it is surprising (impressive?) that the variability between replicates is so low. Second, t-tests are not appropriate for multiple comparisons.

      We thank the reviewer for raising this important query. It is correct that all our experiments included at least three independent replicates, and many of our results exhibit a high degree of variability, as indicated by the large error bars. We would like to clarify that we did not perform multiple comparisons on our results. For all analyses, an unpaired t-test was conducted between the control group and one experimental group at a time. Consequently, statistical data were generated for each pair of results, and the comparisons were displayed on the graph relative to the control data points, as mentioned in the Methods section under the heading “Statistical analysis”.

      (3) Figures 9 and 10A-B seem tangential to the main point of the paper and, in the case of Figure 10A-B, preliminary.

      In this study, our aim was to comprehensively investigate the nature of antibiotic-induced stresses (i.e., mechanisms of action from T = 15 hrs) and leverage these insights to enhance our understanding of bacterial adaptation mechanisms, particularly antibiotic tolerance (from T = 25 hrs). While a significant portion of the manuscript focuses on the secondary consequences of antibiotic exposure, we also sought to assess the bacteria's ability to counteract these stresses, contributing to our understanding of how antibiotic tolerance phenotypes develop.

      The results presented in Figure 9 clearly demonstrate that bacteria attempt to reduce respiration by decreasing flux through the complete TCA cycle, thereby mitigating ROS and ATP production in response to antibiotics. These findings not only uncover potential metabolic pathways to downregulate respiration but also validate our observations regarding the role of increased respiration, ROS generation, and subsequent ATP production in antibiotic action.

      Importantly, bacterial responses to antibiotics were not limited to metabolic adaptations. They also included the upregulation of the intrinsic drug resistance determinant Eis (Figure 10A) and an increase in mutation frequency (Figure 10B), both of which indicate a greater likelihood of these bacteria developing antibiotic tolerance and resistance. Therefore, the data presented in Figures 9 and 10A-B are not peripheral to the central theme of the paper. Rather, they complement and strengthen it by providing a comprehensive understanding of the consequences of antibiotic exposure, which aligns with the primary objectives of our study.

      Do the various perturbations used here (especially streptomycin) affect expression and/or turnover of the genetically-encoded sensors Mrx1-roGFP2 or Peredox-mCherry?

      We appreciate the reviewer for raising this query. Since streptomycin treatment leads to mistranslation and eventually inhibits protein synthesis, it is possible that such treatment could impact the expression and/or turnover of the genetically encoded biosensors, Mrx1-roGFP2 (1) or Peredox-mCherry (2). However, we do not anticipate any effects on the readout as both biosensors provide ratiometric measurements of redox potential and NADH levels, respectively, which eliminates errors due to variations in protein abundance. Nevertheless, in our experiments with both drugs, we employed multiple time- and dose-dependent responses, ensuring that all meaningful conclusions were drawn from the overall trends seen in the data rather than an individual data point.

      (1) Bhaskar A, Chawla M, Mehta M, Parikh P, Chandra P, Bhave D, et al. (2014) Reengineering Redox Sensitive GFP to Measure Mycothiol Redox Potential of Mycobacterium tuberculosis during Infection. PLoS Pathog 10(1): e1003902. https://doi.org/10.1371/journal.ppat.1003902

      (2) Bhat SA, Iqbal IK, Kumar A (2016) Imaging the NADH:NAD+ Homeostasis for Understanding the Metabolic Response of Mycobacterium to Physiologically Relevant Stresses. Front Cell Infect Microbiol 6:145. https://doi.org/10.3389/fcimb.2016.00145
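      The point that a ratiometric readout is insensitive to protein abundance can be illustrated with a toy calculation. The linear two-channel model below is an assumption made only for illustration; it is not the calibration used for Mrx1-roGFP2 or Peredox-mCherry, and all names and numbers are hypothetical.

```python
import math


def channel_signals(sensor_abundance, fraction_in_state_a):
    """Toy model: both channel signals scale linearly with the amount of sensor."""
    signal_a = sensor_abundance * fraction_in_state_a
    signal_b = sensor_abundance * (1.0 - fraction_in_state_a)
    return signal_a, signal_b


def ratiometric_readout(signal_a, signal_b):
    """Ratio of the two channels; the abundance factor cancels out."""
    if signal_b == 0:
        raise ValueError("denominator channel must be non-zero")
    return signal_a / signal_b


# Same sensor state, but a four-fold lower sensor abundance in the second case.
full_expression = ratiometric_readout(*channel_signals(1.0, 0.4))
reduced_expression = ratiometric_readout(*channel_signals(0.25, 0.4))

# The readout changes only when the sensor state itself changes.
shifted_state = ratiometric_readout(*channel_signals(0.25, 0.7))
```

      In this toy model the ratio is the same at full and reduced abundance and differs only when the sensor state shifts, which is the property relied on above. It holds only while both channels stay within the detectable range.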

      (4) Do the antibiotics affect permeability? Especially relevant to CellROX experiments.

      Antibiotics can alter, and even increase, bacterial membrane permeability, a phenomenon observed in the self-promoted uptake of aminoglycosides. When aminoglycosides bind to ribosomes, they induce mistranslation, including of membrane proteins, leading to the formation of membrane pores, which in turn enhances antibiotic uptake and lethality (1-2). However, whether the antibiotics used in our study (norfloxacin and streptomycin) altered membrane permeability at the concentrations applied is not known.

      Experiments involving the CellROX dye are unlikely to be influenced by changes in membrane permeability, as the dye is freely permeable to the mycomembrane.

      References:

      (1) Davis BD Chen LL Tai PC (1986) Misread protein creates membrane channels: an essential step in the bactericidal action of aminoglycosides PNAS 83:6164–6168.

      (2) Ezraty B Vergnes A Banzhaf M Duverger Y Huguenot A Brochado AR Su SY Espinosa L Loiseau L Py B Typas A Barras F (2013) Fe-S cluster biosynthesis controls uptake of aminoglycosides in a ROS-less death pathway Science 340:1583–1587.

      (5) Figures 4E-H does GSH affect bacterial growth/viability on its own i.e. in the absence of a drug?

      We thank the reviewer for raising this query. Indeed, the 10 mM GSH used in our experiments to mitigate antibiotic-induced ROS and rescue cells does affect bacterial growth on its own, likely because GSH imposes reductive stress on bacterial physiology, but it does not affect viability. For clarification, we have included the viability data in the presence of 10 mM GSH alone in the revised version of the manuscript as Supplementary Figure S4E.

      (6) p. 2 "...antibiotic resistance involves more complex mechanisms and manifests as genotypic resistance, antibiotic tolerance, and persistence." This reads as tolerance and persistence being a subset of resistance, which is not quite accurate. There is at least one other example of similar wording in the text.

      We thank the reviewer for highlighting this point. Our intention was to convey that resistance to antibiotics can manifest in two forms: permanent or genetic resistance, and transient resilience through antibiotic tolerance and persistence.

      (7) p. 3 "...and showing no visible differences in the growth rate...". It is hard to say this as all the values appear to be 0 - possible to zoom in on the CFU counts in this region? Same comment for p. 5 "...the unaffected growth rate in the early response phase...".

      We apologize for the lack of clarity regarding the resolution of the early time points in the growth curve. Unfortunately, it was not feasible to zoom in on the initial time points because of the large difference in cell viability between T=0 and T=25 hours (i.e., spanning 8 generations). To clarify the growth phenotype at early time points, please refer to Author response image 1, where CFU counts are plotted on a logarithmic scale. The y-axis spans 6-8 orders of magnitude across different conditions, making it difficult to visualize early time points on a linear scale.

      Author response image 1.
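      The difficulty of resolving early time points on a linear axis can be made concrete with a short calculation. The sketch below is illustrative only: the helper `axis_fraction` and all CFU/ml values are hypothetical, not data from the study. It compares how far apart two early counts, one doubling apart, are drawn on a linear versus a logarithmic y-axis.

```python
import math


def axis_fraction(value, axis_min, axis_max, log_scale=False):
    """Return the relative height (0 to 1) at which a value is drawn on the y-axis."""
    if log_scale:
        value, axis_min, axis_max = (math.log10(v) for v in (value, axis_min, axis_max))
    return (value - axis_min) / (axis_max - axis_min)


# Hypothetical CFU/ml values, for illustration only
early_cfu = 6e5    # an early time point
later_cfu = 1.2e6  # one doubling later
axis_top = 4e8     # hypothetical upper limit of the y-axis

# Vertical separation of the two points as a fraction of the axis height
linear_gap = (axis_fraction(later_cfu, 0, axis_top)
              - axis_fraction(early_cfu, 0, axis_top))
log_gap = (axis_fraction(later_cfu, 1e5, axis_top, log_scale=True)
           - axis_fraction(early_cfu, 1e5, axis_top, log_scale=True))

print(f"linear axis: {linear_gap:.4f} of axis height")
print(f"log axis:    {log_gap:.4f} of axis height")
```

      With these hypothetical numbers, one doubling occupies a far smaller fraction of a linear axis than of a logarithmic one, which is why early growth appears flat near zero when CFU/ml is plotted linearly.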

      (8) p. 5 "...data for each condition were subjected to rigorous quality control analysis (S2B)." I believe that this is the case, but how Figure S2B demonstrates this fact is not clear.

      Figures S2A and S2B present the quality assessment data for all six proteomics datasets. Figure S2A illustrates the consistency in the number of proteins identified across 10 samples (5 independent replicates for both control and drug treatment). The minimal variation in the number of identified proteins indicates reproducibility across the different runs. Similarly, Figure S2B displays the variability in Pearson correlation coefficient values of protein abundance (LFQ intensities) across the 10 samples. The closer and more consistent the Pearson correlation values, the greater the reproducibility of the quantitative data acquisition.
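      As an illustration of the kind of reproducibility check described for Figure S2B, the sketch below computes pairwise Pearson correlation coefficients between samples and reports their spread. It is a minimal example and not the pipeline used in the study: the sample names, the intensity values, and the helper functions are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def pairwise_correlations(samples):
    """Map each pair of sample names to the Pearson r of their intensities."""
    return {
        (a, b): pearson(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


# Hypothetical log-transformed LFQ intensities for four proteins in three samples
samples = {
    "control_1": [20.1, 24.3, 18.7, 22.0],
    "control_2": [20.3, 24.1, 18.9, 22.2],
    "treated_1": [20.0, 24.6, 18.5, 21.7],
}

correlations = pairwise_correlations(samples)
spread = max(correlations.values()) - min(correlations.values())
for pair, r in correlations.items():
    print(pair, round(r, 3))
print("spread of r values:", round(spread, 3))
```

      Values of r that are uniformly high, with a small spread across sample pairs, would indicate consistent quantification between runs. A real dataset would also need a rule for proteins missing in some samples, which this sketch does not address.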

      (9) p. 7 "To look for a shared mechanism of antibiotic action..." The wording implies an assumption. Perhaps "to test whether" would be more appropriate? Same comment for p. 12 "To further confirm whether enhanced respiration ...".

      We appreciate the reviewer’s suggestions for both sentences and have made the necessary changes in the revised version. Thank you for bringing this to our attention.

      (10) Figure S1A-B figure legend. How was this assay performed?

      The experiment for Figures S1A-B was conducted using a standard REMA assay, as described in the methods section. Cells were harvested at the 25th-hour time point, and drug MICs were compared between cells grown with and without 1/4x MBC99 of the drugs. This was done to determine whether the growth recovery observed during the recovery phase was due to the presence of drug-resistant bacteria.

      (11) p. 14 "...(CCCP), a protonophore, at non-toxic levels..." Figure S6C implies an effect on growth.

      As clarified earlier in response to query 1(a), the CCCP reagent was used at concentrations that effectively minimize the antibiotic-induced surge in ATP levels. However, at these concentrations, CCCP reduces cellular ATP production (Figure S6A), leading to bacterial growth delay (Figure S6C). By "non-toxic levels," we intended to convey that these concentrations of CCCP are non-lethal to the bacteria, as evidenced in Figure S6C.

      (12) Figure 8A y axis is this CFU/mL or OD/mL?

      The y-axis of Figure 8A depicts CFU/ml, as it measures cell survival in response to increasing concentrations of bipyridyl.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to test the hypothesis that ATP bursts are the predominant driver of antibiotic lethality of Mycobacteria.

      Strengths:

      This reviewer has not identified any significant strengths of the paper in its current form.

      Weaknesses:

      A major weakness is that M. smegmatis has a doubling time of three hours and the authors are trying to conclude that their data would reflect the physiology of M. tuberculosis which has a doubling time of 24 hours. Moreover, the authors try to compare OD measurements with CFU counts and thus observe great variabilities.

      If the authors had evidence to support the conclusion that ATP burst is the predominant driver of antibiotic lethality in mycobacteria then this paper would be highly significant. However, with the way the paper is written, it is impossible to make this conclusion.

      We have identified a new mechanism of antibiotic action in Mycobacterium smegmatis. However, as discussed extensively in the manuscript's discussion section, whether and to what extent this mechanism applies to other organisms still needs to be tested.

      We have always drawn our inferences from CFU counts, as reported in all of our experiments, because OD600nm is not a reliable measure.

      Reviewer #2 (Recommendations for the authors):

      Figure 1 needs to have an x-axis that has intervals that have 10E5 CFU to 4 x 10E8. But even 4 x 10E8 CFU/ml is a late log and not exponentially growing cells.

      Figure 1 illustrates the growth curve. We assume the reviewer meant the y-axis, which represents CFU/ml on a linear scale. As mentioned in response to reviewer #1’s query no. 7, it was not feasible to include the viability (CFU/ml) values at T=0 and a few subsequent time points. Naturally, the starting cell count was not zero; we began with approximately 600,000 CFU/ml, corresponding to an OD600nm of 0.0025/ml. For clarification, we have mentioned the initial OD as well as the CFU/ml at T=0 hr in the figure legend.

      Carefully look at Figure 1, what were you trying to show? Your x-axis goes from 0 to 10E8, of course you did not inoculate 0 cells, but if you had measured CFUs, you might not have gotten the great variability you reported in your graph.

      We assume that the reviewer is suggesting that "if we had measured OD600nm/ml instead of CFU/ml, we might not have observed the high variability we reported." While we agree with the reviewer's comment, our decision to use CFU/ml for growth measurement was to obtain more resolved and detectable data points, as an OD600nm of 0.0025/ml cannot be reliably measured with a spectrophotometer. Additionally, at around T=15 hours, where we observed an extended lag phase (referred to as the stress phase), the OD600nm was approximately 0.05, which is barely detectable. Therefore, the significant differences between the control group and the ¼ x MBC99 drug-treated group might not have been observed if we had relied on OD-based measurements. Despite the presence of high error bars and variability in the data points, we were still able to demonstrate clear differences in bacterial growth between treated and untreated samples at sub-lethal drug doses. This ultimately allowed us to capture the nature of antibiotic-induced stresses.

      There is no doubt that sublethal concentrations of antibiotics will have an effect on the bacterial cells. But it is not clear how you are concluding that ATP burst is the dominant driver of lethality. M. smegmatis can be very different from Mtb.

      Using a series of time- and dose-dependent experiments with plasmid- and kit-based approaches, we demonstrated that both antibiotics generate and rely on ROS and ATP bursts to induce lethality in M. smegmatis. Careful monitoring of oxidative stress in cells, following specific quenching of the antibiotic-induced ATP burst (Figure 7, S9A-B), revealed that the ATP burst is the dominant driver of antibiotic lethality. In all tested experiments, surviving bacteria exhibited elevated levels of oxidative stress but were able to maintain their viability, suggesting that oxidative stress alone is not the dominant factor in antibiotic-induced lethality. Furthermore, quenching of ROS by glutathione also suppressed the antibiotic-induced surge in ATP levels, supporting the notion that ROS alone is not the dominant driver of antibiotic action, as previously understood.

      All reported experiments were conducted using fast-growing M. smegmatis, and we have acknowledged the need for similar experiments in other bacterial systems, including M. tuberculosis, to assess whether our findings apply more broadly.

      Another point, the use of a mutant in the ATP synthase is an interesting idea, but would it be better to use something where you knock out the ATP synthase activity with siRNA or a temperature-sensitive allele?

      We appreciate the reviewer’s encouraging comment. Knocking out ATP synthase would completely halt oxidative phosphorylation and shut down aerobic respiration, leading to severe metabolic and growth defects. Such stressful and non-growing conditions are not suitable for testing the efficacy of antibiotics, as it is widely accepted that antibiotics are more effective against metabolically active bacteria.

      Lastly, the conclusion is that norfloxacin and streptomycin have common mechanisms of action, but the authors do not explain how a DNA gyrase inhibitor shows the same mechanisms of action as a ribosome inhibitor.

      The connection between antibiotic target corruption (DNA gyrase or ribosome) and the activation of respiration is indeed unclear, intriguing, and represents one of the most exciting questions in the field of antibiotic mechanisms of action. In the discussion section, we have speculated on potential pathways for this connection, including the possibility that the inhibition of cell division by both drugs may create a perception of resource scarcity (energy and biosynthetic precursors), which could subsequently trigger increased metabolism, respiration, ROS production, and ATP synthesis. However, the precise mechanisms underlying this connection require further investigation and are beyond the scope of the present study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Author Response

      Reviewer #1 (Public Review): 

      Weaknesses: 

      - Having demonstrated that NK cell IFNgamma is important for recruiting and activating DCs and T cells in their model, one is left to wonder whether it is important for the therapeutic effect, which was not tested. 

      We conducted a preliminary study to compare the pro-survival effect of WT NK and Ifng-/- NK cell therapies. We found that, in the 95-500 mg day-21 tumor group, the overall survival (OS) of mice receiving Ifng-/- NK cell therapy significantly decreased (p = 0.045) compared to mice receiving WT NK cell therapy up to 60 days after tumor inoculation, but there was no difference in OS beyond 65 days after tumor inoculation. Therefore, we have added the following sentences at the end of the second paragraph in our Discussion (Page 32):

      “However, although Ifng-/- NK cells induced less cDC activation compared to WT NK cells, the levels of CD86 on cDCs of mice that received Ifng-/- NK cells were higher than those of mice not subjected to NK cell transfer (Figure 4B). This outcome indicates the presence of IFN-γ-independent or/and compensatory mechanism(s) for cDC activation by the transferred NK cells, which is in line with our preliminary result that Ifng-/- NK cell therapy does not significantly diminish the pro-survival effect in comparison to WT NK cell therapy beyond 60 days after tumor cell inoculation (data not shown).”

      - It was somewhat difficult to gauge the clinical trial results because the trial was early stage and therefore not controlled. Evaluation of the results therefore relies on historical comparisons. To evaluate how encouraging the results are, it would be valuable for the authors to provide some context on the prognoses and likely disease progression of these patients at the time of treatment. 

      We had already indicated in our Results that all six patients had an ECOG performance status of 0 (Page 25 and Table). We have now added in the Results that they had “a predicted survival of >3 months” (Page 25).

      Reviewer #1 (Recommendations For The Authors):

      Minor points: 

      (1) It would be helpful if the authors provided a rationale for why they derived their NK cell product from bone marrow cells instead of the more common source, spleen cells. 

      We have now added the following clarification to the section “Ex vivo expansion of murine and human NK cells” in our Materials and Methods (Page 35): “We used BM cells instead of splenocytes for NK cell culture because removal of T cells from BM cells before culturing is not necessary.”

      (2) It would have been helpful to provide summary results from replicates of the cytokine production data shown in Figure 1F. 

      We have now added a graphical panel on the relative ΔMFI of two independent experiments to Figure 1F and revised the figure legend accordingly (Pages 7–8).

      (3) The role of conventional CD4+ T cells is a little unclear. The authors state in the discussion that they contribute to the antitumor response, which is consistent with their finding that depleting both CD4 T cells and CD8 T cells has a greater effect than depleting CD8 T cells. Depleting CD4 T cells alone trended towards improving the response, however. Probably Tregs are the culprit in the latter effect but a sentence or two would be helpful if the claim for a protective role for CD4 T cells is to remain.  

      We have now re-analyzed the data of Figure 3D by separating mice into two groups according to day 21 tumor weight, i.e., 95-600 mg and >600 mg (Pages 13–14). We have revised our explanation of the Figure 3D data in the Results (Pages 11–12) as follows:

      “Accordingly, we examined the role of T cells in NK cell therapy by depleting T cell subsets with anti-CD4 or/and anti-CD8 antibodies two days before primary tumor resection (Figure 3D Schema and Figure 3-figure supplement 1). In the 95-600 mg tumor group, depletion of CD8+ cells alone or both CD4+ and CD8+ cells diminished the effect of NK cell therapy, whereas depletion of CD4+ cells alone did not affect OS (Figure 3D). This result indicates that CD8+ T cells are essential for the effect of NK cell therapy. In contrast, the >600 mg tumor group displayed a limited NK-cell treatment effect as expected, but did exhibit improved OS upon depleting CD4+ cells alone (Figure 3D). As the proportion of lung Foxp3+CD4+ T cells in CD45+ cells positively correlated with day 21 tumor weight (data not shown), depletion of Foxp3+CD4+ T cells by anti-CD4 antibody likely has a stronger effect in augmenting the immune response for the >600 mg tumor group than the 95-600 mg tumor group. Moreover, both tumor groups showed diminished OS upon depletion of both CD4+ and CD8+ cells than was the case for depletion of CD8+ cells alone, indicating a CD8+ T cell-independent anti-tumor effect of CD4+ T cells (Figure 3D).”

      (4) The schema in Figure 3E states that mice were inoculated with either EO771 tumor cells or B16F10 tumor cells, but it appears that the data only show EO771 tumor challenges. This should be corrected. 

      Corrected according to the reviewer’s comment.